Data Chains - Deeper Dive

These are only remarks about the form, not the substance:

I didn’t really receive an answer about why the traditional group_size value is the same as the number of elders in charge of a section (see this post in the other forum):

Even if MaidSafe currently estimates that these 2 values are equal, tests may later reveal that security depends more on the replication factor or, on the contrary, more on the number of elders. Being able to change their values independently would allow testing different hypotheses. Also, even if the values are the same, giving each entity a proper name makes for more readable code.

For example, using 2 parameters like “replication_factor” and “council_of_elders_size” would be more maintainable and clearer than using a single parameter “group_size” (these terms are only suggestions to illustrate what I mean).

Again, on terminology clarity: “Block” is too vague and doesn’t convey any meaning. I would propose something like “BallotBox”, “ElectionResults” or “PollResults”, because that is exactly what it is: a set of votes about one event.

In conclusion, I would say that better terminology would make the specification more understandable and, later, the code more maintainable.

6 Likes

Yeah, we had restart capability but pulled it at the last minute. It’s in our other doc, which has a lot more in it; this post is a summary of that bigger, more detailed doc. Right now the states are Live, Gone and Dead.

Live means active; Gone means the peer is on the network but not in our section, OR has been demoted to an Adult (i.e. not in our Elders group); and Dead means that ID is no longer able to appear on the network (it’s banned). The confusion is that Dead used to be Killed or Relocated, where a Lost peer could reconnect and be relocated with a loss of 50% of its age. This will still happen, but it was removed from this summary. I guess the wording was not properly edited there; sorry for the confusion, chap.

Sorry you did not get an answer. The Elders are the GROUP_SIZE oldest peers in a section, and only if that group contains no Infants is it complete. Now it is a bit more subtle: each churn event (membership change) means this group of Elders runs through a transition set of

  1. state → incomplete
  2. state → complete

So each time we lose a peer for whatever reason we are at group size - 1; consensus is then based on a majority of group size - 1 and >50% of the age of valid voters. However, to add a new peer we are also at group size - 1, so still incomplete.

After churn we are complete and can do the network’s bidding; at churn the section has an incomplete group, but the next event makes it complete again. I hope that makes sense.
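As a rough sketch of that quorum rule in Rust (the constant and the exact age weighting are illustrative assumptions, not the actual routing implementation):

```rust
// A minimal sketch of the quorum rule described above. GROUP_SIZE and
// the exact weighting are assumptions for illustration, not routing code.
const GROUP_SIZE: usize = 8;

/// A voting peer, reduced to its age for this sketch.
struct Voter {
    age: u8,
}

/// During churn we have GROUP_SIZE - 1 valid voters; consensus needs
/// both a majority of them AND >50% of their total age among those
/// who actually voted.
fn quorum_reached(valid_voters: &[Voter], voted: &[bool]) -> bool {
    let count = voted.iter().filter(|&&v| v).count();
    let total_age: u32 = valid_voters.iter().map(|p| p.age as u32).sum();
    let voted_age: u32 = valid_voters
        .iter()
        .zip(voted.iter())
        .filter(|&(_, &v)| v)
        .map(|(p, _)| p.age as u32)
        .sum();
    count * 2 > valid_voters.len() && voted_age * 2 > total_age
}

fn main() {
    // After losing one peer we are at GROUP_SIZE - 1 = 7 valid voters.
    let voters: Vec<Voter> = (0..GROUP_SIZE - 1)
        .map(|i| Voter { age: 5 + i as u8 })
        .collect();
    // 4 of 7 voted, but they are the youngest four: a count majority
    // alone is not enough, the age-weighted half must agree too.
    let voted = [true, true, true, true, false, false, false];
    println!("quorum reached: {}", quorum_reached(&voters, &voted));
}
```

Note how the two conditions are independent: a bare majority of young voters can still fail the age-weighted half of the check.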

Each Block effectively locks the event type and the valid voters into an order through state transitions. This order can vary between two peers but will have the same start and end point, due to quorum being used. i.e. you may see A being replaced by X and B being replaced by Y, while I may see B being replaced by X and A being replaced by Y.

We both started with AB and ended with XY, in an order cryptographically proven to be valid; however, we took different routes to get there, based on out-of-order messages, peers seeing events in different orders, etc. It’s quite powerful.

So it is a chain of Blocks, and the Blocks contain a payload of an event type for routing sections. They may also have a payload of data types (DataIdentifier + hash), and that is where you start to see these Blocks become very valuable for offline validation. There are a few nuances and we will discuss these as these deeper dives are published. It is actually rather neat, but many initially find it hard to believe. It then gets deep, but it is well worth the journey, even if it kills me :wink:

2 Likes

Really excited to see this. Still absorbing it but I have some initial questions and nitpicks.


‘Complete group’ definition would be clearer if it used ‘adults’ instead of ‘age >4’ - that’s what the definitions are for after all.

I feel the order of terms should go from ‘simplest’ to ‘most complex’, but it starts with a fairly complex one. Something like defining infant, adult and elder first, then talking about how they combine, then covering the exceptional cases etc. It just makes it easier to read if it’s sequential by conceptual complexity.


I’m a bit unclear on the definition of neighbours.

Is that one bit by value, or one bit by length? ‘Differing in exactly one bit’ is not clear to me.

Are the following sections neighbours?

1000000001
1001000001

Does this mean sections are connected to prefix.length other sections?

How do neighbours work when prefix lengths are imbalanced? Are these neighbours?

1000000001
100100000
1000000001
10010000000

Why does sorting matter? Is only one node relocated per valid block? Is sorting intended to bias relocation of younger nodes over older nodes?


Really glad to see this topic being discussed in further detail. It’s fascinating.

4 Likes

I won’t detail it all here; instead, this link will answer it in much more detail (RFC)

Yes, this is exactly correct: we only relocate one peer, and if many have the same age then we choose the one closest to the hash of the churn event. We are still simulating the effects of various orderings here (and a few other parts). This should be more complete by the next update, along with the sim code.

I am really glad we have some community members with deep understanding and capability taking part in the conversation now. This is where we get the strength of the crowd on our side. Great to see.

3 Likes

Yeah, I was fully aware of this. Ever done OS development and tried to estimate OS call usage? :slight_smile: Fun stuff.

But I was hoping that someone might have done some simulations or estimations based on a ball park figure from the alphas/testnets. And then pulled in figures like 1/10 of that and 10x and 100x and even 1000x that figure to get an idea of what might happen if one section experiences high event rates.

Excellent


imagine Burns saying that

2 Likes

From the answer, I think there is a misunderstanding because I simply meant the term “Block” is too vague. IMO a better term should be chosen for a more readable specification. As it contains a network event with a vector of proofs, why not something like “EventProofs”?

But personally, I think that “proof” is too strong a term, because it is only a claim signed by a peer. And the differences between a vote and a proof are only implementation details, because both relate to 3 elements:

  • The payload
  • The peer id
  • The signature

It’s just that the peer id has been removed from the vote because it is already part of the message, and the payload has been removed from the proof because it has been factored out into the so-called “Block”.

So, votes and proofs are basically the same thing, and to simplify the specification, I would keep only one of these notions in it, preferably votes, because technically speaking they are not proofs.

For example, I would delete the Vote definition, rename Proof to Vote and rename Block to EventVotes:

struct PeerId {
    age: u8,
    pub_key: PublicKey,
}

struct Vote {
    peer_id: PeerId,
    sig: Signature,
}

struct EventVotes<T> {
    payload: T,
    votes: BTreeSet<Vote>,
}

I feel the same vagueness about the “group size” expression, because the question “what is a group size?” has been asked many times by the community. Additionally, this expression is becoming ambiguous: previously it only meant the data duplication factor, but now it also means the number of elders in a section.

Even if you give the same value to these two quantities, as I explained in my previous post it would be clearer to use the right term depending on the specific context, which here is the number of elders.

3 Likes

I think this is because you are seeing only one use here. It’s a generic struct over the payload. So the same pattern is used for message transfer, where the payload is a type of message, and for data storage and updates, where the payload is a DataIdentifier, etc.

The reason for a Vote is that it is sent from one peer to another. The Proof is a proof of a vote in the Block. So the Block contains proofs of peer votes, but the vote itself is a peer doing something. Kind of like the Proof is a record of what the peer did, whereas the Vote is the peer doing it, if you see what I mean. So the Vote has to say what it is voting for, while the Proof only needs the signature of the block payload and the identifier of the voter.
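Putting that distinction into Rust as a sketch, with placeholder key and signature types (these are illustrative assumptions, not the actual routing types):

```rust
// A sketch of the Vote / Proof / Block relationship described above.
// Key and signature types are placeholders, not the routing source.
use std::collections::BTreeSet;

type PublicKey = [u8; 32]; // placeholder for the real key type
type Signature = [u8; 64]; // placeholder for the real signature type

#[derive(Clone, PartialEq, Eq, PartialOrd, Ord)]
struct PeerId {
    age: u8,
    pub_key: PublicKey,
}

/// Sent from one peer to another: it must say what it is voting for,
/// so it carries the payload itself.
#[allow(dead_code)]
struct Vote<T> {
    payload: T,
    sig: Signature,
}

/// Stored in the Block: the payload is held once by the Block, so a
/// Proof needs only the voter's identity and signature.
#[derive(Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Proof {
    peer_id: PeerId,
    sig: Signature,
}

/// Generic over the payload: a membership event, a message in transit,
/// or a DataIdentifier + hash can all be locked in the same way.
struct Block<T> {
    payload: T,
    proofs: BTreeSet<Proof>,
}

fn main() {
    let proof = Proof {
        peer_id: PeerId { age: 6, pub_key: [0u8; 32] },
        sig: [0u8; 64],
    };
    let block = Block {
        payload: "Live(peer_x)", // placeholder event payload
        proofs: BTreeSet::from([proof]),
    };
    println!("proofs in block: {}", block.proofs.len());
}
```

The generic payload is what lets the same Block shape carry section membership events, messages, or data identifiers.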

Group size never meant only this; that is maybe a confusion on our part, I am not sure. It comes from the Kademlia spec really: there you have group size (k), refresh, republish etc. We have gone a good bit away from pure Kademlia, but this is where the group size comes from. It is selected in Kademlia as the number of nodes that cannot all go offline between refreshes.

The replication factor of data is not the same (although in Kademlia it is :wink: ) and will likely not be on release; right now, for testing, it is. But if the group is secured using group size, then the replication factor need only be as large as the number of nodes that cannot vanish before the network can act. That should be a relatively small number.

Here the Elders are defined by group_size: they are only Elders because they are the X oldest Adults, where X is the group size.

I hope that helps. I may be missing something from your comment though; sorry if I am.

4 Likes

This is the problem. Group size is too overloaded:

  1. Original definition in Kademlia spec as you rightly indicate
  2. Data duplication factor
  3. Min section size (name of parameter containing its value in routing config file)

Ok, that’s a move in the right direction that will distinguish 1 and 2.

I also accept that 2 = 3, because to duplicate each piece of data n times a section must have at least n nodes (but formally, this should have been 3 >= 2).

Now this expression has a 4th additional role: the number of elders in a section. And this is where I object, because reusing the group size expression is too confusing and not maintainable.

Instead, to make the data chain spec clearer, you should talk about (target) elder count.

And a possible way to implement 2, 3 and 4 values could be:

  • define 2 parameters in config file, for example data_duplication_factor and target_elder_count
  • define min_section_size in code as max(data_duplication_factor, target_elder_count). This is to ensure that 3 >= 2 and 3 >= 4.
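As a tiny sketch of that rule (the parameter names are the suggestions above, not actual routing config keys):

```rust
// Sketch of the proposed parameter relationship; the names are the
// poster's suggestions, not actual routing config keys.
const DATA_DUPLICATION_FACTOR: usize = 8;
const TARGET_ELDER_COUNT: usize = 8;

/// Derives min_section_size so that both constraints hold by
/// construction: 3 >= 2 (enough nodes to duplicate data) and
/// 3 >= 4 (enough nodes to fill the elder council).
fn min_section_size() -> usize {
    DATA_DUPLICATION_FACTOR.max(TARGET_ELDER_COUNT)
}

fn main() {
    println!("min_section_size = {}", min_section_size());
}
```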

1 Like

This min section size will not be in the code though. Group size is all there is, and it is simply the number of Adults we decide we wish to be Elders. I think it will be much clearer once data chains are implemented.

1 Like

OK here is my interpretation of the OP for the more technically challenged reader (like me), stepping back a little to add a bit of an overview. I’ve left out the ordering of the blocks as I still don’t understand that and one or two other details have been omitted too.

It seems to me that the purpose of datachains is more than simply tracking the state of Elders - that’s just one aspect of what they do, but the only one mentioned in the OP - and this is a cause of some of the confusion expressed in the comments afterwards - that and confusion over terms like ‘group’, it being a work in progress. Certainly getting a firm handle on things is not that straightforward, but it’s been fun learning and trying!

There are two things I’d really like to see. One is a succinct description (< 100 words) of what a datachain is and why it’s useful. I’ve rootled around but couldn’t find one anywhere. I remember reading a very short and simple description of a blockchain which made subsequent learning much easier, something like “a blockchain is an immutable ledger to which information can only be added, never changed or taken away. It is cryptographically secure and distributed across multiple machines, meaning that there is no need for a trusted central authority.” Could we do something similar for datachains?

The second is a description of how a datachain propagates across the network. I don’t really understand that at all.

Anyway, let me know if I’ve missed anything vital or got anything wrong.

A high-level look at consensus, node ageing and data chains

An autonomous network of decentralised nodes needs a set of rules to decide what is true and optimum, and what is potentially dangerous or counterproductive - since no human can do that for it. Because there is no central, universal source of ‘the truth’ (including current and elapsed time), such decisions, triggered by events on the network, must be carried out by groups of nodes which reach a consensus by following rules as to what is and is not valid.

So, for example, a group of nodes might decide amongst themselves that a new member fulfils all the criteria required to join their number; conversely, they might decide that a particular node is acting suspiciously and should be thrown off the network. In another example, a group of nodes might decide that their number has become too small to ensure the required levels of security and that they should merge with a neighbouring group; and a large group of nodes might decide to split into two smaller entities.

Here, the word ‘group’ is used in general terms; as we’ll see, in the SAFE network it can have more specific definitions.

For a decision made by a group of nodes to be valid, a minimum number of them (a quorum) must agree. This quorum is set at more than 50% of the nodes. So, if there are eight nodes and five of them agree (by voting) on an event - say, that a new member trying to join the group is acceptable - then their vote will carry (the new member will be allowed to join) and their votes will be stored in a signed and secured block. If only four nodes agree then no action will be taken (in this example the node will not be allowed to join).

When it comes to decision making, not all nodes are equal. Some will have proven themselves to be trustworthy and to have sufficient resources (bandwidth, CPU, uptime), while others will be newer and/or less reliable. The concept of node_age gives a number to this reliability factor, ranging from 0 to 4. Nodes with a node_age of 4 are classed as Adults. They will have been moved around the network a few times and proven themselves to be reliable; they are trusted to vote on network events (e.g. allowing a new member to join). Nodes with a node_age < 4 are called Infants - they are not yet trusted to vote.

While Adults are capable of voting, in reality only the oldest and most trusted among them - the Elders - have that right. The Elders confer among themselves as to what is valid in their section. Only the results of their decisions (blocks) are shared with the Adults and Infants rather than the votes themselves. While they cannot vote themselves, Adults and Infants are able to contribute by storing data chunks.

Sections

At any given time, each node on the SAFE Network is a member of exactly one section. A section is a group of nodes that is responsible for looking after data stored within a certain range of addresses on the network. The number of nodes in a section varies and fluctuates constantly through the process of churn, but at any time most sections will have between 10 and 20 nodes.

Within each section are a certain number of Elders, who are able to vote on events within the section, and also (usually - see below) a number of Adults and Infants. As well as being able to vote among their section peers (a process called local consensus), Elders within a particular section are also connected to Elders in sibling sections, near-neighbours as defined by their XOR address prefix. In this way, messages are passed across the network from Elders in one section to Elders in the next section, and so on.

If a particular section grows much larger than average (about 12 nodes), in general it will split into two smaller sections. (Note: in some circumstances, which we will not go into here, a split will not be possible.) Likewise, if, as a result of nodes leaving during ‘churn events’, the number of nodes drops below a certain level specified by a parameter called group_size, then its Elders will be triggered to seek a merger with a sibling group. At present, group_size is 8.

In an average section containing 12 nodes, there will be eight Elders plus a total of four Adults and Infants. In fact, eight (remember group_size = 8) is the minimum number of Elders that is permitted. So what happens if the number of Elders in a section drops below 8 as a result of churn, or an Elder being demoted or removed? Well, since an Adult also has node_age > 4, the oldest Adult node is simply promoted to the Elder type, i.e. it is given voting rights. If two Adults have the same age a ‘tie-breaker’ algorithm is deployed to decide which gets promoted.

A ‘complete group’ is a section containing a minimum of group_size Elders (we’ve been using 8). The most reliable of these Elders will be the least likely to be lost during a churn event. Infants, Adults and those Elders that participate least in voting or which are least reliable are liable to be moved off to other sections at any time. This is important for security reasons - it’s much harder for an attacker to control a section whose membership is in constant flux. But temporarily, prior to the completion of a churn event, sections may find they do not have group_size Elders, nor do they have sufficient Adults to promote to Elders. In this case, the group is defined as ‘incomplete’ and all section members influence node ageing.
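A minimal sketch of the Elder-selection and completeness rules just described, assuming group_size = 8 and representing peers by their ages only (an illustration, not the routing code):

```rust
// Sketch of Elder selection and the 'complete group' check described
// above; GROUP_SIZE = 8 as in the text, ages and logic are illustrative.
const GROUP_SIZE: usize = 8;

/// Elders are simply the GROUP_SIZE oldest peers in the section
/// (ties ignored here; the proposal has a tie-breaker for those).
fn elder_ages(mut section_ages: Vec<u8>) -> Vec<u8> {
    section_ages.sort_unstable_by(|a, b| b.cmp(a)); // oldest first
    section_ages.truncate(GROUP_SIZE);
    section_ages
}

/// A group is complete when there are at least GROUP_SIZE peers and
/// the GROUP_SIZE oldest all have age > 4 (i.e. none are Infants).
fn is_complete(section_ages: &[u8]) -> bool {
    let elders = elder_ages(section_ages.to_vec());
    elders.len() == GROUP_SIZE && elders.iter().all(|&a| a > 4)
}

fn main() {
    let section = vec![9, 8, 7, 6, 5, 5, 5, 5, 3, 2, 1, 1]; // 12 peers
    println!("complete: {}", is_complete(&section));
}
```

This also shows why promotion is automatic: when an Elder leaves, the next-oldest Adult is simply the new eighth-oldest peer.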

Datachains

From the point of view of a section, Elders can exist in various states, including Live (active), Gone (moved to another section or offline) or Dead (banned). Changes in these states are recorded in a permanent ledger called a datachain, which is held by all members of the section. So, for example, if an Elder changes from the state Gone to Live, this will be recorded in two consecutive blocks in the datachain along with the cryptographic proofs of the voting process validating the event. Indeed, one purpose of the datachain is to give nodes a provable history on the network and to add resilience. The network can survive large-scale churn events or even a complete outage and be able to rebuild itself and reinstate the data according to the records stored in the data chain. Another purpose is to ensure the integrity and durability of data by providing a cryptographically secure record of its identity, validity and location. Datachains should also improve the efficiency of the network by reducing the number of nodes that need to store data.

The states of Adult and Infant nodes, as verified by the Elders, are also stored, but since these nodes cannot vote and play a less important role in the continuity of the network this record is only temporary.

14 Likes

There is a little more to add. It may make it easier, or worse :smiley: I never know. Anyway here goes.

There are several consensus mechanisms in play, the two main ones are

  1. Running Network Consensus
  2. Off-line consensus (data chain).

On the network, a group can tell another group something and it will be believed. The other group is the next group on the binary tree (the XOR space). So in A->B->C, A cannot really tell C something (usually), as it’s not connected to it; instead it tells B, who is connected to both and can also see the membership changes etc. Then B tells C, and so on. That is pretty much running consensus.

Then there is chain consensus, where there is a progression of membership-change events in the voting members of a section; these are the Elders, as you point out (thanks). As each membership change happens, it must be signed by a majority of the previous membership. This can be tightened beyond a majority, where peers are penalised for not voting (but later; just take that as a future thing).

There is a lot that can happen now. A client, for instance, on joining the network can connect to its close_group (which will also allow push notifications etc., so that is likely what will happen) and keep the last valid Block from that chain. On rejoining, the chain in that section must progress from that Block forward, cryptographically secured. This, amongst many other things, allows us to know we are on the right network etc.

If the network restarts, clients and peers do a similar thing: go back to the section they were in, and things are again OK. Or they can be given data signed by the last known valid Block or a future generation of it (if they have been off for a while). This ignores the fact that many clients sign data, so they will be certain it’s their data via that route as well.

Back to running network consensus and chain consensus for a second. As the chain comprises unique blocks with unique membership at the section level, we can do a trick: convert running network consensus into chain consensus. All we do is vote, as the current Elders, on the event/data/upgrade etc., and we can create a Block for that event. It can sit on top of the chain, signed by the most recent Elders, and all is well in the world. So we can basically store any descriptor or data or pretty much anything like this, converting it to local (chain) consensus and locking it into the chain.

This trick also has another side effect: for a peer to join a section, it can be given all the data, in fact all recent Blocks, and give the section Elders its Votes for all those Blocks. In terms of data, it means joining peers must get all the data and provide signatures of it to us. This replaces resource proof with a different kind of work: the network checking whether this peer can get all this stuff and sign it. So then we know the peer is good and holds the data; it signed to say so. Now peers joining a section have worked hard and have signed that they have the data, so when we tell them to deliver data to a client and they do not, there is no excuse and they should be punished (probably killed).

It goes on a good bit, but this is where I think we can start to really see the benefits of such a chain. Also as membership blocks are unique, we can prune the chain very efficiently. No need to grow forever.

There is a lot more but it is pretty exciting now to see many more folks getting involved in the thought experiments here.

12 Likes

Great post @JPL thank you.

I recall a post or two by David when he first had the idea, which might be a basis for this. I may have saved the links.

I haven’t tracked down the post I’m thinking of yet, but here is where Datachains / Data_chains were first discussed I think.

3 Likes

Thanks David. Turning in now but I’ll read and digest tomorrow.

2 Likes

No worries John, thanks a lot for the précis though, really great.

1 Like

Good one John! Just gave it a 1 time read (the OP) but will certainly need more reads to get my head a bit around it. This really shows how complicated it is to make this whole idea of groups and splits and merges autonomous. Try to imagine how much easier this gets with “MaidSafe Trackers” that act as servers providing service to nodes and controlling all the sections and Vaults like BitTorrent etc. But that isn’t the idea is it :wink: ?

Quite amazing when this stuff is tested and active. It’s a bit like the Linux Kernel… It takes quite some time to create it and perfect it, but when it got stable it started to pop up everywhere because it’s so good.

7 Likes

My guess is: when an elder dies it gets replaced by another “live” elder, thus the order.

Yeah, there is no way to reorder the datachain (that’s why its called “chain”).

I have some questions and observations. The definitions are really dense and full of subtle implication; there must have been a lot of very detailed conversation to refine them into such solid little nuggets. I think many probably don’t see how much work these definitions represent, but it’s a seriously vast amount of work. They’re all intensely deep-meaning terms.

The primary sources I’ve been using are

The OP of this topic https://forum.safedev.org/t/data-chains-deeper-dive/1209

The Disjoint Sections RFC https://github.com/maidsafe/rfcs/blob/c63ff6636d0dc21d6939856cf3a32445c3855b8c/text/0037-disjoint-groups/0037-disjoint-groups.md

Bart’s simulation https://github.com/fizyk20/ageing_sim - a really excellent resource.


“Complete group: GROUP_SIZE peers with age >4 in a section.”

Is this ‘exactly groupsize’ or ‘at least groupsize’?

Confusingly, the simulation defines is_complete as exactly groupsize elders (which is confusing because elders may sometimes be age<=4). Furthermore, the simulation says all elders are adults (ie infant elders are treated as adults) but presumably only for the purposes of is_complete.

Some clarification of the definition would be helpful here.


“Elders: the GROUP_SIZE oldest peers in the section.”

Can the section have more than groupsize elders?

Is the tiebreaker only applied to network events, or is it also to elder membership?

I’m not clear on how to use the tiebreaker. I’ve reworded it in a way that makes sense to me.

Original:

“If there are multiple peers of the same age then XOR their public keys together and find the one XOR closest to it.”

My rewording:

If there are multiple peers of the same age then XOR their public keys together to get the public key xor distance (pkxd). Select the peer with the public key closest to pkxd (measured by xor distance).

Is the final ‘it’ in the original definition the same as ‘pkxd’ in the reworded definition, or is ‘it’ something else?

If ‘it’ is pkxd, the tiebreaker algorithm seems useless to me:

pk1 = 44225
pk2 = 98363
pkxd = pk1^pk2 = 77050
pk1d = pk1^pkxd = 98363
pk2d = pk2^pkxd = 44225

May as well just pick the largest public key value rather than do the xor distance comparison routine.


Can a formal definition for Network Event please be given? I suspect it is:

  • elder joins
  • elder departs (leaves network or relocated)
  • section splits
  • sections merge

See fizyk20/ageing_sim/churn.rs for the list of these events.

From the code it seems any joining or departing node (ie not necessarily an elder) also triggers a network event.

See add_random_node and drop_random_node for the triggering of network events for non-elder nodes.


Not a question, just a point of observation

Sibling: A section differing from us in the last bit.

It’s worth explicitly clarifying that a sibling may be multiple children sections of the sibling prefix. This means if the sibling section has already split, there are still sibling vaults (just that they’re formed by multiple sections). I feel the definition doesn’t cover this (but probably doesn’t need to).

A similar idea applies to neighbours.

More generally, any prefix may specify a single section or multiple sections. There may be a section with exactly that prefix on the network, or one with a shorter prefix, or multiple child sections with that prefix. I guess the point is there’s no such thing as an ‘invalid’ or ‘nonexistent’ prefix.


“When a section does not have a complete group, all oldest peers up to GROUP_SIZE are Elders and all section members influence node aging.”

What does ‘influence node aging’ mean? I understand an incomplete group may have infants for elders, but I don’t understand what that means beyond normal elder behaviour.


What is recd in handle_vote_recd?


Should I test for groupsize+buffer ‘elders’ or ‘adults’ in each hypothetical new section when testing for a split?

Likewise does a merge happen when a section has less than groupsize elders or adults?

The simulation says should_split depends on adults only (ie not elders) and same for should_merge.

The Disjoint Sections RFC is the closest I can find to a formal definition of split and merge, but does not explicitly deal with ageing or elders vs adults.


Is the following disallow rule correct? Where is it defined?

fizyk20/ageing_sim/section.rs#L198

“disallow more than one node aged 1 per section if the section is complete (all elders are adults)”


Superb writeup.

One nitpick: age can be more than 4. This matters because during merge the older nodes take precedence when deciding the elders.

15 Likes

Hi! Thanks for your interest in the proposal. I think I can say for us all at MaidSafe that we are impressed with how deeply you have read the document :slight_smile:

Just to note - this document is still actively being discussed, so large parts of it might still change - but we will update the thread every time some changes happen.

Bart’s simulation https://github.com/fizyk20/ageing_sim - a really excellent resource.

I’ll take this opportunity to say: thanks! Glad to see it proved useful to you :slightly_smiling_face:

Is this ‘exactly groupsize’ or ‘at least groupsize’?

Confusingly, the simulation defines is_complete as exactly groupsize elders (which is confusing because elders may sometimes be age<=4). Furthermore, the simulation says all elders are adults (ie infant elders are treated as adults) but presumably only for the purposes of is_complete.

Some clarification of the definition would be helpful here.

It is “at least groupsize”. The point is that we want to have all Elders being older than 4, but when the network has just started its operation, there won’t be enough peers with such an age. Once there are enough of them, we call it “having a complete group” and start operating a bit differently (I’ll explain more in the answers to your other questions below).

What the simulation does is it checks whether we have GROUP_SIZE Elders in the section (we can have less if we are just starting up and we have less than GROUP_SIZE peers in the whole section - we obviously don’t have a complete group then) and if yes (&& operator), whether all of them are adults (which is defined elsewhere in the simulation as being older than 4 - in the proposal we actually use the term “Adult” differently, see below). So effectively we check whether there are at least GROUP_SIZE peers in the section and whether the GROUP_SIZE eldest peers are older than 4 - which is how a complete group is defined.

Regarding the meaning of “Adult” - in the simulation it’s just a peer that is older than 4, period. In the proposal, we use it as meaning a peer older than 4 not being an Elder. This is a difference that might be confusing, and we probably should have a separate term for the meaning I used in the simulation to be more clear.

One more potential point of confusion is in the results printed by the simulation - “Adult” is actually used there in the meaning from the proposal… This will definitely have to be fixed :stuck_out_tongue:

Can a formal definition for Network Event please be given? I suspect it is:

  • elder joins
  • elder departs (leaves network or relocated)
  • section splits
  • sections merge

This part has been changing quite a bit recently, so no wonder you are confused :wink: We use the term “Network Event” for anything that happens to the section and will be recorded in the Data Chain - which, as you correctly guessed, is mostly the 4 kinds you wrote.

The code is actually based on an older version of the proposal, where if some events need different treatment (like an elder leaving and an elder relocating), we considered them separate Network Events. The current proposal focuses on the effect an event has on the section, so both elder leaving and elder relocating are now Dead. This might get updated in the simulation, but it’s rather low priority, as it’s probably not really important to the results.

From the code it seems any joining or departing node (ie not necessarily an elder) also triggers a network event.

In a way, yes. We usually call an Adult or an Infant joining an event as well, but it might not be recorded in the Chain and might not influence node ageing. To be precise, Infants are only recorded in the Chain and influence node ageing while we don’t yet have a complete group, and Adults - in the proposal’s meaning, so not being Elders - are never recorded and always influence node ageing. I’ll elaborate on that node ageing part later in the post.

May as well just pick the largest public key value rather than do the xor distance comparison routine.

You are right, provided that the tie is just between two peers - but it can often be between more of them. Nevertheless, what you pointed out is a significant issue, so we will modify that part - thanks for that!

We will actually use a hash of the XORed names, then choose the one closest by XOR distance to that. So in your algorithm, this would mean something like pkxd = hash(pk1 ^ pk2), with the rest unmodified.
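In (assumed) Rust pseudo-form, that tie-break could look like the sketch below. The `u64` names and the standard library’s `DefaultHasher` are stand-ins for real node names and the network’s cryptographic hash - only the shape of the computation is the point here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for the network's cryptographic hash (illustration only).
fn hash_u64(x: u64) -> u64 {
    let mut h = DefaultHasher::new();
    x.hash(&mut h);
    h.finish()
}

// Break a tie between candidate names: hash the XOR of all the tied
// names, then pick the candidate closest by XOR distance to that hash.
fn break_tie(candidates: &[u64]) -> u64 {
    let xored = candidates.iter().fold(0u64, |acc, &c| acc ^ c);
    let target = hash_u64(xored);
    *candidates
        .iter()
        .min_by_key(|&&c| c ^ target) // smaller XOR value = closer
        .expect("at least one candidate")
}

fn main() {
    let tied = [0x1234u64, 0xabcd, 0x9999];
    let winner = break_tie(&tied);
    // The winner is always one of the tied candidates.
    assert!(tied.contains(&winner));
}
```

Note that this works for any number of tied peers, not just two, which is why the XOR-fold over all candidates is used rather than a pairwise comparison.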

Not a question, just a point of observation

Sibling: A section differing from us in the last bit.

It’s worth explicitly clarifying that a sibling may be multiple children sections of the sibling prefix. This means if the sibling section has already split, there are still sibling vaults (just that they’re formed by multiple sections). I feel the definition doesn’t cover this (but probably doesn’t need to).

I think we actually don’t want to extend the meaning this far. The point of the sibling is to refer to a section that a given section will merge with, if it needs to merge - which will always be a section with a prefix of the same length and the last bit different. If such a section does not exist (which can happen, as you correctly state), we will have to wait until all the children of our sibling merge into the actual sibling section, and then merge with that section.

This might seem unnecessary, as we could just merge all the sections in one go - which we actually tried when initially implementing Disjoint Sections, but it turned out to be too problematic - bringing just two sections to consensus is challenging enough, let alone more of them :wink:
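As a tiny sketch of the definition above (representing a prefix as a bit pattern plus a length is an assumption of this example, not the actual routing types), the sibling is simply our own prefix with its last bit flipped:

```rust
// A prefix as (bits, len): `len` significant bits, with the last
// prefix bit stored in the lowest position. This representation is
// only for illustration, not the actual routing implementation.
fn sibling(bits: u64, len: u32) -> (u64, u32) {
    assert!(len > 0, "the root section has no sibling");
    (bits ^ 1, len) // flip the last bit; same prefix length
}

fn main() {
    // Sibling of prefix 0b10 (len 2) is 0b11, and vice versa.
    assert_eq!(sibling(0b10, 2), (0b11, 2));
    assert_eq!(sibling(0b11, 2), (0b10, 2));
}
```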

What does ‘influence node aging’ mean? I understand an incomplete group may have infants for elders, but I don’t understand what that means beyond normal elder behaviour.

This touches upon how node ageing actually works.

With node ageing, a node gets older when it is relocated, and it is relocated when a network event hashes to a value ending with enough 0 bits. But which network events do we hash? The idea here is that since Infants will usually be nodes that only join the network for a moment, we don’t want to take into account the events they generate - they are “noise” overlaid on the “signal” of Elders and Adults, so to speak :wink:
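A rough sketch of that trigger (tying the trailing-zero threshold to the node’s current age is an assumption made for this illustration, not necessarily the final rule):

```rust
// Sketch of the relocation trigger: relocate a node of age `age` when
// the event hash has at least `age` trailing zero bits. The threshold
// being exactly the age is an assumption of this example.
fn should_relocate(event_hash: u64, age: u32) -> bool {
    event_hash.trailing_zeros() >= age
}

fn main() {
    // 0b1000 has exactly 3 trailing zero bits.
    assert!(should_relocate(0b1000, 3));
    assert!(!should_relocate(0b1000, 4));
}
```

Under this rule, on average one event in 2^age triggers a relocation, which is consistent with the age growing roughly logarithmically in the number of churn events (as mentioned further down).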

We can’t always ignore Infants, though - if we did, then when the network starts and everyone is an Infant, no one would age! Every event would be ignored, as it was generated by an Infant, so nobody would get older, so everyone would remain an Infant, so… you get the idea. We need to take the Infants into account at the beginning, but we don’t want to do it later - so we decided to mark the point of transition at the moment of getting a complete group. Once we have a complete group, we decide that we have enough Adults and no longer need to take Infants into account, so we stop doing that.

Should I test for groupsize+buffer ‘elders’ or ‘adults’ in each hypothetical new section when testing for a split?

We never have more than GROUP_SIZE Elders, so it’s Adults (in the simulation’s understanding of the term; Adults plus Elders, if we exclude Elders from being Adults, as mentioned above).

Actually, we recently decided to allow the number of Elders to grow to GROUP_SIZE+1 briefly - but it’s still being discussed, and not reflected in the proposal. That being said, it’s Adults here, anyway :wink:

Likewise does a merge happen when a section has less than groupsize elders or adults?

Adults again, and it’s even when we have exactly GROUP_SIZE of them, not less.

If we only have GROUP_SIZE Adults, it means that all of them are Elders, which means we have no buffer if we lose another Elder - we would have to promote an Infant, then. So we decide to merge, which will give us the needed shot of fresh Adults from our sibling section :wink:

The simulation says should_split depends on adults only (ie not elders) and same for should_merge.

Yes - but bear in mind that for the purpose of the simulation, everyone with age > 4 is an Adult (so if some of them are Elders, they are still Adults, too).
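Putting the last few answers together, a minimal sketch of the two conditions (GROUP_SIZE and BUFFER values are illustrative, and Adults are counted in the simulation’s sense, i.e. Elders included):

```rust
const GROUP_SIZE: usize = 8; // illustrative value
const BUFFER: usize = 3;     // illustrative value

// Split only if both hypothetical child sections would have a full
// complement of Adults plus a buffer.
fn should_split(adults_bit0: usize, adults_bit1: usize) -> bool {
    adults_bit0 >= GROUP_SIZE + BUFFER && adults_bit1 >= GROUP_SIZE + BUFFER
}

// Merge as soon as we are down to GROUP_SIZE Adults (no buffer left,
// all Adults are Elders).
fn should_merge(adults: usize) -> bool {
    adults <= GROUP_SIZE
}

fn main() {
    assert!(should_split(11, 11));
    assert!(!should_split(11, 10));
    assert!(should_merge(GROUP_SIZE));
    assert!(!should_merge(GROUP_SIZE + 1));
}
```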

Is the following disallow rule correct? Where is it defined?

fizyk20/ageing_sim/section.rs#L198

“disallow more than one node aged 1 per section if the section is complete (all elders are adults)”

I think we forgot to mention this in the document, so apologies for that - and thanks for pointing it out!
This rule is needed to avoid sections having huge numbers of Infants, which is actually a result from the simulation.

It wasn’t there initially, but because the age of peers grows roughly logarithmically with the number of churn events they experience, a single peer has to wait around 32 events until it becomes an Adult. This means we would have around 32 Infants for every Adult in the section, which means over 800 Infants in a section that is about to split - clearly a huge number.

In order to prevent such situations, we decided to stop new age-1 Infants from joining when we already have such an Infant. This means that some of the Infants we already have will be relocated before a new one joins the network, so their number grows much more slowly. We still get 50-100 Infants per section even with this rule, but this is actually manageable thanks to Infants only being connected to their own Elders and nobody else.

Also worth noting here is that this is again an example of a rule we can’t enforce from the beginning - the first node in the network would be an age-1 Infant, and so nobody else would be allowed to join. This is why this rule is only enforced once we have a complete group, just like with calculating relocations.
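A minimal sketch of the rule (the function shape and names are this example’s, not the simulation’s actual API):

```rust
// Once the section is complete, reject a joining age-1 Infant if the
// section already contains one; before completeness, anyone may join.
fn allow_join(section_complete: bool, age_one_infants: usize, joiner_age: u8) -> bool {
    if section_complete && joiner_age == 1 {
        age_one_infants == 0
    } else {
        true
    }
}

fn main() {
    // Incomplete section: age-1 nodes can always join.
    assert!(allow_join(false, 1, 1));
    // Complete section: only one age-1 Infant at a time.
    assert!(allow_join(true, 0, 1));
    assert!(!allow_join(true, 1, 1));
    // Older joiners (e.g. relocated nodes) are not restricted by this rule.
    assert!(allow_join(true, 1, 2));
}
```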

One nitpick: age can be more than 4. This matters because during merge the older nodes take precedence when deciding the elders.

I’m not sure what you mean by that, but you seem to be mostly correct. Just to make it clearer - the age can indeed be up to 255 (though we don’t expect such ages to appear in the network, as they would require about 2^255 churn events, which is almost as many as there are atoms in the Universe :wink: ), but no one actually “decides” the Elders - they are always the oldest nodes in a section. So if you just mean that the older nodes become Elders first, then you are totally correct; and if you meant something else, well, I hope I clarified this a bit :wink:

16 Likes

I realise that this is a work in progress but what exactly is going to go into the blocks of the data chain?

So we have the network events including

elder joins
elder departs (leaves network or relocated)
section splits
sections merge

But what about the data bit? Is each chunk of data, including mutable data, going to get its own block containing a pointer to that data, with a new block created every time the state is changed (for MD)? If so I should imagine pruning would be needed sooner rather than later to keep it to a manageable size.

Another thing I’m not quite clear on is the Running Network Consensus - so just the most recent block(s) of the chain get shared with the Elders in the sibling sections, is that correct? And presumably that would only be for section membership, rather than a record of the data looked after by that section? So will there be two datachains effectively, one for peers and one for data?

4 Likes

Yes, exactly.

Yes, there is no need to keep version - 1 in the chain at all. If we did, it would be a ledger type where every mutation is a new PUT payment (i.e. we would keep all the old copies/transactions, like a blockchain).

Our approach to data with Adults & Infants follows a pattern called “bubble up”: each new Elder has to accept that data and vote for it again (like a kind of resource proof). So the Blocks for these things are always voted on by the most recent members of the chain - in fact, always by all of the current Elders.

We have not published this part yet, as we don’t have 100% consensus on the design minutiae of the data chains implementation - we are pretty close, though. Then we will implement Phase I and publish Phase II (data), which is actually Alpha 4.

6 Likes