Data Chains - Option B

maidsafe · May 24, 2017, 2:50pm

Partially Ordered Chains

Introduction

There is a tension between each node’s subjective view of the network (who is connected to me at the moment?) and the “objective” eventual agreement the network needs to reach about its own structure (and, later, data). This is an attempt to get from the former to the latter without a multi-round consensus algorithm, by instead voting on state transitions and accepting that during churn, more than one state could be valid at the same time. That means that even without splits and merges, a block can have more than one predecessor or successor, and the chain segment for any given address is not necessarily totally ordered.

Specifically, votes are signed edges in a directed graph, whose vertices, or blocks, are section states: prefixes and member lists of particular sections. An edge accumulates once a majority of its source’s members have voted for it, and it then makes its destination valid, too, given that the source was valid.

Another kind of edge points from every state to its potential successors. The valid blocks from which no such edge points to another valid block represent the current state of the network, and whenever a node observes a new state, it signs votes pointing from current states to the observed one.

Votes and blocks

The core of this algorithm is essentially a function that assigns to a set (unordered!) of Votes a set of valid and a subset of current Blocks that represent prefixes and member lists that were valid at some point in the past, and prefixes and member lists that are considered to be currently valid under the assumption that no Votes are missing. In particular, adding new Votes to the set will only cause more blocks to become valid.

/// A record of the state of a section at some point in time.
struct Block {
    /// The section is responsible for all addresses that match this prefix.
    prefix: Prefix,
    /// If the same prefix and members occur twice, the version number
    /// will distinguish the blocks.
    version: u64,
    /// Section members and their vote weights.
    members: Map<Name, u64>,
}

/// A testimonial by the `signatory`: It votes as part of the block `from`
/// for the block `to`.
struct Vote {
    /// The block that the `signatory` is confirming.
    to: Block,
    /// A block that was valid at the time of this vote.
    from: Block,
    /// The signature of the above.
    signature: Signature,
    /// The name of the signing node.
    signatory: Name,
}

A vote can be understood as a testimonial by a member of vote.from that vote.to has been observed. If a quorum of from agree, these signed votes are a proof that to is valid.

A quorum with respect to members: Map<Name, u64> is a set q: Set<Name> such that

q.len() * 2 > members.len() &&
members.iter().map(|(m, w)| if q.contains(m) { w } else { -w }).sum() > 0

i.e. q has a simple majority both in terms of vote weights and node count.
A set of votes: Set<Vote> has a quorum with respect to members, if the set of signatories does.

A block b1 is admissible after block b0 if b1.version > b0.version and

either (the split case)
- b0.prefix == b1.prefix.popped() and
- b1.members contains exactly those entries of b0.members that match b1.prefix,
or (the merge case)
- b0.prefix.popped() == b1.prefix and
- b0.members contains exactly those entries of b1.members that match b0.prefix,
or (the add and remove cases)
- b0.prefix == b1.prefix and
- b0.members and b1.members differ by exactly one entry.

A block b0 witnesses block b1 with votes votes: Set<Vote> if all of the following conditions are satisfied:

for every v in votes, v.from == b0 && v.to == b1
either:
- b1 is admissible after b0, or
- b1.prefix and b0.prefix are neighbours
the votes have a quorum with respect to b0.members, or, in the case that b1 removes a node from b0, the votes have a quorum with respect to b1.members

From the perspective of a single node, we say that block b0 witnesses block b1 if there is such a set of votes received by that node.

By requiring a quorum of b1.members when b1 is a remove block, it is slightly more likely that node removals accumulate during periods of rapid node loss. For example, if there are 10 nodes in b0.members, 6 would ordinarily be required for a majority quorum, whereas b1.members will contain 9 nodes, of which only 5 are required for a majority quorum.

Valid and current blocks

Given a set of initial known-valid blocks, e.g. the genesis block or a set of blocks from another trusted node, we use the received votes to validate blocks:
The set of valid blocks is the smallest superset of the initial blocks such that every block witnessed by a valid block is also valid.

A set of blocks C buries a block b if b.prefix is covered by the prefixes in C with a higher version, i.e.: for every name n with b.prefix.matches(n), there is a c in C with c.version > b.version && c.prefix.matches(n).

A prefix p is an ancestor of a prefix q if p is compatible with and shorter than q. Then q is a descendant of p.

A block is a candidate if it is not buried by any valid block. A block b is current if it is a candidate and there is no other candidate c such that:

either c.prefix is an ancestor of b.prefix,
or c.prefix == b.prefix and c.members.len() > b.members.len()
or c.prefix == b.prefix and c.members.len() == b.members.len() and c.members > b.members.

That last inequality is lexicographic comparison of the iterators but it does not really matter: It is just a tie-breaker to ensure that there is always only one current block per section. In fact, if the valid blocks cover the namespace (e.g. if there is a genesis block with the empty prefix), it is guaranteed that for every name n, there is always exactly one current block whose prefix matches n: The blocks with the highest version whose prefix matches n are guaranteed to be candidates. The candidates with the shortest prefix matching n are not ruled out by point 1. Among those, the ones with the most members are not ruled out by point 2. And among those, the one with the lexicographically greatest member list is current.

The set valid_blocks: Set<Block> can be computed by recursively adding every block b1 to it for which there exists a block b0 in valid_blocks and b0 witnesses b1 with some subset votes of the received votes.

What to vote for

In general, a node votes for blocks that are admissible after current ones, and that are closer to what the node itself is observing. E.g. if it lost a connection, it will remove that node from current blocks and vote for the results.

At the same time, it votes for neighbouring blocks that it has already verified as valid: the horizontal witnessing of neighbours is complementary to the vertical witnessing of admissible successors, so that together they define a web of trust linking the blocks of the whole network together in a way that can be used both for proving facts about the network’s history and validating messages.

Adding nodes

For every current block b0 containing our own name but missing the name of a node that should be added, vote with from == b0 for a block with b0.version + 1 and with name in addition to b0.members. Nodes that should be added are candidates that passed our resource proof within the last 60 seconds (Add rule).

The weight of name is according to Node Ageing, i.e. 0 for a newly added node and a + 1 for a relocated node that previously had age a.

Removing nodes

For every node name that should be removed, and for every current block b0 containing both name and ourselves, vote for a block with b0.version + 1 and b0.members without name, using from == b0. Nodes that should be removed are:

nodes that have disconnected from us and we haven’t managed to reconnect or establish a tunnel to yet, and (RmDc rule), and
nodes that sent us malformed or inherently invalid messages or where we strongly suspect malicious behaviour (RmFail rule).
The details about what constitutes malicious behaviour need to be worked out in detail and are only partly related to this proposal.

Splitting

For every current block b containing our name, let b0 and b1 be blocks with b.version + 1 , the children of b.prefix and the corresponding subsets of s.members.

If b0, b1 and all current blocks whose prefixes are siblings of ancestors of b0.prefix have at least MIN_SECTION_SIZE + SPLIT_BUFFER members, vote for b0 and b1 with from == s (Split rule).

Merging

For every current block b0 containing our name, and every current block b1 such that b0.prefix and b1.prefix are siblings, if at least one of the following conditions is satisfied, vote for b with from == b0, where b.prefix == b0.prefix.popped(), b.members is the union of b0.members and b1.members and b.version is max(b0.version, b1.version) + 1:

b0, b1 or any current sibling of any ancestor of b0.prefix has fewer than MIN_SECTION_SIZE members (Merge rule),
or we disconnected from too many members of b0, b1 or any current sibling of any ancestor of b0.prefix, so that the ones we are still connected to don’t have a quorum (ForceMerge rule).

Neighbouring sections

For every current block b1 whose prefix doesn’t match our own name, and every current block b0 containing our own name, vote for b1 with from == b0 (Neighbour rule).

This is to witness all neighbouring sections, and it replaces the section list cache.

Admissible valid blocks

Vote for b1 with from == b0 if (Admissible rule):

b0 and b1 are both valid,
b0 contains our name,
b1 is admissible after b0,
b0 does not yet witness b1 and
there are no blocks c in between b0 and b1 such that c is admissible after b0 and b1 is admissible after c.

This is essentially a shortcut to prove b1 from b0 to another node without taking a detour via other blocks. In particular, if our sibling section wants to merge with us and has successfully made the merged section valid, this rule will cause us to sign the merge from our side of the section, too.

Relevant blocks

Every node has to know the current blocks for its own section and its own section’s neighbouring sections. Therefore, when a node observes a split, it should drop all non-neighbouring blocks from the set of current blocks. Any existing votes and valid blocks pertaining to a non-neighbouring section should be retained – the node should just stop trying to stay up to date with that part of the network.

Conversely, when a node observes a merge, it might gain new neighbours from a hitherto unknown section. The message flow is such that nodes receive merge votes from their new neighbours, and can then use the Proof request algorithm to fetch the rest of the votes. See the Relaying votes section below for details.

Relaying votes

Conceptually, there are two types of messages that we send in order to inform other nodes about our current state:

Vote Messages: a message containing our vote on some change.
Agreement Messages: a message containing at least a quorum of votes on a change, indicative of our agreement to the validity of a new block.

In the implementation, both such messages can be represented by a Vote and a collection of signatures, like so:

struct SignedVote {
    vote: Vote,
    signatures: BTreeMap<PublicId, Signature>,
}

Whenever we wish to vote on something, we broadcast a vote message to the members of the from and to blocks. In most cases, this means sending votes only to nodes in our own section. When witnessing, the to block is that of a neighbour, so we broadcast the vote to them as well. New candidates receive the vote that makes them a member, and can use the Proof request algorithm to retrieve the rest of the history.

Whenever a vote accumulates, and we are in either or both of the from or to blocks, we broadcast the vote to any current neighbour section N such that:

N is a neighbour of the from or to prefix, and
Our current prefix is the closest (by XOR distance) to N's prefix, amongst the set of current sections which have prefixes compatible with either the from prefix or the to prefix.

When taking the XOR distance between two prefixes, we use the lower bounds of both prefixes, so that for example Prefix(01) ⊕ Prefix(00) = XorName(010000...) ⊕ XorName(00000...) = 0100000....

Message Complexity

Let n be the number of nodes in a given section S, and m be the number of nodes in the neighbourhood of S (consisting of S and its neighbours).

Each new block that is agreed requires O(n) nodes to vote, and each node broadcasts their vote to O(n) other nodes. Hence the message complexity is at least O(n²).

Further, broadcasting the agreement message to the neighbourhood requires that O(n) nodes broadcast to O(m) other nodes, for a contribution of O(nm). We have m ≫ n, so O(nm) ≫ O(n²), and hence the total message complexity is approximately O(nm) per block agreed.

Proof request algorithm

The proof request algorithm can be invoked by a node when it receives a message about an unknown block. We’re fairly sure that there exists an algorithm which works most of the time, but we are still working out the details.

The current algorithm

The algorithm that is currently implemented in Ewok works the following way:

If we receive a vote or agreement with a from block we don’t consider valid, we reply with a RequestProof message to the sender. This message contains the block in question, as well as a set of blocks we consider current at the given moment.
Upon receiving a RequestProof message, we check whether the requested block is valid according to us. If not, we reply with NoProof. If yes, we proceed to search for a proof.
We conduct the search for a proof, which is essentially a Breadth-First Search in the graph of blocks that are connected by votes:
- Start with a single “path” in the graph, containing just the requested block.
- For each path: let b be the last block in that path. Find all blocks x such that there are accumulated votes from x to b. For every such x that we haven’t yet visited, generate a new path by appending x to the initial path.
- If there is a path among the generated ones, such that its last block is either among the current blocks from the RequestProof message, or is for a compatible prefix and a lower version than a block among the current blocks, stop.
- If there is no such path, and we haven’t extended any of the paths in the previous step, stop.
Reply with all the votes along the path satisfying the condition above, or with the immediate predecessors of the requested block if no such path exists.

Changes in the current blocks

With each new signature, the set of valid and current blocks may change. The rule for the routing table is that whenever there exist current blocks b0 and b1 which are compatible or neighbours, all members of b0 and b1 need to try and directly connect to each other, and need to have proofs for each other’s current blocks.

Whenever we learn about new votes in a way such that we now have two current blocks b0 and b1 that are neighbouring or compatible, we inform both of them about each other (see Relaying votes), so they will know whom to connect to.

After each change, disconnect from all nodes that no current block implies we need to be connected to anymore.

Secure hop messages

Nodes that relay a message need to insert Votes into it in such a way that every member of the receiving group or section can verify the message, i.e. such that the votes create a chain of witnesses relations from the receiver’s block to a block of the sender a quorum of which has signed the message itself. In detail:

If a node sends a section message on a route (i.e. after route failures to receive an Ack), it sends its signature of the message to the accumulating node, i.e. the route-th member, sorted by distance from the destination, of every current block containing the node’s own name.

If the node is the accumulating node itself, it waits until it has collected signatures from a quorum of at least one valid block b0. Then it sends it on to the route-th member of every current neighbouring block that lies in the right direction (i.e. has a longer prefix in common with the destination address, and agrees with the destination address in all bits that don’t exist in our prefix). Since we immediately broadcast votes that make blocks valid, that node will already know the block b0. We append b0, too.

To relay a message, insert votes that (over one or more steps) validate the previous hop’s block from one of our own current blocks. Append that block.

Repairing broken sections

If too many members of a section go offline, the sibling will vote for a merge, and since the sibling’s votes alone are already enough, this automatically force-merges the section.

Optimisations

Nodes occasionally need to apply the block validation algorithm to blocks that are not neighbours, to validate messages. After validating a message, they can drop all blocks again that are not compatible or neighbouring to them.

Nodes can even drop all blocks that have not been current for some period, as it is unlikely that they will ever be used as a source (from) again. However, if a client or node rejoins after being offline for longer than that period, the network won’t be able to prove to that client or node anymore that it is the successor of the previous network.

If n events happen roughly at the same time it is not unlikely with the current rules that a section will go through 2ⁿ versions of itself before reaching agreement again (all combinations of those events). This could be avoided by introducing short delays (maybe just one or two seconds) between our own votes in the cases where we would currently send multiple votes instantaneously: If we always start with the lexicographically lowest vote, then it is likely that some branches accumulate before we consider voting on others, and we wouldn’t exhaust all the possible combinations.

Broadcasting votes

Broadcasting all votes that we received is wasteful since most recipients will already have them. We could be more optimistic here and e.g. just send the now valid blocks. If a node doesn’t agree that they are valid, it can request the missing votes:
So whenever a new block becomes current for a node n, it sends that block b in a message CurrentBlock(b) to all known nodes that are neighbouring or in that block. When a node m receives that, there are three different cases:

If b is current in m's view, too (the normal case), it doesn’t do anything.
If b is valid but not current for m, m sends the votes from b to its own current block(s) with compatible prefix back to n: n must still be missing some of those votes.
If b is not valid for m, m sends a CurrentBlock(c) with its own current blocks back to n. n will then respond with the missing votes.

Examples

Let’s assume we have a constructor:

fn new<I>(prefix: &str, version: u64, nodes: I) -> Block
    where I: IntoIterator<Item = &Name>

Add and remove

Block b0 with nodes 0, …, 4 agrees that it is valid and current. Now node 5 joins and at roughly the same time, node 4 leaves. Consider the following section states:

let b0 = Block::new("", 5, &[0, 1, 2, 3, 4   ]);
let br = Block::new("", 6, &[0, 1, 2, 3      ]);
let ba = Block::new("", 6, &[0, 1, 2, 3, 4, 5]);
let b1 = Block::new("", 7, &[0, 1, 2, 3,    5]);

Nodes 0, 1 first vote for ba with from == b0. Everyone receives their votes.

Next, nodes 2, 3 see 4 leaving and vote for br with from == b0. Everyone receives their votes. So far, nothing has accumulated.

Now nodes 0, 1 see 4 leaving, too, and vote br with from == b0. They now have proof that br succeeds b0. The former is now current, the latter isn’t anymore.

Before receiving those votes, however, nodes 2, 3 see node 5 passing the resource proof, too, and vote for ba with from == b0.

Everyone exchanges their votes and has now two candidate blocks ba and br. Since ba has more members, it is current and br isn’t (although it remains valid).

Since they all are still disconnected from 4, they vote for b1 with from == ba.

As soon as that accumulates, b1 becomes the only current block.

Merge

The block b00 with prefix 00 wants to merge with block b01 with prefix 01, because block b01 has too few members. The members of b00 sign the block b0 with prefix 0, with the union of the members of b00 and b01, and from == b00.

Having a valid block b0 now, all members of b01 need to learn about and connect to b10. Since b0 is valid and has a higher version number than b01, b01 is now not current anymore, regardless of whether b01 also voted for it.

AndreasF · May 31, 2017, 9:18am

We updated the MergeConv rule formulation to make it clearer that it doesn’t require any single block to exist for 60 seconds.
We also fixed an omission from the Split rule.

AndreasF · June 1, 2017, 3:01pm

There was another situation in which the MergeConv rule was insufficient: If e.g. we had blocks with prefix 0 and version 5, prefix 00 and version 5 and prefix 01 and version 4, then only the first two would be current and there would be no current block to merge 00 with.
Also, we found a scenario where a node could get a newly joined peer removed from the network again by voting for an old block.

Both these issues should be resolved by the new definition of “current”, which we just updated: It makes the MergeConv rule obsolete, gives precedence to larger sections and in general reduces the number of current blocks per section and therefore the number of votes and messages.

AndreasF · June 6, 2017, 10:28am

We updated the “current” definition again: It reduces the number of current blocks even further, so that by definition there is only one per section. This prevents a combinatorial explosion of “conflicting” current blocks, and at the same time makes the RmConv rule obsolete.

AndreasF · June 13, 2017, 12:26pm

The ForceMerge rule now applies in more cases, because if we see section 0 lose too many nodes and we are 10, then we should merge — first with 11 and then with 0 —, even though 0 is not our sibling (yet). In other words, it should exactly mirror the Merge rule, but in addition to agreed blocks use our local view of the network.

The “Broadcasting votes” section is a first attempt to make more precise how to recover if nodes didn’t receive all votes. So far, the proposal mostly assumes that eventually, every vote is delivered to every node that needs to know about it.

dirvine · July 18, 2017, 9:02pm

I do not think this is right is it? The quorum of from will not be a true quorum. It will require to be the quorum of to. I thought we had already agreed this, but may be wrong. I think @viv spoke about this in a hangout a few weeks back.

AndreasF · July 19, 2017, 5:59pm

To me, from sounds like the more natural and maybe slightly safer choice:
If e.g. a section has 4 good and 4 bad nodes, then the bad nodes can’t validate a new block. In the same situation if the quorum were checked against to, they could remove a good node or add a fifth bad one and then have a majority in the to block.
Also, for witnessing votes it has to be from anyway, so it seems more natural to just always use from.
(But neither of these is a strong reason, of course; just my 0.02 MAID.)

dirvine · July 19, 2017, 8:13pm

Yes, I think that’s where the convo is going now, but use the lost node to reflect group size in from as G-1 in that case. It is more natural I think, but we just need to make sure the counting rules (for G and consequently Q) we use reflect actual possible values.

Nice to hear from you man, hope all is going well for you and you stay in touch, we will try and make those 0.2 MAID worth at least a beer at some stage

Stark · July 19, 2017, 9:06pm

Awww snap The big dog came thru Glad to see you’re still in it to win it. Not as active but still shepherding us into a SAFE future. Trust me friend we highly value your time with us safe folk.

Keep letting those mental bombs go off, we def need you here. Peace.

michaelsproul · July 21, 2017, 1:10am

We have updated the proposal above to include some details about how nodes are able to hold only the portion of the chain that is relevant to their section and its neighbours. See the Relevant blocks heading for details. In the process of figuring this out, we also decided on the message sending strategies that nodes must use to maintain consensus in the presence of merges and splits. These messaging rules are discussed under the Relaying votes heading.

In some cases where nodes become neighbours because of a section merge, we rely on a new Proof request algorithm which allows, for example, section 11 to request proof from section 01 (its neighbour) of the legitimacy of a vote made by section 00 (not its neighbour). We hope that the proof request algorithm can also act as a safety net in other cases where nodes become out-of-date, although we need to make some adjustments to the algorithm to make it work in more cases.

Finally, with regards to @dirvine’s point about using the to block as the basis for the quorum we are considering amending the rules so that only in the case of node removals do we do exactly this. This change effectively prohibits a node from voting on its own removal, and in the process makes it slightly more likely that a remove block will accumulate during periods of rapid node loss.

michaelsproul · August 3, 2017, 3:42am

The spec has been updated with 3 changes:

Detail on using the to block for removals.
An update to the vote broadcast algorithm. We discovered some corner cases that broke the previous algorithm. The crux of the issue was that in some cases involving merges, nodes in 10 wouldn’t broadcast any votes to 0 because they would accumulate 11's vote to merge, and believe that 11 would already have sent their votes to 0. If this also happened in reverse, with 11 receiving 10's vote to merge and not broadcasting it, in the end nobody would broadcast a merge vote to 0. We have soak-tested the new algorithm and haven’t found any failures, but we may make further adjustments if we find more corner cases (and we might try to write a formal proof of correctness).
An update to the proof request algorithm. In the case where the requested block is buried by one of our current blocks, previously the responding node would reply with a NoProof response. Now, it will reply with the immediate predecessors of the requested block, and the requesting node can use these to continue traversing back through the history.