How to Build a Highly Available System Using Consensus - Butler W. Lampson

happybeing · October 11, 2020, 12:26pm

Abstract. Lamport showed that a replicated deterministic state machine is a general way to implement a highly available system, given a consensus algorithm that the replicas can use to agree on each input. His Paxos algorithm is the most fault-tolerant way to get consensus without real-time guarantees. Because general consensus is expensive, practical systems reserve it for emergencies and use leases (locks that time out) for most of the computing. This paper explains the general scheme for efficient highly available computing, gives a general method for understanding concurrent and fault-tolerant programs, and derives the Paxos algorithm as an example of the method. (link)
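
To make "leases (locks that time out)" concrete, here is a minimal single-process sketch. It is not taken from the paper: the `Lease` class, its method names, and the injected `clock` callable are all hypothetical, and it ignores clock skew between nodes, which a real distributed lease has to account for.

```python
class Lease:
    """A lock that expires by itself after `duration` seconds (hypothetical helper)."""

    def __init__(self, duration, clock):
        self.duration = duration
        self.clock = clock        # callable returning the current time in seconds
        self.holder = None
        self.expires_at = 0.0

    def acquire(self, node):
        """Grant or renew the lease for `node`; return True on success."""
        now = self.clock()
        free = self.holder is None or now >= self.expires_at
        if free or self.holder == node:
            self.holder = node
            self.expires_at = now + self.duration
            return True
        return False

    def held_by(self, node):
        """True only while `node` holds a lease that has not yet timed out."""
        return self.holder == node and self.clock() < self.expires_at


# Usage with a fake clock so the timeout is easy to see.
now = [0.0]
lease = Lease(duration=10.0, clock=lambda: now[0])
granted_a = lease.acquire("a")        # "a" gets the lease
granted_b = lease.acquire("b")        # refused while "a" still holds it
now[0] = 10.0                         # the lease times out
still_a = lease.held_by("a")
granted_b_later = lease.acquire("b")  # now "b" can take over
```

The point of the timeout is that a crashed holder cannot block everyone else forever: once the lease expires, another node can acquire it without hearing from the old holder.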

dirvine · October 11, 2020, 1:35pm

Yes, Paxos, Raft, etc. are good for consensus in trusted or leader-based networks. The problems start with bad actors, i.e. Byzantine failures.

We recently had an in-house presentation that covered this. There we defined CFT (crash fault tolerant) and how it differs from BFT (Byzantine fault tolerant). The former assumes all the software running is valid, and the latter cannot make that assumption. This is not well explained in such papers, and the assumption that all nodes are honest is not made clear either. Well, not clear enough IMO.

So we need consensus papers to state which tolerance they provide, CFT or BFT. It does make a big difference.
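
As a rough illustration of how big the difference is, here is a small sketch using the classical replica-count bounds: 2f + 1 replicas to tolerate f crash faults (Paxos- or Raft-style protocols) and 3f + 1 replicas to tolerate f Byzantine faults (PBFT-style protocols). The function names are made up for this example, and other timing or cryptographic assumptions can change these bounds.

```python
def min_replicas(f, byzantine):
    """Smallest cluster size that tolerates f faulty nodes under the classical bounds."""
    return 3 * f + 1 if byzantine else 2 * f + 1


def max_faults(n, byzantine):
    """Largest number of faulty nodes a cluster of n replicas can tolerate."""
    return (n - 1) // 3 if byzantine else (n - 1) // 2


cft_size = min_replicas(1, byzantine=False)   # 3 replicas for one crash fault
bft_size = min_replicas(1, byzantine=True)    # 4 replicas for one Byzantine fault
cft_faults = max_faults(5, byzantine=False)   # 5 replicas survive 2 crashes
bft_faults = max_faults(5, byzantine=True)    # 5 replicas survive 1 Byzantine node
```

So the same five-node cluster tolerates two crashed nodes but only one dishonest node, which is why a paper needs to say up front which model it assumes.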