db-18 — Analysis

Required invariants

If any of these is violated, scripts/cross_test.sh will fail, and in the worst case the algorithm itself is unsafe. They are stated in the order it is easiest to reason about them.

  1. Promise monotonicity (P1). For every acceptor and every sim-time tick t, promised_ballot[t] >= promised_ballot[t-1]. The simulator enforces this with a single comparison on each of Prepare, Accept: the message's ballot must be >= promised_ballot before any state mutation. The Promise reply's accept_ok bit is the operational form of P1b.

  2. Accept respects promise (P2a). No acceptor ever stores accepts[slot] = (ab, v) with ab < promised_ballot. The Accept handler short-circuits with accept_ok=false when b < promised_ballot; the leader interprets that bit and steps down instead of advancing accept_count.

  3. Per-slot accept uniqueness under a ballot (P2b). For a fixed slot s and a fixed ballot b, the value v that any acceptor stores under (s, b) is the same value. This holds trivially here because only the leader of ballot b ever sends Accept(b, s, v), and its accepts[s] is set once and never overwritten under its own ballot.

  4. P2c (the safety lemma that needs work). Suppose value v is chosen at slot s under ballot b. Then for any ballot b' > b issued by any proposer, the value field of any Accept(b', s, v') will satisfy v' == v. The mechanism: to issue an Accept at all, the proposer must have collected promises from a quorum at ballot b'. That quorum intersects with the quorum that chose v at b. The intersecting acceptor saw v accepted under b, so its Promise carries (s, b, v). The proposer's recovery rule (take the value whose accepted_ballot is highest) therefore takes v (or a later value chosen under some b'' > b, but inductively that value is also v). So v' == v. QED. The simulator implements this rule in start_election's init of prepare_accepted and in the Promise-handler's if ab > prepare_accepted[s].ballot update.

  5. Decided-once / monotonic learn. Once learned[s] is set on any node, it never changes value. Locally enforced by reading before writing; globally guaranteed by P2c.

  6. Byte-determinism of the dump. Two runs with the same (seed, nodes, rounds, proposals, partition) produce identical canonical dump bytes on every language. This requires every iteration order (peers, slots, accepted-list inside Promise, heap pops on identical (time, sender, seq)) to be fixed. Drift here is what cross_test.sh catches.

Design decisions worth highlighting

  • promised_ballot is global per node, not per-slot. This is the Multi-Paxos optimization. A per-slot promised-ballot map would be more general (closer to single-decree Paxos per slot) but would cost a Phase-1 per slot. The global ballot lets one Phase 1 cover every present and future slot.

  • Phase-1 recovery walks every prior accept, not just the latest per slot. The Promise reply contains all of the acceptor's accepts (sorted by slot). The candidate folds them into prepare_accepted with take if strictly greater accepted_ballot. Per the proof of P2c this is the only correct rule; a "latest by receive order" tie-break would lose safety the moment Promises arrived out of order.

  • my_ballot.round is bumped to max(promised, my_ballot).round + 1 when starting an election, not just promised.round + 1. If this node previously won a higher ballot and stepped down due to a partition heal, it would otherwise re-issue its old ballot and immediately lose to its own historical promise. The max makes forward progress under churn deterministic.

  • Leader-pick rule: lowest-id Leader. When the simulator must inject a client proposal, it picks the lowest-id node currently in role Leader. There may be zero (queue the proposal in cluster_pending) or, briefly, two stale Leaders (the lower id wins; the other's Accept will fail at acceptors that have already promised the new ballot). Determinism > realism here.

  • drain_pending runs on every Accepted, not just every tick. In single-node mode (--nodes 1) the leader becomes its own quorum and decides slots inside the broadcast loop. Doing the drain in become_leader and in try_decide means scenario D's hash is independent of how the simulator orders ticks.

  • Heap key (delivery_time, sender, seq). db-16's invariant. Without the seq tiebreak, Promise messages from two acceptors arriving on the same tick from the same sender (impossible by construction, but the type system doesn't know that) would be reorder-able across languages.

  • Role enum order. Follower=0, Candidate=1, Leader=2 was chosen to match db-17; any change would propagate into the dump byte at offset 12 + 16 = 28 per node, which would silently invalidate scenario A's canonical hash.

Tradeoffs worth flagging

  • Concurrent proposers cost throughput, not safety. Two proposers in dueling Phase 1 can ping-pong each other forever in principle. The lab dodges this in two ways: (a) the deterministic simulator can't sustain a livelock because election timeouts are PRNG-jittered per node-id, and (b) once a leader is elected, the election-timer reset on Heartbeat keeps it elected. Production systems add leader leases (Chubby, Spanner) to push the worst case down further.

  • No commit-only-current-term subtlety. Raft has Figure 8: a newly-elected leader must commit something in its own term before it can ack older entries, otherwise they can be silently overwritten. Paxos sidesteps the problem because P2c forces a new leader to re-Accept any recoverable value under its own ballot; there is no "shadow commit" to retract. The price is the Phase-1-on-every-startup cost.

  • No native log compaction. This lab's accepts and learned grow unboundedly. A real Multi-Paxos system snapshots a state machine and discards accepts below the snapshot index (see Spanner, Chubby, db-23). Adding snapshots here would require exposing a committed_through watermark in the dump.

  • No membership change. n is fixed at Cluster construction time. Vertical Paxos (Lamport/Malkhi/Zhou 2009) is the textbook way to add this. db-23 covers it.

  • Three languages is more work than two. Two languages prove the spec is unambiguous. Three rules out the case where you and your collaborator have committed the same misreading. C++'s std::map and Rust's BTreeMap agreeing with Go's explicit sort.Slice was the only thing that caught a misordered Promise payload in scenario B during development.

Why three languages

Same answer as db-17: the constraint forces the spec to be a spec and not a habit. Sorted-iteration discipline, fixed enum order, little-endian fixed-width integers, no map iteration on the wire — these are easy to get away with in any single language, and the only way to surface them is to ask "would another implementation make the same choice without being told?". For Paxos the question matters even more: the algorithm is sensitive to whether the highest-ballot prior accept is chosen during recovery, and a sort-order bug would make safety stochastic, which is the worst possible failure mode.