db-19 step 01 — Epoch, zxid, and Fast Leader Election
Goal
Build the persistent state every ZAB node carries and the election protocol that picks the next leader when no one is currently broadcasting. Election must converge in bounded ticks for any quorum-available network, and the chosen leader must always be the node with the highest committed zxid in the surviving quorum.
Tasks
-
ZxId { epoch: u32, counter: u32 }with lexicographic ordering (epoch first, then counter). Provide aZxId::ZEROconstant. Every zxid comparison in the rest of the protocol uses this ordering — never compare theu64representations directly, because the lab keeps them as a struct for clarity. -
Persistent node state. A
ZabNodecarries:id: u32,n_nodes: u32,quorum = n_nodes/2 + 1.role: Role(Looking/Following/Leading).current_epoch: u32— the epoch of the leader we last followed. Bumped onNewLeader.accepted_epoch: u32— the epoch we promised onNewEpoch. Always>= current_epoch.history: Vec<Txn>— committed and uncommitted txns in zxid order.last_committed: ZxId— high-water mark; entries<=this have been applied.
These are the four values that would survive a crash in a real implementation. Everything else (vote tables, ack tables) is transient and rebuilt on the next election.
-
Rpc::LookForLeader / Vote. ALookingnode broadcasts its current vote each tick. On receiving a peer's vote, update viaupdate_vote(peer.last_zxid, peer.id):- Adopt
peeras our vote target if(peer.last_zxid, peer.id) > (current_vote.last_zxid, current_vote.id). - Record the peer's choice in
vote_view[voter_id] = leader_id.
- Adopt
-
Quorum detection. Walk
vote_viewand count entries whose value equals each candidate id. The first candidate (in id order) whose count>= quorumbecomes the elected leader. If that leader is us, transition toLeading; otherwise transition toFollowingwithleader_id = Some(...). -
Election timeout. Track
election_deadlineper node. Ifnow > election_deadlineand we're stillLooking, reset the vote table and broadcast a freshLookForLeaderfrom our current(last_zxid, id). Reseed the deadline withELECTION_TIMEOUT_MIN + splitmix64(seed) % ELECTION_TIMEOUT_SPAN.
Acceptance
Inline unit tests in each language. Names below are the Rust form;
Go uses TestZxIdOrdering style, C++ uses test_zxid_ordering:
zxid_ordering_is_lexicographic—ZxId{0,9} < ZxId{1,0},ZxId{1,0} < ZxId{1,1},ZERO < ZxId{0,1}. Locks the comparator.vote_adopts_higher_last_zxid— node 0 withlast_zxid=(1,5)votes for itself; receives a vote from node 1 with(2,0); adopts node 1. Then receives from node 2 with(2,0)— does not re-adopt (tie on zxid, lower id loses).quorum_of_votes_elects_highest— in a 3-node cluster all voting for node 2, node 2 transitions toLeadingafter the second matchingVotearrives.election_does_not_decide_in_minority— partition isolates node 0 from {1,2}; node 0 must never leaveLookingregardless of how many ticks elapse.
All four green in Rust, Go, and C++.