db-17 step 02 — Log replication
Goal
A leader accepts client proposals, replicates them to followers via
AppendEntries, and advances commit_index once a majority's
match_index covers the entry and the entry is from the leader's
current term. Followers truncate any conflicting suffix and append the
leader's entries. The result must be byte-deterministic across all
three languages.
Tasks
-
LogEntry.{ term: u64, command: Vec<u8> }. Logs are 0-indexed in this implementation; the algorithm description uses 1-indexed in Ongaro's Figure 2 — adjust mentally when reading the paper. -
propose(cmd). Leader-only:- push
LogEntry { term: current_term, command: cmd }onto own log, - set
match_index[self] = log.len(), - broadcast
AppendEntriesto all peers, - call
advance_commit()(so n=1 commits immediately).
- push
-
broadcast_append_entries(). For each peer in ascending id order, sendAppendEntries { term, leader, prev_idx, prev_term, entries: log[next_index[p]..], leader_commit }.prev_idx = next_index[p],prev_term = log[prev_idx-1].term(or 0 ifprev_idx == 0). -
AppendEntrieshandling on follower.- if
term > current_term:become_follower(term); - if
term < current_term: replysuccess=false; - reset election timer (we heard from a leader);
- if
prev_idx > 0 && (log too short || log[prev_idx-1].term != prev_term): replysuccess=false, match_index=0; - else: walk each incoming entry; truncate own log at the first
(index, term)conflict; append remaining entries; advancecommit_index = min(leader_commit, log.len()); replysuccess=true, match_index=prev_idx+entries.len().
- if
-
AppendEntriesReplyhandling on leader.- if
term > current_term:become_follower(term); - if
success:next_index[from] = reply.match_index + 1; match_index[from] = reply.match_index; advance_commit(); - if
!successandterm == current_term: decrementnext_index[from](clamped at 0); next heartbeat / propose will retry with an earlierprev_idx.
- if
-
advance_commit(). ForNfromlog.len()down tocommit_index + 1:- if
log[N-1].term != current_term: continue (Figure 8 safety); - if
1 + count(p : match_index[p] >= N)>nodes / 2: setcommit_index = Nand break.
- if
Acceptance
Inline unit tests in each language:
propose_single_node_commits—--nodes 1, propose 3 entries, every entry's term is the leader term,commit_index == 3.append_entries_rejects_term_mismatch— leader with empty log sends AE withprev_idx=5, prev_term=1; follower returnssuccess=false.append_entries_truncates_conflict— follower withlog=[(t=1), (t=1), (t=2)]receives AE withprev_idx=2, prev_term=1, entries=[ (t=3)]; resulting log is[(t=1), (t=1), (t=3)].commit_requires_current_term— leader atterm=5replicates an oldterm=3entry to all followers;commit_indexdoes NOT advance past it until the leader appends aterm=5entry that also reaches majority.quorum_commit_three_nodes— 3-node cluster, leader proposes one entry, one follower acks;commit_indexadvances (2 of 3 is a majority including the leader).
All five green in Rust, Go, and C++.
Discussion prompts
- The Figure 8 commit restriction ("commit only entries of the current term") is famously subtle. Construct a 3-node scenario where omitting it lets a leader commit an entry that a future leader's election can erase.
- Why does the leader update
match_index[self]afterpropose? (Otherwise the majority check would always exclude the leader.) - What happens if two leaders coexist briefly (network partition that
has not yet healed)? Specifically: which leader can advance
commit_index, and why is this safe?