db-17 step 02 — Log replication

Goal

A leader accepts client proposals, replicates them to followers via AppendEntries, and advances commit_index once a majority's match_index covers the entry and the entry is from the leader's current term. Followers truncate any conflicting suffix and append the leader's entries. The result must be byte-deterministic across all three languages.

Tasks

  1. LogEntry. { term: u64, command: Vec<u8> }. Logs are 0-indexed in this implementation; the algorithm description uses 1-indexed in Ongaro's Figure 2 — adjust mentally when reading the paper.

  2. propose(cmd). Leader-only:

    • push LogEntry { term: current_term, command: cmd } onto own log,
    • set match_index[self] = log.len(),
    • broadcast AppendEntries to all peers,
    • call advance_commit() (so n=1 commits immediately).
  3. broadcast_append_entries(). For each peer in ascending id order, send AppendEntries { term, leader, prev_idx, prev_term, entries: log[next_index[p]..], leader_commit }. prev_idx = next_index[p], prev_term = log[prev_idx-1].term (or 0 if prev_idx == 0).

  4. AppendEntries handling on follower.

    • if term > current_term: become_follower(term);
    • if term < current_term: reply success=false;
    • reset election timer (we heard from a leader);
    • if prev_idx > 0 && (log too short || log[prev_idx-1].term != prev_term): reply success=false, match_index=0;
    • else: walk each incoming entry; truncate own log at the first (index, term) conflict; append remaining entries; advance commit_index = min(leader_commit, log.len()); reply success=true, match_index=prev_idx+entries.len().
  5. AppendEntriesReply handling on leader.

    • if term > current_term: become_follower(term);
    • if success: next_index[from] = reply.match_index + 1; match_index[from] = reply.match_index; advance_commit();
    • if !success and term == current_term: decrement next_index[from] (clamped at 0); next heartbeat / propose will retry with an earlier prev_idx.
  6. advance_commit(). For N from log.len() down to commit_index + 1:

    • if log[N-1].term != current_term: continue (Figure 8 safety);
    • if 1 + count(p : match_index[p] >= N) > nodes / 2: set commit_index = N and break.

Acceptance

Inline unit tests in each language:

  • propose_single_node_commits--nodes 1, propose 3 entries, every entry's term is the leader term, commit_index == 3.
  • append_entries_rejects_term_mismatch — leader with empty log sends AE with prev_idx=5, prev_term=1; follower returns success=false.
  • append_entries_truncates_conflict — follower with log=[(t=1), (t=1), (t=2)] receives AE with prev_idx=2, prev_term=1, entries=[ (t=3)]; resulting log is [(t=1), (t=1), (t=3)].
  • commit_requires_current_term — leader at term=5 replicates an old term=3 entry to all followers; commit_index does NOT advance past it until the leader appends a term=5 entry that also reaches majority.
  • quorum_commit_three_nodes — 3-node cluster, leader proposes one entry, one follower acks; commit_index advances (2 of 3 is a majority including the leader).

All five green in Rust, Go, and C++.

Discussion prompts

  • The Figure 8 commit restriction ("commit only entries of the current term") is famously subtle. Construct a 3-node scenario where omitting it lets a leader commit an entry that a future leader's election can erase.
  • Why does the leader update match_index[self] after propose? (Otherwise the majority check would always exclude the leader.)
  • What happens if two leaders coexist briefly (network partition that has not yet healed)? Specifically: which leader can advance commit_index, and why is this safe?