Step 03 — Partitions and Catchup
Goal: implement heal() and truncate_and_replay so a partitioned
follower can rejoin and converge. Then ship the cross-language exam.
What to read first
- CONCEPTS.md § "Partition + heal".
- docs/execution.md, Stages 4–7.
- src/rust/src/lib.rs: the truncate_and_replay and Cluster::heal bodies.
Concrete tasks
- Implement Replica::truncate_and_replay(leader_log, leader_commit): replace own log, wipe state machine, replay committed prefix.
- Implement Cluster::heal():
  - clone leader log + commit index up front (avoid use-after-mutate),
  - clear partitions,
  - for each previously-partitioned follower in ascending id order, call truncate_and_replay.
- In propose, when try_append returns false (gap), do a snapshot push immediately and count the ack.
- Implement dump_state per the wire format in CONCEPTS.md. Pin every byte offset in a test (test_snapshot_byte_format).
- Port the workload driver (run_workload) to all three languages. The byte-decoding rules (kind = (r1 >> 62) & 0x3, k = "k" + …, v = u64_le(r3 % 10000)) must be identical across all three.
- Build the Rust binary, run scenarios A and B, capture the hashes, and bake them into the Go test, the C++ test, and scripts/cross_test.sh.
- Bring Go and C++ green: run scripts/cross_test.sh. It must end with === ALL OK ===.
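The first two tasks can be sketched in Rust as follows. This is a minimal illustration, not the code in src/rust/src/lib.rs: the Entry type, the field names, the "commit = number of committed entries" convention, and the use of a BTreeSet for the partition set are all assumptions made for the example.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical log entry: a single key/value write.
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub key: String,
    pub value: u64,
}

// Hypothetical replica layout; the real struct may differ.
#[derive(Default)]
pub struct Replica {
    pub log: Vec<Entry>,
    pub commit: usize,                // assumed: count of committed entries
    pub state: BTreeMap<String, u64>, // the state machine
}

impl Replica {
    /// Replace own log, wipe the state machine, replay the committed prefix.
    pub fn truncate_and_replay(&mut self, leader_log: &[Entry], leader_commit: usize) {
        self.log = leader_log.to_vec();
        self.state.clear();
        // Clamp defensively so a bad commit index cannot slice past the log.
        self.commit = leader_commit.min(self.log.len());
        for e in &self.log[..self.commit] {
            self.state.insert(e.key.clone(), e.value);
        }
    }
}

pub struct Cluster {
    pub replicas: Vec<Replica>,      // assumed: index == replica id
    pub leader: usize,
    pub partitions: BTreeSet<usize>, // ids currently cut off from the leader
}

impl Cluster {
    pub fn heal(&mut self) {
        // 1. Snapshot the leader's log and commit index before touching any follower.
        let leader_log = self.replicas[self.leader].log.clone();
        let leader_commit = self.replicas[self.leader].commit;
        // 2. Clear the partition set while keeping the old membership.
        let healed = std::mem::take(&mut self.partitions);
        // 3. Catch up each previously partitioned follower.
        //    BTreeSet iterates in ascending order, which gives ascending ids.
        for id in healed {
            if id != self.leader {
                self.replicas[id].truncate_and_replay(&leader_log, leader_commit);
            }
        }
    }
}
```

Taking the snapshot first means every follower is caught up against the same log and commit index, regardless of what the loop does to the cluster. In Rust the borrow checker largely forces this shape; the Go and C++ ports have to apply the same discipline by hand.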
Definition of done
bash scripts/verify.sh # → "=== OK ==="
bash scripts/cross_test.sh # → "=== ALL OK ==="
Both scenarios produce:
- A: 1febc1252f87f873c315526e9d9c78a622131d700dccca84a6e089244930252b
- B: 272af5b41b729896a7195a6ea72d19111a96a50b29d5d4cdfaac03a058e1a2dc
Common bugs at this stage
- heal() reads leader.log after mutating a follower. Fix: use a snapshot variable.
- dump_state in Go iterates the map directly → randomised hash. Fix: sort the keys.
- dump_state in C++ uses strcpy(magic_buf, "DSEDKV20") and copies 9 bytes including the NUL. Fix: std::memcpy(buf, MAGIC.data(), 8).
- C++ test passes in Debug, fails in Release because assert(c.propose(...)) got stripped: with NDEBUG defined, the whole assert expression, including the propose call, is compiled away. Fix: assign to a bool ok = ... first, then assert(ok).
- CLI prints a trailing newline. The exam compares full lines; a trailing \n breaks the hash comparison.
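The sorted-keys fix is not specific to Go: Rust's HashMap also has an unspecified iteration order. The sketch below shows the idea, together with writing the magic as exactly 8 bytes. The record layout used here (u32_le key length, key bytes, u64_le value) is invented for the example and is not the wire format from CONCEPTS.md; dump_sorted is likewise a hypothetical name.

```rust
use std::collections::HashMap;

// Exactly 8 bytes; a byte-string literal carries no trailing NUL.
const MAGIC: &[u8; 8] = b"DSEDKV20";

// Illustrative layout only (NOT the CONCEPTS.md wire format):
// magic, then for each key in sorted order:
// u32_le key length, key bytes, u64_le value.
pub fn dump_sorted(state: &HashMap<String, u64>) -> Vec<u8> {
    // HashMap iteration order is unspecified, so collect and sort the keys first.
    let mut keys: Vec<&String> = state.keys().collect();
    keys.sort();

    let mut out = Vec::new();
    out.extend_from_slice(MAGIC);
    for k in keys {
        out.extend_from_slice(&(k.len() as u32).to_le_bytes());
        out.extend_from_slice(k.as_bytes());
        out.extend_from_slice(&state[k].to_le_bytes());
    }
    out
}
```

Two maps holding the same pairs then serialise to identical bytes no matter the insertion order, which is the property the cross-language hash depends on. Storing the state in a BTreeMap would give the same guarantee without the explicit sort.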