Step 03 — Fault injection and catch-up
Goal
Add the failure-injection schedule, the `catch_up` operation, and the
top-level `run_cluster` workload driver, completing the lab.
Tasks
- Implement `Cluster::set_follower_up(fid, up)` (assert `fid` is 1 or 2, never 0).
- Implement `Cluster::catch_up(fid)`:
  - Snapshot the leader's `log` and `commit_index`.
  - While the follower's `log.len()` is less than the leader's, append `leader_log[fol.log.len()]` to the follower.
  - If the follower's `commit_index` is below the leader's, set it to the leader's and call `apply_committed`.
- Implement `step_op(rng, keys)`:
  - Draw `r1`, `r2`, `r3`, each from its own `rng.next()` call (always three draws).
  - `kind = (r1 >> 62) & 0x3`; `0, 1, 2 → Put`, `3 → Del`.
  - `k = i64(r2 % keys)`, `v = i64(r3 % 1000)`.
- Implement `run_cluster(seed, ops, keys, scenario)`:
  - `down_start = ops/2`, `down_end = (ops*3)/4`, `with_fault = (scenario == "fault")`.
  - For `i in 0..ops`:
    - If `with_fault && i == down_start`: set follower 1 down.
    - If `with_fault && i == down_end`: set follower 1 up, then `catch_up(1)`.
    - `submit(step_op(rng, keys))`.
  - After the loop: if `with_fault && !up[1]`, set follower 1 up and `catch_up(1)`. (Handles `ops % 4 != 0`.)
- Write a `dbctl hash workload --seed N --ops N --keys N --scenario <normal|fault>` CLI that prints the SHA-256 hex of `run_cluster(...).encode_snapshot()` with no trailing newline.
- Freeze the two scenario hashes as named constants and assert them in two tests per language. Cross-check with `scripts/cross_test.sh`.
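
The `set_follower_up` and `catch_up` tasks above can be sketched in Rust roughly as follows. This is a minimal sketch under stated assumptions, not the lab's reference implementation: the `Op`, `Node`, and `Cluster` layouts, the `applied` counter, the `kv` map, and the reading of `commit_index` as "number of committed entries" are all hypothetical and chosen only to make the example self-contained.

```rust
use std::collections::BTreeMap;

/// One replicated operation (hypothetical shape; the lab's real type may differ).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    Put(i64, i64),
    Del(i64),
}

/// A single node. Assumption: `commit_index` counts committed log entries
/// and `applied` counts entries already folded into `kv`.
#[derive(Default)]
pub struct Node {
    pub log: Vec<Op>,
    pub commit_index: usize,
    pub applied: usize,
    pub kv: BTreeMap<i64, i64>,
}

impl Node {
    /// Apply every committed-but-unapplied entry to the key-value state.
    pub fn apply_committed(&mut self) {
        while self.applied < self.commit_index {
            let op = self.log[self.applied];
            match op {
                Op::Put(k, v) => {
                    self.kv.insert(k, v);
                }
                Op::Del(k) => {
                    self.kv.remove(&k);
                }
            }
            self.applied += 1;
        }
    }
}

/// Three nodes; index 0 is the leader, 1 and 2 are followers.
pub struct Cluster {
    pub nodes: [Node; 3],
    pub up: [bool; 3],
}

impl Cluster {
    pub fn new() -> Self {
        Cluster {
            nodes: [Node::default(), Node::default(), Node::default()],
            up: [true; 3],
        }
    }

    /// Mark a follower up or down. The leader (0) can never be toggled.
    pub fn set_follower_up(&mut self, fid: usize, up: bool) {
        assert!(fid == 1 || fid == 2, "fid must be 1 or 2, never 0");
        self.up[fid] = up;
    }

    /// Bring follower `fid` level with the leader.
    pub fn catch_up(&mut self, fid: usize) {
        // Snapshot first, so the leader is no longer borrowed
        // by the time the follower is mutated.
        let leader_log = self.nodes[0].log.clone();
        let leader_commit = self.nodes[0].commit_index;

        let fol = &mut self.nodes[fid];
        while fol.log.len() < leader_log.len() {
            let next = leader_log[fol.log.len()];
            fol.log.push(next);
        }
        if fol.commit_index < leader_commit {
            fol.commit_index = leader_commit;
            fol.apply_committed();
        }
    }
}
```

Cloning the leader's log costs a copy per catch-up, but it keeps the borrow story simple: only one element of `nodes` is borrowed mutably at any point. After `catch_up(1)` returns, follower 1's `log`, `commit_index`, and applied state should equal the leader's.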
Acceptance
- `verify.sh` ends with `=== OK ===`.
- `cross_test.sh` ends with `=== ALL OK ===`.
- The two frozen hashes match across Rust, Go, and C++:
  - `5976b45b9f40f440e8249da27fe4fe752e005f606efc3596bdb25ca4e4f99296` (normal, seed=42 ops=200 keys=16)
  - `d67c36725af65242e985a308db5152af2a3e2525fab33d11ed6e826a252ff792` (fault, seed=7 ops=2000 keys=128)
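
For the frozen fault scenario (ops=2000), it can help to see exactly when the schedule fires. The sketch below replays only the fault-window bookkeeping of `run_cluster` and records the events instead of touching a cluster. `FaultEvent` and `fault_schedule` are hypothetical names introduced for illustration; they are not part of the lab's API.

```rust
/// What the driver does at the top of an iteration, before submitting the op.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum FaultEvent {
    /// Set follower 1 down.
    Down,
    /// Set follower 1 up, then run catch_up(1).
    UpAndCatchUp,
}

/// Returns the (iteration, event) pairs fired during a run of `ops` operations.
/// An event recorded at iteration `ops` stands for the post-loop guard.
pub fn fault_schedule(ops: u64, with_fault: bool) -> Vec<(u64, FaultEvent)> {
    let down_start = ops / 2;
    let down_end = (ops * 3) / 4;
    let mut up = true;
    let mut events = Vec::new();

    for i in 0..ops {
        if with_fault && i == down_start {
            up = false;
            events.push((i, FaultEvent::Down));
        }
        if with_fault && i == down_end {
            up = true;
            events.push((i, FaultEvent::UpAndCatchUp));
        }
        // The real driver would call submit(step_op(rng, keys)) here.
    }

    // Post-loop guard from the task list: recover if follower 1 is still down.
    if with_fault && !up {
        events.push((ops, FaultEvent::UpAndCatchUp));
    }
    events
}
```

With ops=2000 this yields a `Down` event at iteration 1000 and an `UpAndCatchUp` event at iteration 1500, so operations 1000 through 1499 (500 in total) are submitted while follower 1 is down. The normal scenario produces no events.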
Pitfalls
- Drawing fewer RNG words on the `Del` branch will silently desync hashes; always draw three.
- The post-loop catch-up matters: if the run ends inside the down window, follower 1 still needs to converge.
- `catch_up` must clone the leader's log first; mutating both at once in Rust requires careful borrow handling.
- The "ack on `up[fid]` only" rule is essential: a down follower contributes zero acks regardless of its log length.
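
One way to make the three-draw rule hard to break is to pull all three words before branching on the operation kind, as in the sketch below. The generator here is SplitMix64, used purely as a stand-in: the lab's actual RNG is not specified in this step, so this code is not expected to reproduce the frozen hashes. The `Rng` struct, its `draws` counter, and the `Op` shape are likewise assumptions made for illustration.

```rust
/// Stand-in generator (SplitMix64). Assumption: the lab's real RNG may differ.
pub struct Rng {
    state: u64,
    /// Number of calls to `next`, useful for checking the three-draw rule.
    pub draws: u64,
}

impl Rng {
    pub fn new(seed: u64) -> Self {
        Rng { state: seed, draws: 0 }
    }

    pub fn next(&mut self) -> u64 {
        self.draws += 1;
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// One workload operation (hypothetical shape).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    Put(i64, i64),
    Del(i64),
}

/// Generate the next operation. `keys` must be nonzero.
pub fn step_op(rng: &mut Rng, keys: u64) -> Op {
    // Draw all three words up front, even though Del never uses r3.
    let r1 = rng.next();
    let r2 = rng.next();
    let r3 = rng.next();

    let kind = (r1 >> 62) & 0x3; // top two bits: 0, 1, 2 mean Put; 3 means Del
    let k = (r2 % keys) as i64;
    let v = (r3 % 1000) as i64;

    if kind == 3 {
        Op::Del(k)
    } else {
        Op::Put(k, v)
    }
}
```

Because `r3` is consumed on every call, the RNG stream advances by exactly three words per operation regardless of which branch is taken, which is what keeps implementations in different languages in lockstep.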