db-18 — Observation
Expected canonical hashes
Six configurations are pinned in scripts/cross_test.sh. The lab is
green iff all three binaries (Rust release, Go release, C++ Release)
emit exactly these strings on stdout (no trailing newline):
| Name | Flags | SHA-256 of canonical dump |
|---|---|---|
| A | --seed 42 --nodes 3 --rounds 1000 --proposals 5 | 0a35fdad1dd97c76a40a61b020c6181a56c4a40d4f723cb68fe70c2112aa9b63 |
| B | --seed 7 --nodes 5 --rounds 2000 --proposals 20 | 3cc6cae6cb7f9d2b7cb88088a0f22581ac4c41bd86bab1b3676dd0ba33fd7ead |
| C | --seed 99 --nodes 3 --rounds 500 --proposals 0 | f28d025af748a790beded6167115c7094a7f939b45d439728e4d6b7e144c3be0 |
| D | --seed 1 --nodes 1 --rounds 200 --proposals 5 | e5e0248c7c4fa20991b90afdac828eab91a7414497461dadc2e1553040693139 |
| E | --seed 42 --nodes 3 --rounds 1000 --proposals 3 --partition 0,1,0,2,1,0,2,0 | 674e62d809248ac99401054c195d29b0e2eed6ccc78ec45e96da8aaf69c36096 |
| F | --seed 3 --nodes 5 --rounds 1500 --proposals 10 --partition 0,1 | 7d80176abad54e533b2f4174e84f58432a000255fbb2ecbbb1dd915cb6bb6ab5 |
These strings are the contract. If you edit production code and any of them changes, you have changed semantics; reverify end-to-end before you ship.
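The pass criterion is a byte-exact comparison of each binary's stdout against the pinned string. Below is a minimal Rust sketch of that predicate. It assumes a hypothetical harness that has already captured stdout as bytes; `Scenario`, `SCENARIO_D`, and `is_green` are illustrative names, not part of scripts/cross_test.sh. It deliberately does not trim, so a stray trailing newline fails.

```rust
/// One pinned row of the table above (hypothetical harness type).
struct Scenario {
    name: &'static str,
    args: &'static [&'static str],
    sha256_hex: &'static str,
}

/// Scenario D, copied from the table; the other rows have the same shape.
const SCENARIO_D: Scenario = Scenario {
    name: "D",
    args: &["--seed", "1", "--nodes", "1", "--rounds", "200", "--proposals", "5"],
    sha256_hex: "e5e0248c7c4fa20991b90afdac828eab91a7414497461dadc2e1553040693139",
};

/// Green iff every binary's stdout equals the pinned hex string byte-for-byte.
/// No trimming: "<hash>\n" must fail because the contract forbids a trailing newline.
fn is_green(s: &Scenario, stdouts: &[&[u8]]) -> bool {
    stdouts.iter().all(|out| *out == s.sha256_hex.as_bytes())
}

fn main() {
    let good: &[u8] = SCENARIO_D.sha256_hex.as_bytes();
    let with_newline = format!("{}\n", SCENARIO_D.sha256_hex);
    println!("{} {:?}", SCENARIO_D.name, SCENARIO_D.args);
    println!("{}", is_green(&SCENARIO_D, &[good, good, good])); // true
    println!("{}", is_green(&SCENARIO_D, &[good, with_newline.as_bytes(), good])); // false
}
```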
Walking the wire: scenario D byte-by-byte
Scenario D is the simplest dump (one node, five proposals,
all decided locally). Use it as a Rosetta Stone before debugging the
multi-node hashes. The layout is magic || u32 node_count || node[],
with every integer little-endian (node_count = 1 is 01 00 00 00), so
the node payload starts at offset 12 (0x0c).
```
00..07  4453 4550 4158 3031  "DSEPAX01" magic
08..0b  01 00 00 00          node_count = 1
0c..0f  00 00 00 00          node.id = 0
10..13  rr rr 00 00          node.promised_ballot.round (round it won at)
14..17  00 00 00 00          node.promised_ballot.proposer_id (= self.id = 0)
18      02                   node.role = Leader (2)
19..1c  rr rr 00 00          node.my_ballot.round
1d..20  00 00 00 00          node.my_ballot.proposer_id
21..24  05 00 00 00          accept_count = 5
...     5 × {u64 slot, u32 ab.round, u32 ab.proposer_id, u32 value_len, value bytes}
...     then u32 learned_count = 5 and 5 × {u64 slot, u32 value_len, value bytes}
```
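To make the layout concrete, here is a minimal Rust encoder that reproduces the byte walk above. It is a sketch under assumptions: the `Ballot`, `Role`, and `Node` types, the `BTreeMap` containers for accepts and learned values (which give ascending-slot order for free), and `Vec<u8>` values are all illustrative; the real field types live in each implementation.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Copy)]
struct Ballot {
    round: u32,
    proposer_id: u32,
}

// Discriminants are part of the wire format: Follower=0, Candidate=1, Leader=2.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Role {
    Follower = 0,
    Candidate = 1,
    Leader = 2,
}

struct Node {
    id: u32,
    promised_ballot: Ballot,
    role: Role,
    my_ballot: Ballot,
    accepts: BTreeMap<u64, (Ballot, Vec<u8>)>, // slot -> (accepted ballot, value)
    learned: BTreeMap<u64, Vec<u8>>,           // slot -> decided value
}

fn put_u32(out: &mut Vec<u8>, v: u32) {
    out.extend_from_slice(&v.to_le_bytes());
}

fn put_u64(out: &mut Vec<u8>, v: u64) {
    out.extend_from_slice(&v.to_le_bytes());
}

fn put_ballot(out: &mut Vec<u8>, b: Ballot) {
    put_u32(out, b.round);
    put_u32(out, b.proposer_id);
}

/// magic || u32 node_count || node[], every integer little-endian.
/// Callers must pass `nodes` in ascending-id order.
fn canonical_dump(nodes: &[Node]) -> Vec<u8> {
    let mut out = b"DSEPAX01".to_vec();
    put_u32(&mut out, nodes.len() as u32);
    for n in nodes {
        put_u32(&mut out, n.id);
        put_ballot(&mut out, n.promised_ballot);
        out.push(n.role as u8);
        put_ballot(&mut out, n.my_ballot);
        put_u32(&mut out, n.accepts.len() as u32);
        for (slot, (ab, value)) in &n.accepts {
            // BTreeMap iterates in ascending slot order.
            put_u64(&mut out, *slot);
            put_ballot(&mut out, *ab);
            put_u32(&mut out, value.len() as u32);
            out.extend_from_slice(value);
        }
        put_u32(&mut out, n.learned.len() as u32);
        for (slot, value) in &n.learned {
            put_u64(&mut out, *slot);
            put_u32(&mut out, value.len() as u32);
            out.extend_from_slice(value);
        }
    }
    out
}

fn main() {
    // A one-node leader with an empty log; round 1 is an illustrative value.
    let b = Ballot { round: 1, proposer_id: 0 };
    let node = Node {
        id: 0,
        promised_ballot: b,
        role: Role::Leader,
        my_ballot: b,
        accepts: BTreeMap::new(),
        learned: BTreeMap::new(),
    };
    let dump = canonical_dump(&[node]);
    assert_eq!(&dump[0..8], b"DSEPAX01");
    assert_eq!(dump[0x18], 2); // role byte at offset 0x18
    println!("{} bytes", dump.len()); // 12 + 29 = 41
}
```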
Run:

```sh
src/rust/target/release/paxosctl --seed 1 --nodes 1 --rounds 200 --proposals 5
# e5e0248c7c4fa20991b90afdac828eab91a7414497461dadc2e1553040693139
```
To dump the raw bytes (skipping the sha256 step), hack the binary to
print canonical_dump instead of sha256_hex(&canonical_dump). Do this
locally only: the canonical CLI output is the sha256 hex string.
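The raw dump is not valid UTF-8, so the local patch should write it with `std::io::Write::write_all` rather than `print!`. A minimal sketch, assuming the dump bytes and the hex digest are already computed (Rust's standard library has no SHA-256, so `sha256_hex` is not reproduced here); `emit` and its `raw` switch are hypothetical and belong only in an uncommitted local patch.

```rust
use std::io::{self, Write};

/// Writes either the raw dump (local debugging only) or the canonical hex
/// string. Neither branch adds a trailing newline.
fn emit<W: Write>(out: &mut W, dump: &[u8], hash_hex: &str, raw: bool) -> io::Result<()> {
    if raw {
        out.write_all(dump)?;
    } else {
        out.write_all(hash_hex.as_bytes())?;
    }
    out.flush()
}

fn main() -> io::Result<()> {
    // First 12 bytes of scenario D: magic, then node_count = 1.
    let dump: &[u8] = b"DSEPAX01\x01\x00\x00\x00";
    let stdout = io::stdout();
    let mut lock = stdout.lock();
    emit(&mut lock, dump, "", true)
}
```

With the patched binary, redirecting stdout (for example `> rust.bin`) captures the bytes that the divergence runbook compares.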
Walking the wire: scenario C (no proposals)
Scenario C runs three nodes for 500 ticks (--rounds 500) with
--proposals 0. Exactly one of them is elected leader and nobody
decides anything, so the dump has accept_count == 0 and
learned_count == 0 for every node. If you have an iteration-order
bug, the bytes that change between languages are the per-node
promised_ballot.round values: the elected leader's round depends on
whether another proposer nearly won an election first. If C is the
failing scenario, you have an election-timer determinism bug, not a
Phase-2 bug.
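Because every node in scenario C has empty accept and learned sections, each node serializes to a fixed 29 bytes, the whole dump is 12 + 3 × 29 = 99 bytes, and every field sits at a predictable offset. A small Rust helper for that arithmetic, derived from the byte walk above (the constant and function names are illustrative):

```rust
/// Bytes before the first node: 8-byte magic + u32 node_count.
const HEADER_SIZE: usize = 8 + 4;

/// One node with accept_count == 0 and learned_count == 0:
/// id(4) + promised_ballot(8) + role(1) + my_ballot(8) + accept_count(4) + learned_count(4).
const EMPTY_NODE_SIZE: usize = 4 + 8 + 1 + 8 + 4 + 4;

fn empty_dump_len(nodes: usize) -> usize {
    HEADER_SIZE + nodes * EMPTY_NODE_SIZE
}

/// 0-based offset of node k's promised_ballot.round in an empty-log dump.
fn promised_round_offset(k: usize) -> usize {
    HEADER_SIZE + k * EMPTY_NODE_SIZE + 4
}

/// 0-based offset of node k's role byte in an empty-log dump.
fn role_offset(k: usize) -> usize {
    HEADER_SIZE + k * EMPTY_NODE_SIZE + 12
}

fn main() {
    println!("total: {} bytes", empty_dump_len(3)); // total: 99 bytes
    for k in 0..3 {
        // node 0: 16 / 24, node 1: 45 / 53, node 2: 74 / 82
        println!(
            "node {}: promised.round at {}, role at {}",
            k,
            promised_round_offset(k),
            role_offset(k)
        );
    }
}
```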
Divergence runbook
If cross_test.sh prints MISMATCH scenario X, follow this script:
```sh
# 1. Capture the raw bytes from each binary. Patch paxosctl locally
#    to print `canonical_dump` raw instead of sha256 hex, run once,
#    then revert the patch. Save to rust.bin, go.bin, cpp.bin.
# 2. List the differing bytes.
cmp -l rust.bin go.bin | head
cmp -l rust.bin cpp.bin | head
```
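cmp -l prints one line per differing byte: the byte number in decimal,
counting from 1, followed by the two byte values in octal. Subtract 1
from the first byte number to get a 0-based offset, then map it to the
field it belongs to: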
| Offset (0-based, decimal) | Field | Likely culprit |
|---|---|---|
| 0..7 | magic "DSEPAX01" | wrong magic literal |
| 8..11 | node_count | wrong u32_le writer, wrong endianness |
| 12 + k*node_size + 0..3 | node.id | iterating nodes in wrong order (not ascending id) |
| 12 + k*node_size + 4..11 | promised_ballot | election-timer drift or wrong PRNG seed mix |
| 12 + k*node_size + 12 | role (1 byte) | enum reordered (must be Follower=0, Candidate=1, Leader=2) |
| 12 + k*node_size + 13..20 | my_ballot | step-down logic differs (e.g., resetting my_ballot to zero or not) |
| 12 + k*node_size + 21..24 | accept_count | one acceptor accepted a slot the others did not (Phase-2 message-ordering bug) |
| accept tuple + 0..7 | slot | accepts iterated in receive order, not sorted by slot |
| accept tuple + 8..15 | accepted_ballot | Phase-1 recovery used the wrong rule (e.g., last-write-wins instead of highest-ballot) |
| accept tuple + 16..19, then 20.. | value_len / value | wrong proposal scheduled at this slot: the proposal-injection rule or leader-pick rule differs |
| inside the learned section | slot / value | the difference is downstream of an accept-section difference; fix that first |

The form 12 + k*node_size assumes every node before k has the same size. That holds in scenario C (29 bytes per node) but not once accept and learned sections differ in length; in general, node k starts at 12 plus the sizes of nodes 0 through k-1.
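For dumps with variable-length nodes, walking the format is less error-prone than offset arithmetic by hand. A hedged Rust sketch, assuming the layout from the scenario D byte walk: `first_diff` returns the 0-based offset of the first differing byte (what `cmp` reports, minus one), and `field_at` labels the field containing a given offset. Both are hypothetical helpers, not part of paxosctl.

```rust
/// 0-based offset of the first differing byte. A length mismatch counts as a
/// difference at the end of the shorter input (cmp reports that case as EOF).
fn first_diff(a: &[u8], b: &[u8]) -> Option<usize> {
    match a.iter().zip(b).position(|(x, y)| x != y) {
        Some(i) => Some(i),
        None if a.len() != b.len() => Some(a.len().min(b.len())),
        None => None,
    }
}

/// Reads an n-byte little-endian integer at *pos and advances *pos.
fn read_le(d: &[u8], pos: &mut usize, n: usize) -> Option<u64> {
    let bytes = d.get(*pos..*pos + n)?;
    *pos += n;
    Some(bytes.iter().rev().fold(0u64, |acc, &b| (acc << 8) | b as u64))
}

/// Labels the field containing 0-based offset `off` by walking the dump.
fn field_at(d: &[u8], off: usize) -> Option<String> {
    if off < 8 {
        return Some("magic".to_string());
    }
    if off < 12 {
        return Some("node_count".to_string());
    }
    let mut pos = 8;
    let count = read_le(d, &mut pos, 4)?;
    for k in 0..count {
        // Fixed-size node header: (field, width in bytes).
        for (name, w) in [("id", 4), ("promised_ballot", 8), ("role", 1), ("my_ballot", 8)] {
            if off < pos + w {
                return Some(format!("node {}: {}", k, name));
            }
            pos += w;
        }
        for section in ["accept", "learned"] {
            if off < pos + 4 {
                return Some(format!("node {}: {}_count", k, section));
            }
            let n = read_le(d, &mut pos, 4)?;
            for i in 0..n {
                // accept tuple: slot(8) ballot(8) value_len(4) value; learned: slot(8) value_len(4) value
                let ballot_w = if section == "accept" { 8 } else { 0 };
                let mut p = pos + 8 + ballot_w;
                let len = read_le(d, &mut p, 4)? as usize;
                let end = p + len;
                if off < end {
                    let name = if off < pos + 8 {
                        "slot"
                    } else if off < pos + 8 + ballot_w {
                        "accepted_ballot"
                    } else if off < pos + 12 + ballot_w {
                        "value_len"
                    } else {
                        "value"
                    };
                    return Some(format!("node {}: {}[{}].{}", k, section, i, name));
                }
                pos = end;
            }
        }
    }
    None
}

fn main() {
    // Two hand-built one-node dumps that differ only in the role byte.
    let mut a = b"DSEPAX01".to_vec();
    a.extend_from_slice(&1u32.to_le_bytes()); // node_count
    a.extend_from_slice(&[0; 4 + 8]); // id, promised_ballot
    a.push(2); // role = Leader
    a.extend_from_slice(&[0; 8 + 4 + 4]); // my_ballot, accept_count, learned_count
    let mut b = a.clone();
    b[0x18] = 1; // role = Candidate
    if let Some(off) = first_diff(&a, &b) {
        println!("offset {}: {:?}", off, field_at(&a, off)); // offset 24: Some("node 0: role")
    }
}
```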
Tick-level diff
If cmp -l flags a divergence inside the accepts of node 1, add
eprintln!/fmt.Fprintln(os.Stderr, ...)/std::cerr lines in each
implementation at the boundaries of the suspect ticks:
```rust
// Add after handle() and after on_tick() in each tick of the simulation loop:
eprintln!(
    "t={} id={} promised={:?} role={:?} my={:?} accepts={:?} learned={:?}",
    t, id, n.promised_ballot, n.role, n.my_ballot, n.accepts, n.learned
);
```
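Run all three with stderr redirected to rust.log, go.log, and cpp.log,
then diff -u rust.log go.log (and rust.log against cpp.log). The diff
is only meaningful if the Go and C++ lines reproduce the Rust text
exactly; Rust's {:?} rendering of a struct does not match Go's %v, so
either mirror it by hand or print scalar fields in all three. The
first differing tick is where the bug first shows up.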
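One way to keep the three logs diffable is to print only integers in a fixed order, which Go (`fmt.Fprintf(os.Stderr, ...)`) and C++ (`std::cerr <<`) can reproduce character for character. A minimal Rust sketch; the `trace_line` name and its parameter shapes are assumptions: ballots are passed as (round, proposer_id) pairs, the role as its wire discriminant, and values are printed as lowercase hex.

```rust
use std::collections::BTreeMap;

/// Lowercase hex, two digits per byte.
fn hex(v: &[u8]) -> String {
    v.iter().map(|b| format!("{:02x}", b)).collect()
}

/// One trace line built only from integers and hex, in ascending slot order.
fn trace_line(
    t: u64,
    id: u32,
    promised: (u32, u32),
    role: u8,
    my: (u32, u32),
    accepts: &BTreeMap<u64, ((u32, u32), Vec<u8>)>, // slot -> (accepted ballot, value)
    learned: &BTreeMap<u64, Vec<u8>>,               // slot -> decided value
) -> String {
    let acc: Vec<String> = accepts
        .iter()
        .map(|(s, ((r, p), v))| format!("{}@{}.{}:{}", s, r, p, hex(v)))
        .collect();
    let lrn: Vec<String> = learned
        .iter()
        .map(|(s, v)| format!("{}:{}", s, hex(v)))
        .collect();
    format!(
        "t={} id={} promised={}.{} role={} my={}.{} accepts=[{}] learned=[{}]",
        t, id, promised.0, promised.1, role, my.0, my.1,
        acc.join(","), lrn.join(",")
    )
}

fn main() {
    let mut accepts = BTreeMap::new();
    accepts.insert(1u64, ((3u32, 0u32), b"x".to_vec()));
    let mut learned = BTreeMap::new();
    learned.insert(1u64, b"x".to_vec());
    // t=7 id=0 promised=3.0 role=2 my=3.0 accepts=[1@3.0:78] learned=[1:78]
    eprintln!("{}", trace_line(7, 0, (3, 0), 2, (3, 0), &accepts, &learned));
}
```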
Most common culprits in practice
- Forgetting to sort the Promise payload by slot. Go's `map` iteration order is randomized; you must `sort.Slice` before appending to the wire.
- Reading `next_slot` before recovering from `prepare_accepted`. If recovery doesn't update `next_slot = max + 1`, the leader will double-allocate a slot that already has a recovered accept, silently overwriting it.
- Letting `step_down` clear `promised_ballot`. Promises are forever; only `my_ballot` is candidate state.
- Counting yourself twice in `accept_count`. Both `become_leader` and `try_decide` insert self; the second insert is a no-op only if `accept_count` is a set, not a multiset.
- Iterating peers as `for p in nodes.iter()` on a `HashMap`. Use `BTreeMap` in Rust, `std::map` in C++, and an explicit `for p := uint32(0); p < n; p++` in Go.
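Three of these fixes can be sketched in a few lines of Rust under assumed shapes: a Promise payload as a list of (slot, round, proposer_id, value) tuples, recovered accepts as a `BTreeMap` keyed by slot, and per-slot votes as a `BTreeSet<u32>` of node ids. The helper names are hypothetical; the point is the ordering and set semantics, not the real signatures. One deliberate addition beyond the rule above: the recovery helper never moves `next_slot` backwards.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical Promise payload entry: (slot, accepted round, accepted proposer_id, value).
type PromiseEntry = (u64, u32, u32, Vec<u8>);

/// Sort by slot before serializing; never rely on hash-map iteration order.
fn sort_promise_payload(entries: &mut [PromiseEntry]) {
    entries.sort_by_key(|e| e.0);
}

/// After Phase-1 recovery, the next free slot is one past the highest recovered
/// slot. Assumption: taking the max with `current` so next_slot never moves back.
fn next_slot_after_recovery(current: u64, recovered: &BTreeMap<u64, Vec<u8>>) -> u64 {
    match recovered.keys().next_back() {
        Some(&max) => current.max(max + 1),
        None => current,
    }
}

/// Records a vote and returns the tally. A set makes inserting self twice a no-op.
fn record_vote(votes: &mut BTreeSet<u32>, node_id: u32) -> usize {
    votes.insert(node_id);
    votes.len()
}

fn main() {
    let mut payload: Vec<PromiseEntry> = vec![(3, 1, 0, b"c".to_vec()), (1, 1, 0, b"a".to_vec())];
    sort_promise_payload(&mut payload);
    println!("{:?}", payload.iter().map(|e| e.0).collect::<Vec<_>>()); // [1, 3]

    let mut recovered = BTreeMap::new();
    recovered.insert(4u64, b"v".to_vec());
    println!("{}", next_slot_after_recovery(0, &recovered)); // 5

    let mut votes = BTreeSet::new();
    record_vote(&mut votes, 0); // become_leader inserts self
    println!("{}", record_vote(&mut votes, 0)); // try_decide inserts self again: 1
}
```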