db-19 — Observation
What the cross-language test produces and how to read it by hand.
Expected sha256s
scripts/cross_test.sh runs six scenarios and asserts the three
binaries (Rust, Go, C++) all print the same hex digest. The current
canonical hashes are:
| label | args | sha256 |
|---|---|---|
| A | --seed 42 --nodes 3 --rounds 1000 --proposals 5 | 16af5aa6dbd5ce09b259755f3339d6cf23966ce115b0e30d9c2990487783047d |
| B | --seed 7 --nodes 5 --rounds 2000 --proposals 20 | b60388e978a9b98792edb00c8d33217da8bff9945a89d2c0c18b5f69520b91cf |
| C | --seed 99 --nodes 3 --rounds 500 --proposals 0 | 8aef7604639fe0f2b349b38d74e10b6da8ac252b626976563bba69c722426296 |
| D | --seed 1 --nodes 1 --rounds 200 --proposals 5 | d4dbb92f91f9a0adf0c4c0b91fa46b2a5145907450897cd6473a02a6279604fd |
| E | --seed 42 --nodes 3 --rounds 1000 --proposals 3 --partition 0,1,0,2,1,0,2,0 | 5e4dbddb605e469c99fb682c00256445dcb2ed07e984f673d4296ef19719979a |
| F | --seed 3 --nodes 5 --rounds 1500 --proposals 10 --partition 0,1 | c9df583bd714534c488aac710e6cc6e57e4b21d2fe96ec17068bd1c7525bc1b3 |
If any of these change, cross_test.sh will fail. Either you have a
bug, or you have intentionally changed the spec (timer constants,
schedule formula, dump layout) and you must update this table in the
same commit.
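The per-scenario check that cross_test.sh performs can be sketched in Rust as follows. This is not the script itself (which is shell). The function name verify_scenario and its error strings are hypothetical, and the digests are copied from the table above.

```rust
/// Canonical digests copied from the table above: (label, sha256 hex).
const EXPECTED: [(&str, &str); 6] = [
    ("A", "16af5aa6dbd5ce09b259755f3339d6cf23966ce115b0e30d9c2990487783047d"),
    ("B", "b60388e978a9b98792edb00c8d33217da8bff9945a89d2c0c18b5f69520b91cf"),
    ("C", "8aef7604639fe0f2b349b38d74e10b6da8ac252b626976563bba69c722426296"),
    ("D", "d4dbb92f91f9a0adf0c4c0b91fa46b2a5145907450897cd6473a02a6279604fd"),
    ("E", "5e4dbddb605e469c99fb682c00256445dcb2ed07e984f673d4296ef19719979a"),
    ("F", "c9df583bd714534c488aac710e6cc6e57e4b21d2fe96ec17068bd1c7525bc1b3"),
];

/// Hypothetical helper: the Rust, Go and C++ digests for one scenario must
/// agree with each other and with the canonical table.
fn verify_scenario(label: &str, rust: &str, go: &str, cpp: &str) -> Result<(), String> {
    let expected = EXPECTED
        .iter()
        .find(|(l, _)| *l == label)
        .map(|(_, h)| *h)
        .ok_or_else(|| format!("unknown scenario {}", label))?;
    // Cross-language disagreement: one implementation has drifted.
    if rust != go || go != cpp {
        return Err(format!(
            "{}: implementations disagree (rust={} go={} cpp={})",
            label, rust, go, cpp
        ));
    }
    // All three agree but differ from the table: a shared bug, or a spec
    // change whose new hash was not recorded in the same commit.
    if rust != expected {
        return Err(format!("{}: digest {} != canonical {}", label, rust, expected));
    }
    Ok(())
}
```

Checking agreement before comparing against the table separates the two failure modes named in the paragraph above: a bug in one language versus an unrecorded spec change.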
What the canonical dump looks like (scenario D — single node)
--seed 1 --nodes 1 --rounds 200 --proposals 5. Five proposals go into
a single-node cluster. The leader is itself the majority, so every
proposal commits immediately and discovery/sync collapse to a no-op
(quorum is reached on self.id alone).
offset 0x00 : 44 53 45 5A 41 42 30 31 "DSEZAB01" magic
offset 0x08 : 01 00 00 00 1 node_count
offset 0x0c : 00 00 00 00 0 node id
offset 0x10 : 02 role = Leading (2)
offset 0x11 : XX XX XX XX current_epoch (== 1 if no churn)
offset 0x15 : XX XX XX XX accepted_epoch (== current_epoch)
offset 0x19 : XX XX XX XX last_zxid.epoch (== current_epoch)
offset 0x1d : 05 00 00 00 last_zxid.counter = 5
offset 0x21 : XX XX XX XX last_committed.epoch
offset 0x25 : 05 00 00 00 last_committed.counter = 5
offset 0x29 : 05 00 00 00 history_len = 5
offset 0x2d : XX XX XX XX history[0].zxid.epoch
offset 0x31 : 01 00 00 00 history[0].zxid.counter
offset 0x35 : 05 00 00 00 history[0].payload_len = 5
offset 0x39 : 7A 61 62 2D 30 "zab-0" payload
...
Each history entry is 4 + 4 + 4 + 5 = 17 bytes (epoch + counter +
len + "zab-N"). The total dump for D is therefore
0x2d + 5 * 17 = 45 + 85 = 130 (0x82) bytes. The length is fixed, but the
exact bytes in the XX positions depend on which epoch the leader has
bumped through by the time the run ends; in the single-node case that is
nearly always current_epoch = 1.
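To make the layout concrete, here is a minimal Rust sketch of an encoder that emits fields in exactly this order. It assumes every integer is a little-endian u32 (as the 01 00 00 00 patterns suggest) and that role is a single byte. Zxid, NodeState and encode_dump are hypothetical names, not the real canonical_dump.

```rust
/// Hypothetical mirror of a zxid: (epoch, counter).
#[derive(Clone, Copy)]
struct Zxid {
    epoch: u32,
    counter: u32,
}

/// Hypothetical mirror of the per-node state that the dump serialises.
struct NodeState {
    id: u32,
    role: u8, // Looking = 0, Following = 1, Leading = 2
    current_epoch: u32,
    accepted_epoch: u32,
    last_zxid: Zxid,
    last_committed: Zxid,
    history: Vec<(Zxid, Vec<u8>)>, // (zxid, payload)
}

fn put_u32(out: &mut Vec<u8>, v: u32) {
    out.extend_from_slice(&v.to_le_bytes());
}

fn put_zxid(out: &mut Vec<u8>, z: Zxid) {
    put_u32(out, z.epoch);
    put_u32(out, z.counter);
}

/// Magic, node_count, then one block per node in slice order, each block
/// followed by its history entries (zxid, payload_len, payload bytes).
fn encode_dump(nodes: &[NodeState]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(b"DSEZAB01");
    put_u32(&mut out, nodes.len() as u32);
    for n in nodes {
        put_u32(&mut out, n.id);
        out.push(n.role);
        put_u32(&mut out, n.current_epoch);
        put_u32(&mut out, n.accepted_epoch);
        put_zxid(&mut out, n.last_zxid);
        put_zxid(&mut out, n.last_committed);
        put_u32(&mut out, n.history.len() as u32);
        for (z, payload) in &n.history {
            put_zxid(&mut out, *z);
            put_u32(&mut out, payload.len() as u32);
            out.extend_from_slice(payload);
        }
    }
    out
}
```

Encoding scenario D's end state (epoch 1, five proposals "zab-0".."zab-4") with this sketch yields the 130 bytes computed above, with role at 0x10 and the first payload at 0x39.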
A multi-node dump (scenario C — quiet cluster)
--seed 99 --nodes 3 --rounds 500 --proposals 0. With no proposals, the
cluster elects a leader, runs through discovery and sync, then exchanges
heartbeats for the rest of the run. Every node's history is empty:
44 53 45 5A 41 42 30 31 magic
03 00 00 00 node_count = 3
00 00 00 00 node id 0
XX role (Following or Leading)
XX XX XX XX current_epoch (1 if first election succeeded clean)
XX XX XX XX accepted_epoch
00 00 00 00 00 00 00 00 last_zxid (0, 0)
00 00 00 00 00 00 00 00 last_committed (0, 0)
00 00 00 00 history_len = 0
01 00 00 00 node id 1
... same shape ...
02 00 00 00 node id 2
... same shape ...
Total dump: 8 + 4 + 3 * (4 + 1 + 4 + 4 + 4 + 4 + 4 + 4 + 4) = 12 + 3 * 33 = 111 bytes. (33 bytes per node with empty history.)
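Both dump sizes in this section follow from one formula: a 12-byte header, a 33-byte block per node, and 12 + payload_len bytes per history entry. A small Rust sketch makes the arithmetic checkable; dump_len is a hypothetical helper that takes each node's list of payload lengths.

```rust
/// Fixed header: 8-byte magic + 4-byte node_count.
const HEADER: usize = 8 + 4;
/// Node block with empty history: id(4) + role(1) + current_epoch(4)
/// + accepted_epoch(4) + last_zxid(8) + last_committed(8) + history_len(4).
const NODE_BLOCK: usize = 4 + 1 + 4 + 4 + 8 + 8 + 4;
/// Per history entry, excluding the payload itself: zxid(8) + payload_len(4).
const ENTRY_OVERHEAD: usize = 8 + 4;

/// Expected dump size, given each node's history payload lengths.
fn dump_len(payload_lens_per_node: &[Vec<usize>]) -> usize {
    HEADER
        + payload_lens_per_node
            .iter()
            .map(|lens| NODE_BLOCK + lens.iter().map(|l| ENTRY_OVERHEAD + l).sum::<usize>())
            .sum::<usize>()
}
```

For scenario C this gives 12 + 3 * 33 = 111; for scenario D (one node, five 5-byte payloads) it gives 12 + 33 + 5 * 17 = 130.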
How to debug a divergence
If cross_test.sh fails, write the raw dumps to disk. The CLI prints
only the hash, so you'll need either a one-liner that calls
canonical_dump directly or a temporary change to zabctl.rs / main.go /
zabctl.cc that writes the raw bytes instead of the hash. Then:
cmp -l /tmp/zab_A_rust.bin /tmp/zab_A_go.bin | head
xxd /tmp/zab_A_rust.bin | sed -n '<line>,+2p'
xxd /tmp/zab_A_go.bin | sed -n '<line>,+2p'
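The same search can be done from Rust, for example on two files loaded with std::fs::read. A minimal sketch: first_divergence and xxd_line are hypothetical helpers, offsets are 0-based like the hex offsets in the table that follows, and xxd is assumed to use its default 16 bytes per line.

```rust
/// 0-based offset of the first byte where `a` and `b` differ. If one dump is
/// a strict prefix of the other, the divergence is at the shorter length.
/// Returns None when the dumps are identical.
fn first_divergence(a: &[u8], b: &[u8]) -> Option<usize> {
    match a.iter().zip(b).position(|(x, y)| x != y) {
        Some(i) => Some(i),
        None if a.len() != b.len() => Some(a.len().min(b.len())),
        None => None,
    }
}

/// 1-based line of default `xxd` output (16 bytes per line) that contains
/// the 0-based `offset`, suitable for `sed -n '<line>,+2p'`.
fn xxd_line(offset: usize) -> usize {
    offset / 16 + 1
}
```

For example, a divergence at 0x21 is on xxd line 3; in the scenario D layout that byte is the start of last_committed.epoch.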
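cmp -l prints the 1-based decimal position of each differing byte (with the byte values in octal), so the 0-based offset used below is that position minus 1, and the xxd line to pass to sed is offset / 16 + 1 (xxd prints 16 bytes per line by default). The first divergence offset tells you what to look at: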
| offset range | likely culprit |
|---|---|
| 0x00–0x07 | magic typo (must be DSEZAB01, not DSEZAB1 or DSEZAB02) |
| 0x08–0x0b | node_count (should be impossible if all three parse --nodes correctly) |
| inside a node block, on role | enum mapping (Looking=0, Following=1, Leading=2) |
| inside a node block, on current_epoch / accepted_epoch | discovery handshake bug; the leader's pending_new_epoch likely didn't max() against current_epoch |
| inside a node block, on last_zxid | wrong counter reset on epoch change (must reset to 0; the first new proposal has counter 1) |
| inside a node block, on last_committed | try_commit quorum count wrong, or propose() not calling try_commit (n=1 case) |
| inside a node block, on history_len | follower Propose filter wrong (out-of-order zxid not dropped), or NewLeader replacement not adopting the leader's history |
| inside a history entry | broadcast loop iteration order (must be ascending peer id) |
In all six existing scenarios these checks pass; the table above is the runbook for the day someone changes the algorithm and forgets to update one of the three implementations.
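The runbook lookup can be mechanised too. The sketch below walks one dump using the layout documented above and names the field containing a given 0-based offset, which maps onto the rows of the table. field_at and NODE_FIELDS are hypothetical names; the walk assumes little-endian 4-byte integers and a one-byte role, reports nodes by their position in the dump, and does not check the magic.

```rust
/// Read a little-endian u32 at `pos`, or None if the dump is too short.
fn read_u32(dump: &[u8], pos: usize) -> Option<u32> {
    let b = dump.get(pos..pos + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Fixed fields of one node block as (name, width in bytes), in dump order.
const NODE_FIELDS: [(&str, usize); 9] = [
    ("node id", 4),
    ("role", 1),
    ("current_epoch", 4),
    ("accepted_epoch", 4),
    ("last_zxid.epoch", 4),
    ("last_zxid.counter", 4),
    ("last_committed.epoch", 4),
    ("last_committed.counter", 4),
    ("history_len", 4),
];

/// Name the dump field that contains the 0-based `offset`.
fn field_at(dump: &[u8], offset: usize) -> String {
    if offset < 8 {
        return "magic".to_string();
    }
    if offset < 12 {
        return "node_count".to_string();
    }
    let node_count = match read_u32(dump, 8) {
        Some(n) => n,
        None => return "truncated header".to_string(),
    };
    let mut pos = 12;
    for node in 0..node_count {
        for &(name, width) in NODE_FIELDS.iter() {
            if offset < pos + width {
                return format!("node {}: {}", node, name);
            }
            pos += width;
        }
        // history_len is the last fixed field, so its 4 bytes end at `pos`.
        let history_len = match read_u32(dump, pos - 4) {
            Some(n) => n,
            None => return format!("node {}: truncated", node),
        };
        for entry in 0..history_len {
            // Each entry: zxid (8 bytes), payload_len (4 bytes), payload.
            let payload_len = match read_u32(dump, pos + 8) {
                Some(n) => n as usize,
                None => return format!("node {}: truncated history[{}]", node, entry),
            };
            let entry_len = 12 + payload_len;
            if offset < pos + entry_len {
                return format!("node {}: history[{}]", node, entry);
            }
            pos += entry_len;
        }
    }
    "past end of dump".to_string()
}
```

Feeding it the first divergent offset (the cmp -l position minus 1) and one side's dump gives the row of the table to start from.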
Tick-level scope (Rust REPL trick)
To watch a scenario from the inside, add this temporary print at the
top of the per-tick loop body in Cluster::run, where the tick counter t
is in scope:
// Temporary: one line per tick on stderr when ZAB_TRACE is set.
if std::env::var("ZAB_TRACE").is_ok() {
    eprintln!(
        "t={} roles={:?} epochs={:?} commits={:?}",
        t,
        self.nodes.iter().map(|n| n.role).collect::<Vec<_>>(),
        self.nodes.iter().map(|n| n.current_epoch).collect::<Vec<_>>(),
        self.nodes.iter().map(|n| n.last_committed.counter).collect::<Vec<_>>(),
    );
}
then run ZAB_TRACE=1 zabctl --seed 42 --nodes 3 --rounds 1000 --proposals 5 2>&1 | head -50. The trace goes to stderr, so the 2>&1 is
needed for head to see it; the canonical dump's sha256 still goes to
stdout unchanged. Remove the print before committing.
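If the trace is needed more than once, it can be factored into a helper. A sketch, assuming a Role enum that derives Debug with the Looking=0, Following=1, Leading=2 mapping (the real enum's name, derives and the tick counter's type may differ); trace_enabled and format_trace are hypothetical. Caching the ZAB_TRACE lookup in a std::sync::OnceLock avoids re-reading the environment on every tick.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for the real role enum.
#[derive(Debug, Clone, Copy)]
enum Role {
    Looking = 0,
    Following = 1,
    Leading = 2,
}

/// True if ZAB_TRACE is set; the environment is read only on the first call.
fn trace_enabled() -> bool {
    static ENABLED: OnceLock<bool> = OnceLock::new();
    *ENABLED.get_or_init(|| std::env::var_os("ZAB_TRACE").is_some())
}

/// One trace line per tick, in the same shape as the eprintln! above.
fn format_trace(t: u64, roles: &[Role], epochs: &[u32], commits: &[u32]) -> String {
    format!("t={} roles={:?} epochs={:?} commits={:?}", t, roles, epochs, commits)
}
```

Inside the loop this becomes `if trace_enabled() { eprintln!("{}", format_trace(...)); }`, and format_trace can be unit-tested without touching stderr.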
Reading the hashes themselves
The hashes are opaque: each is the SHA-256 of a binary blob whose
bytes encode every node's state at the end of the run. There is no
way to look at 16af5aa6... and infer anything about the cluster.
What matters is that the same input produces the same output in all three
languages and that the table above doesn't drift unintentionally.
For human-readable insight, dump canonical_dump(&c) to a file and
run xxd over it, or print individual node states in a test rather
than at the CLI surface.
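For the test-side route, a decoder that turns a dump back into per-node summaries is often easier to read than xxd. A minimal sketch, assuming the little-endian layout documented above; NodeSummary, Reader and decode are hypothetical names, and history payloads are skipped rather than decoded.

```rust
/// Human-readable view of one node block (history payloads omitted).
#[derive(Debug, PartialEq)]
struct NodeSummary {
    id: u32,
    role: &'static str,
    current_epoch: u32,
    accepted_epoch: u32,
    last_zxid: (u32, u32),      // (epoch, counter)
    last_committed: (u32, u32), // (epoch, counter)
    history_len: u32,
}

/// Cursor over the dump; every read fails cleanly if the dump is truncated.
struct Reader<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Reader<'a> {
    /// Consume `n` bytes, or report the offset where the dump ran out.
    fn take(&mut self, n: usize) -> Result<&'a [u8], String> {
        let start = self.pos;
        let bytes = self
            .buf
            .get(start..start + n)
            .ok_or_else(|| format!("truncated at offset {:#x}", start))?;
        self.pos = start + n;
        Ok(bytes)
    }
    fn read_u8(&mut self) -> Result<u8, String> {
        Ok(self.take(1)?[0])
    }
    fn read_u32(&mut self) -> Result<u32, String> {
        let b = self.take(4)?;
        Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}

fn decode(dump: &[u8]) -> Result<Vec<NodeSummary>, String> {
    let mut r = Reader { buf: dump, pos: 0 };
    if r.take(8)? != b"DSEZAB01" {
        return Err("bad magic".to_string());
    }
    let node_count = r.read_u32()?;
    let mut nodes = Vec::new();
    for _ in 0..node_count {
        let id = r.read_u32()?;
        let role = match r.read_u8()? {
            0 => "Looking",
            1 => "Following",
            2 => "Leading",
            other => return Err(format!("node {}: unknown role {}", id, other)),
        };
        let current_epoch = r.read_u32()?;
        let accepted_epoch = r.read_u32()?;
        let last_zxid = (r.read_u32()?, r.read_u32()?);
        let last_committed = (r.read_u32()?, r.read_u32()?);
        let history_len = r.read_u32()?;
        // Skip each entry: zxid (8 bytes), then payload_len and the payload.
        for _ in 0..history_len {
            r.take(8)?;
            let len = r.read_u32()? as usize;
            r.take(len)?;
        }
        nodes.push(NodeSummary {
            id,
            role,
            current_epoch,
            accepted_epoch,
            last_zxid,
            last_committed,
            history_len,
        });
    }
    Ok(nodes)
}
```

Assuming canonical_dump returns the raw bytes, a test can then print `decode(&canonical_dump(&c))` with `{:#?}` to get one struct per node.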