db-18 step 03 — Cross-language determinism

Goal

The Rust, Go, and C++ implementations must, given the same (seed, nodes, rounds, proposals, partition) CLI inputs, produce the byte-identical canonical dump and therefore the same SHA-256. This is the third leg of the lab: protocol correctness plus simulator determinism plus serialisation discipline.

Tasks

  1. Discrete-event simulator. A Cluster owns a min-heap of pending RPCs keyed (delivery_time, src, seq). seq is a monotonically increasing per-cluster counter assigned at send time, breaking ties when two RPCs from the same sender become deliverable on the same tick. Every send pushes onto the heap; every tick pops everything due, dispatches it via node.handle(rpc, src, t, &mut out), and pushes any reply RPCs back onto the heap with delivery = t + 1 + splitmix64(seed ^ src ^ dst ^ seq) % 3.

  2. Iteration discipline. All iteration over collections of nodes, slots, or peers must be in sorted order. Rust uses BTreeMap / BTreeSet exclusively. Go uses sort.Slice / sort.Ints before every loop over a map's keys. C++ uses std::map / std::set. A single iteration over a hash map anywhere in the protocol path will diverge across languages on ~2000 ticks.

  3. Partition modelling. The Cluster carries a Drop: Set<(u32, u32)> of dropped unidirectional edges. The CLI flag --partition s,d,s,d,... parses pairs and inserts them. Asymmetric partitions are intentional: --partition 0,1 only drops 0→1 traffic, not 1→0. Scenario F exercises this.

  4. Canonical dump. canonical_dump(&cluster) writes:

    "DSEPAX01"                     (8 bytes magic)
    u32_le(node_count)
    for each node in ascending id:
        u32_le(id)
        u32_le(promised_ballot.round)
        u32_le(promised_ballot.proposer_id)
        u8(role)                   (Follower=0, Candidate=1, Leader=2)
        u32_le(my_ballot.round)
        u32_le(my_ballot.proposer_id)
        u32_le(accepts_len)
        for each (slot, (ballot, value)) in accepts, ascending slot:
            u64_le(slot)
            u32_le(ballot.round)
            u32_le(ballot.proposer_id)
            u32_le(value_len)
            value bytes
        u32_le(learned_len)
        for each (slot, value) in learned, ascending slot:
            u64_le(slot)
            u32_le(value_len)
            value bytes
    

    Hash the bytes with SHA-256, print lowercase hex, no trailing newline.

  5. CLI: paxosctl. Each language ships a binary that accepts --seed <u64> --nodes <u32> --rounds <u32> --proposals <u32> [--partition s,d,...], runs the cluster for rounds ticks with proposals scheduled at tick = (i+1) * rounds / (proposals+1), value = b"val-" + itoa(i), dumps canonical bytes, prints the hex SHA-256.

  6. scripts/cross_test.sh. Builds all three binaries, runs the 6 scenarios A–F against each, compares the three hashes to the canonical table, and exits non-zero on mismatch. The script ends with === ALL OK === on success.

Acceptance

Inline unit tests in each language:

  • dump_deterministic_across_runs — two independent Cluster::new(42, 3) instances each run 1000 ticks with 5 proposals produce byte-identical dumps. Confirms intra-language determinism.
  • Scenario A --seed 42 --nodes 3 --rounds 1000 --proposals 50a35fdad1dd97c76a40a61b020c6181a56c4a40d4f723cb68fe70c2112aa9b63
  • Scenario B --seed 7 --nodes 5 --rounds 2000 --proposals 203cc6cae6cb7f9d2b7cb88088a0f22581ac4c41bd86bab1b3676dd0ba33fd7ead
  • Scenario C --seed 99 --nodes 3 --rounds 500 --proposals 0f28d025af748a790beded6167115c7094a7f939b45d439728e4d6b7e144c3be0
  • Scenario D --seed 1 --nodes 1 --rounds 200 --proposals 5e5e0248c7c4fa20991b90afdac828eab91a7414497461dadc2e1553040693139
  • Scenario E --seed 42 --nodes 3 --rounds 1000 --proposals 3 --partition 0,1,0,2,1,0,2,0674e62d809248ac99401054c195d29b0e2eed6ccc78ec45e96da8aaf69c36096
  • Scenario F --seed 3 --nodes 5 --rounds 1500 --proposals 10 --partition 0,17d80176abad54e533b2f4174e84f58432a000255fbb2ecbbb1dd915cb6bb6ab5

All six match across Rust, Go, and C++; bash scripts/cross_test.sh exits 0 with === ALL OK ===.

Discussion prompts

  • Sort discipline. Find the language-default hash map in your language. What is its iteration order? What is the cost of replacing it with the language's ordered map for the canonical dump path only versus everywhere?
  • SplitMix64. Why is splitmix64 a good fit for a deterministic simulator clock when something like rand::thread_rng() is not? Walk through the three constants — what are they and why?
  • Three languages. What classes of bug does the cross-language test catch that a single-language test cannot? (Hint: think signed-vs-unsigned overflow, default hash randomisation, iteration order, integer-promotion rules in comparisons.)