Step 02 — Snapshot and Workload

Goal

Pin a wire format for CounterStore and a deterministic workload generator so that, given identical (seed, ops, keys), all three implementations produce the same bytes and therefore the same SHA-256 digest.

What to build

dump_snapshot

A byte serializer with this exact layout:

"DSEBENCH"  (8 bytes, ASCII)
total_ops   (u64 little-endian)
distinct_keys (u32 little-endian)
for each key in ascending order:
    key (i64 little-endian)
    count (u64 little-endian)

Critical details:

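  • Ascending iteration order, with keys compared as signed i64 values. Rust's BTreeMap and C++'s std::map already iterate in key order; Go's map iteration order is randomized, so the Go version must collect the keys and sort them explicitly (for example with sort.Slice).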
  • Little-endian for every integer.
  • No padding, no separators, no trailing bytes.
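
A minimal Rust sketch of this layout. The CounterStore shown here is a hypothetical stand-in that keeps only total_ops and a BTreeMap of counts; the real type comes from the earlier step, and its incr semantics (here: every call counts as one operation) are an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for CounterStore: only what the snapshot needs.
pub struct CounterStore {
    total_ops: u64,
    counts: BTreeMap<i64, u64>, // iterates in ascending signed-key order
}

impl CounterStore {
    pub fn new() -> Self {
        CounterStore { total_ops: 0, counts: BTreeMap::new() }
    }

    /// Assumed semantics: each call is one operation and adds `by` to the key.
    pub fn incr(&mut self, key: i64, by: u64) {
        self.total_ops += 1;
        *self.counts.entry(key).or_insert(0) += by;
    }

    /// Serialize as: magic, total_ops, distinct_keys, then one row per key.
    pub fn dump_snapshot(&self) -> Vec<u8> {
        // 20-byte header (8 + 8 + 4) plus 16 bytes per row (8 + 8).
        let mut out = Vec::with_capacity(20 + 16 * self.counts.len());
        out.extend_from_slice(b"DSEBENCH");
        out.extend_from_slice(&self.total_ops.to_le_bytes());
        // The wire field is u32, so refuse to truncate a larger key count.
        assert!(self.counts.len() <= u32::MAX as usize, "too many keys for u32");
        out.extend_from_slice(&(self.counts.len() as u32).to_le_bytes());
        for (key, count) in &self.counts {
            out.extend_from_slice(&key.to_le_bytes());
            out.extend_from_slice(&count.to_le_bytes());
        }
        out
    }
}
```

Under these assumptions, incr(2, 5) followed by incr(1, 7) yields 52 bytes: the 20-byte header, then the row for key 1, then the row for key 2.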

SplitMix64

Implement the standard one-state-word SplitMix64:

state += 0x9E3779B97F4A7C15
z = state
z = (z ^ (z >> 30)) * 0xBF58476D1CE4E7B5
z = (z ^ (z >> 27)) * 0x94D049BB133111EB
return z ^ (z >> 31)

Also implement the stateless splitmix64(x) (without the state += step) for the canonical test vector check.

run_workload(seed, ops, keys, scenario)

rng = SplitMix64(seed)
store = empty CounterStore
repeat ops times:
    r1 = rng.next()
    r2 = rng.next()
    r3 = rng.next()
    kind = (r1 >> 62) & 0x3       # 0,1,2 → incr, 3 → decr
    k    = i64(r2 % keys)
    by   = (r3 % 100) + 1
    if kind == 3 -> store.decr(k, by) else store.incr(k, by)
return store.dump_snapshot()

The scenario argument is reserved and ignored for now.
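
A self-contained Rust sketch of the loop above. Several details are assumptions, not part of this step's spec: the store is a hypothetical minimal one, decr is assumed to saturate at zero and to create missing keys, every call counts toward total_ops, keys is taken as a nonzero u64, and scenario is a placeholder string.

```rust
use std::collections::BTreeMap;

struct SplitMix64 { state: u64 }

impl SplitMix64 {
    fn next(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E7B5);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical minimal store; the real semantics come from the earlier step.
#[derive(Default)]
struct CounterStore { total_ops: u64, counts: BTreeMap<i64, u64> }

impl CounterStore {
    fn incr(&mut self, k: i64, by: u64) {
        self.total_ops += 1;
        let c = self.counts.entry(k).or_insert(0);
        *c = c.wrapping_add(by);
    }
    // Assumption: decr saturates at zero and keeps the key present.
    fn decr(&mut self, k: i64, by: u64) {
        self.total_ops += 1;
        let c = self.counts.entry(k).or_insert(0);
        *c = c.saturating_sub(by);
    }
    fn dump_snapshot(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(20 + 16 * self.counts.len());
        out.extend_from_slice(b"DSEBENCH");
        out.extend_from_slice(&self.total_ops.to_le_bytes());
        assert!(self.counts.len() <= u32::MAX as usize, "too many keys for u32");
        out.extend_from_slice(&(self.counts.len() as u32).to_le_bytes());
        for (k, c) in &self.counts {
            out.extend_from_slice(&k.to_le_bytes());
            out.extend_from_slice(&c.to_le_bytes());
        }
        out
    }
}

/// `_scenario` is accepted but ignored, matching the reserved argument.
pub fn run_workload(seed: u64, ops: u64, keys: u64, _scenario: &str) -> Vec<u8> {
    assert!(keys > 0, "keys must be nonzero (r2 % keys)");
    let mut rng = SplitMix64 { state: seed };
    let mut store = CounterStore::default();
    for _ in 0..ops {
        // Always three draws, in this order, regardless of the branch taken.
        let r1 = rng.next();
        let r2 = rng.next();
        let r3 = rng.next();
        let kind = r1 >> 62;          // top two bits: 0..=3
        let k = (r2 % keys) as i64;   // key in 0..keys
        let by = (r3 % 100) + 1;      // amount in 1..=100
        if kind == 3 { store.decr(k, by) } else { store.incr(k, by) }
    }
    store.dump_snapshot()
}
```

Because kind takes the top two bits of r1, roughly one operation in four is a decr.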

Tests this step should pass

  • sha256_vectors: SHA-256 of the empty input and of "abc" match the standard test vectors.
  • splitmix64_known: splitmix64(0) == 0x8b57dafca0cee644.
  • snapshot_layout_two_keys: increment key 2, then key 1. The snapshot is 52 bytes (a 20-byte header plus two 16-byte rows): magic, total_ops=2, distinct_keys=2, then the row for key 1 before the row for key 2.
  • workload_determinism: two runs of the same workload produce byte-identical snapshots.
  • scenario_a_frozen / scenario_b_frozen: hashes match the golden values in CONCEPTS.md.

Things to watch for

  • Always draw three RNG words per iteration, even if a branch only needs two. Skipping a draw shifts every later value, so the RNG streams (and the snapshots) would diverge across languages.
  • Never serialize by iterating a hash map directly, because its order is unspecified. Collect the keys and sort them first.
  • Don't put size_t or usize on the wire: their width depends on the platform. Convert to the fixed-width u32 or u64 named in the layout before serializing.
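
As an illustration of the last point, a checked conversion in Rust might look like the sketch below. The helper name encode_len_u32 is hypothetical; it turns a usize length into the 4-byte little-endian field and reports an error instead of silently truncating.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Hypothetical helper: encode a collection length as a little-endian u32.
/// Fails rather than truncating if the length does not fit in 32 bits.
pub fn encode_len_u32(len: usize) -> Result<[u8; 4], TryFromIntError> {
    let n = u32::try_from(len)?;
    Ok(n.to_le_bytes())
}
```

For example, encode_len_u32(2) produces the bytes 02 00 00 00, the same on 32-bit and 64-bit targets.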

Acceptance

scripts/cross_test.sh reports === ALL OK ===.