Verification — db-22

What "verified" means here

For a perf-and-bench lab, "verified" means three things at once:

  1. All three implementations pass their own unit tests (Rust 9, Go 9, C++ 9).
  2. All three implementations produce byte-identical snapshot hashes for both frozen scenarios.
  3. The frozen hashes match the golden values committed in source.

Anything less and the bench numbers are meaningless. You can't claim "Rust does X ops/sec on this workload" if it is not doing the same work as the Go and C++ versions.
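
Conditions 2 and 3 together amount to "every implementation's hash equals the golden hash". A minimal Go sketch of that check follows. The function name checkAgreement and the digests in main are hypothetical placeholders for illustration, not values or code from the lab.

```go
package main

import (
	"fmt"
	"sort"
)

// checkAgreement returns an error unless every implementation produced the
// golden hash. Keys of got are implementation names (for example "rust").
func checkAgreement(golden string, got map[string]string) error {
	names := make([]string, 0, len(got))
	for name := range got {
		names = append(names, name)
	}
	sort.Strings(names) // stable error order regardless of map iteration

	for _, name := range names {
		if got[name] != golden {
			return fmt.Errorf("%s: got %s, want %s", name, got[name], golden)
		}
	}
	return nil
}

func main() {
	// Placeholder digests for illustration only; not the lab's real values.
	golden := "aa11"
	hashes := map[string]string{"rust": "aa11", "go": "aa11", "cpp": "aa11"}
	if err := checkAgreement(golden, hashes); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("all implementations match the golden hash")
}
```

Comparing each implementation against the golden value, rather than only against each other, also catches the case where all three drift in the same way.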

How to verify

bash scripts/verify.sh
bash scripts/cross_test.sh

Each script exits non-zero on failure. On success, verify.sh prints === OK === and cross_test.sh prints === ALL OK === as its last line.

Expected last lines:

$ bash scripts/verify.sh
…
=== OK ===

$ bash scripts/cross_test.sh
…
=== ALL OK ===

What each unit test pins

Test                                   Pins
sha256_vectors                         SHA-256 against known empty and "abc" vectors
splitmix64_known                       splitmix64(0) == 0x8b57dafca0cee644
incr_accumulates                       incr adds to existing entries, creates new ones, bumps total_ops
decr_saturates_and_removes             decrement past zero removes the entry
decr_on_missing_is_visible_op          decr on a missing key bumps total_ops but does not create the entry
snapshot_layout_two_keys               exact wire bytes of a 2-key snapshot
workload_determinism                   same seed twice → same snapshot bytes
scenario_a_frozen / scenario_b_frozen  frozen golden hashes per scenario
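
The counter semantics pinned by incr_accumulates, decr_saturates_and_removes and decr_on_missing_is_visible_op can be sketched in a few lines of Go. This is an illustration under stated assumptions, not the lab's code: the Store type, its field names and the uint64 key type are hypothetical, and the sketch assumes an entry is removed whenever a decrement would bring it to zero or below.

```go
package main

import "fmt"

// Store is a hypothetical counter store; type and field names are assumptions.
type Store struct {
	counts   map[uint64]uint64
	totalOps uint64
}

func NewStore() *Store { return &Store{counts: make(map[uint64]uint64)} }

// Incr adds delta to key, creating the entry if needed, and counts one op.
func (s *Store) Incr(key, delta uint64) {
	s.counts[key] += delta
	s.totalOps++
}

// Decr subtracts delta from key and counts one op even when key is missing.
// Assumption: an entry whose value would reach zero or below is removed.
func (s *Store) Decr(key, delta uint64) {
	s.totalOps++
	cur, ok := s.counts[key]
	if !ok {
		return // visible op, but no entry is created
	}
	if delta >= cur {
		delete(s.counts, key) // saturate at zero and drop the entry
		return
	}
	s.counts[key] = cur - delta
}

func main() {
	s := NewStore()
	s.Incr(7, 3)
	s.Decr(7, 5)  // past zero: entry removed
	s.Decr(99, 1) // missing key: counted, not created
	fmt.Println(len(s.counts), s.totalOps) // prints: 0 3
}
```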

The frozen-scenario tests are the highest-value tests in the lab. Any silent change to the wire format, the workload, or SplitMix64 breaks both of them with a clear "got X, want Y" message in the failing language's test output.
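
As an illustration of the "got X, want Y" style, here is a minimal Go sketch of a golden-hash check. The helper name checkGolden is hypothetical. The inputs are the two standard SHA-256 test vectors (empty string and "abc") rather than the lab's scenario snapshots, whose golden values are not reproduced here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// checkGolden hashes data with SHA-256 and compares the hex digest to wantHex.
func checkGolden(name string, data []byte, wantHex string) error {
	sum := sha256.Sum256(data)
	got := hex.EncodeToString(sum[:])
	if got != wantHex {
		return fmt.Errorf("%s: got %s, want %s", name, got, wantHex)
	}
	return nil
}

func main() {
	// Standard SHA-256 test vectors for "" and "abc".
	vectors := []struct{ name, input, want string }{
		{"empty", "", "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},
		{"abc", "abc", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"},
	}
	for _, v := range vectors {
		if err := checkGolden(v.name, []byte(v.input), v.want); err != nil {
			fmt.Println("FAIL:", err)
			return
		}
	}
	fmt.Println("ok")
}
```

A frozen-scenario test would presumably have the same shape, with the snapshot bytes as data and the committed golden hash as wantHex.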

Manual sanity checks

# hash of the smallest meaningful snapshot (the empty store)
./target/release/benchctl hash workload --seed 0 --ops 0 --keys 1 --scenario default
# expected: sha256 of MAGIC || 0_u64 || 0_u32 = the empty-store hash

# determinism
./target/release/benchctl hash workload --seed 42 --ops 500 --keys 32 --scenario default
./target/release/benchctl hash workload --seed 42 --ops 500 --keys 32 --scenario default
# should print the same hex twice
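
To cross-check the empty-store hash by hand, the layout MAGIC || 0_u64 || 0_u32 can be rebuilt independently. The Go sketch below assumes only that layout and the little-endian integer encoding. The magic bytes in main are a placeholder, since the real value is not stated here, and emptySnapshot is a hypothetical helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// emptySnapshot builds MAGIC || 0_u64 || 0_u32 with little-endian integers.
func emptySnapshot(magic []byte) []byte {
	buf := make([]byte, len(magic)+12)
	copy(buf, magic)
	binary.LittleEndian.PutUint64(buf[len(magic):], 0)   // 0_u64
	binary.LittleEndian.PutUint32(buf[len(magic)+8:], 0) // 0_u32
	return buf
}

func main() {
	magic := []byte("MAGIC") // placeholder; substitute the lab's real magic bytes
	sum := sha256.Sum256(emptySnapshot(magic))
	fmt.Println(hex.EncodeToString(sum[:]))
}
```

With the real magic bytes substituted, the printed digest should equal the output of the first benchctl command above.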

What is not verified by these tests

  • That bench reports the correct throughput. A test cannot assert a wall-clock number, because it varies with the machine and its load. The bench harness has a distinct= field as a structural sanity check, but the numeric throughput is left to the operator to inspect.
  • That the implementations are equally fast — we only check they are equally correct. The whole point of the lab is to make speed comparisons honest by first making correctness identical.
  • That the implementations would still match on a 32-bit or big-endian platform. The wire format pins little-endian; on a hypothetical big-endian build we'd need a byte-swap in put_u64_le etc.