step 03 — cross-language snapshot

Goal

Produce the canonical snapshot byte stream defined in ../CONCEPTS.md, run the deterministic workload in each language, and assert that the SHA-256 of the resulting snapshot is identical across Rust, Go, and C++.

By the end of this step:

  • dump_snapshot exists in every language and produces bytes that match the spec section-for-section.
  • A run_workload(seed, ops, keys, scenario) function exists in every language and produces bit-identical results in all three.
  • The CLI prints the hex SHA-256 with no trailing newline.
  • scripts/verify.sh ends with === OK ===.
  • scripts/cross_test.sh ends with === ALL OK === and reports both golden hashes for scenarios A and B.
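A minimal Go sketch of the hashing and printing contract above, using only crypto/sha256 and encoding/hex as task 4 prescribes. The snapshot bytes here are a stand-in: in the real CLI they would come from dump_snapshot after run_workload, whose Go signatures this step does not specify. The comment records the standard test vectors worth pinning.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sha256Hex returns the lowercase hex encoding of SHA-256(b).
//
// Standard test vectors worth pinning:
//   SHA256("")    = e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
//   SHA256("abc") = ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
func sha256Hex(b []byte) string {
	sum := sha256.Sum256(b) // [32]byte
	return hex.EncodeToString(sum[:])
}

func main() {
	// Hypothetical stand-in for the real snapshot bytes.
	snapshot := []byte("abc")
	// fmt.Print, not fmt.Println: the output must have no trailing newline.
	fmt.Print(sha256Hex(snapshot))
}
```

Printing exactly 64 hex characters means the three outputs can be compared as plain strings without any trimming.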

Tasks

  1. Implement dump_snapshot. Build it incrementally: write the magic and header first, confirm that a single-row dump matches bytes you worked out by hand, then add the secondary section.
  2. Implement splitmix64 and a stateful SplitMix64::next(). Pin a test for splitmix64(0) == 0x8b57dafca0cee644 to guard against constant typos.
  3. Implement run_workload per the rules in CONCEPTS.md. Pay special attention to three details: drawing all three rng words even for read ops, the kind decoding (r1 >> 60) & 0x7, and the casts to i64 in the modulo steps.
  4. Implement sha256_hex. In Rust, use the sha2 crate. In Go, use crypto/sha256 and encoding/hex. In C++, inline the reference implementation (FIPS 180-4) instead of pulling in a third-party library, and keep it in the same translation unit as the engine. Pin SHA256("") and SHA256("abc") in tests.
  5. Wire up the CLI: sqlitectl workload --seed N --ops N --keys N --scenario S. Print the hex with print! (Rust), fmt.Print (Go), or std::cout (C++), with no trailing newline.
  6. Run scripts/verify.sh then scripts/cross_test.sh. Iterate until both end with their success markers.
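A minimal Go sketch of the RNG details from tasks 2 and 3, assuming the widely published SplitMix64 constants (gamma 0x9e3779b97f4a7c15, multipliers 0xbf58476d1ce4e5b9 and 0x94d049bb133111eb). CONCEPTS.md remains the authority: if its splitmix64 differs from this (the pinned splitmix64(0) == 0x8b57dafca0cee644 test is the tiebreaker), follow CONCEPTS.md. The names mix64 and drawOp are hypothetical, and the r1, r2, r3 draw order is an assumption. What the sketch does show is that Go's uint64 arithmetic already wraps modulo 2^64; Rust needs wrapping_add and wrapping_mul because debug builds panic on overflow, and C++ gets wrapping for free from uint64_t.

```go
package main

import "fmt"

// Widely published SplitMix64 constants (assumed; check against CONCEPTS.md).
const (
	gamma uint64 = 0x9e3779b97f4a7c15
	mul1  uint64 = 0xbf58476d1ce4e5b9
	mul2  uint64 = 0x94d049bb133111eb
)

// mix64 is the SplitMix64 output finalizer. Go's uint64 multiply wraps
// modulo 2^64, so no explicit masking is needed.
func mix64(z uint64) uint64 {
	z = (z ^ (z >> 30)) * mul1
	z = (z ^ (z >> 27)) * mul2
	return z ^ (z >> 31)
}

// SplitMix64 is the stateful generator: advance by gamma, then mix.
type SplitMix64 struct{ state uint64 }

func (s *SplitMix64) Next() uint64 {
	s.state += gamma
	return mix64(s.state)
}

// drawOp (hypothetical) always consumes three words, so read and write ops
// advance the stream by the same amount. kind uses the decoding from task 3.
func drawOp(rng *SplitMix64) (kind, r2, r3 uint64) {
	r1 := rng.Next()
	r2 = rng.Next()
	r3 = rng.Next()
	kind = (r1 >> 60) & 0x7
	return kind, r2, r3
}

func main() {
	rng := &SplitMix64{state: 1234567}
	for i := 0; i < 3; i++ {
		kind, r2, r3 := drawOp(rng)
		fmt.Printf("op %d: kind=%d r2=%#x r3=%#x\n", i, kind, r2, r3)
	}
}
```

Skipping the r2 and r3 draws on a read would leave every later op reading shifted words, which shows up as a divergence long after the offending op.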

Debugging a divergence

If cross_test.sh shows different hashes between languages, follow the ladder in ../docs/verification.md: shrink the op count, dump the raw snapshot bytes with xxd, diff the dumps, and find the first differing byte. That byte almost always sits at a section boundary and points to either map-iteration order (Go's map iteration order, for instance, is unspecified and varies between runs) or a cast to the wrong integer width.
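Once the two raw dumps are on disk, a small helper can report the first differing offset directly instead of scanning an xxd diff by eye. This is a Go sketch; the program name firstdiff and its two-file command line are hypothetical, not part of the repo's scripts.

```go
package main

import (
	"fmt"
	"os"
)

// firstDiff returns the offset of the first byte at which a and b differ.
// If one slice is a strict prefix of the other, it returns the shorter
// length; if they are identical, it returns -1.
func firstDiff(a, b []byte) int {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	for i := 0; i < n; i++ {
		if a[i] != b[i] {
			return i
		}
	}
	if len(a) != len(b) {
		return n
	}
	return -1
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: firstdiff A.bin B.bin")
		os.Exit(2)
	}
	a, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	b, err := os.ReadFile(os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	off := firstDiff(a, b)
	if off < 0 {
		fmt.Println("identical")
		return
	}
	// Match this offset against the section layout in CONCEPTS.md.
	fmt.Printf("first difference at offset %d (0x%x)\n", off, off)
}
```

Printing the offset in hex as well makes it easy to jump to the same row in xxd output, which is addressed in hex.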

Acceptance

  • All three unit suites pass under release optimisation.
  • Both === OK === and === ALL OK === markers appear.
  • Scenario A hash: e8ccacd39d8535c1ed101f0bc8b7a0799f56468a384da9284d4768cd8b3a3aab.
  • Scenario B hash: dd1d6bb7fec1ffc9f71f01e75a58166b04517a669495af2aa2da432d4722db69.