db-11 step 03 — Cross-language byte agreement
Goal
Pin the file format. After this step a workload run in Rust, Go, or C++ produces sha256-identical files for the same inputs. This is what makes the pager a real cross-language contract, not three loosely related implementations.
Tasks
- Implement `SplitMix64` exactly:

      next(state):
          state += 0x9E3779B97F4A7C15        // wrapping
          z = state
          z = (z ^ (z >> 30)) * 0xBF58476D1CE4E7B5
          z = (z ^ (z >> 27)) * 0x94D049BB133111EB
          return z ^ (z >> 31)

  The addition and both multiplies are wrapping u64 operations. Test against a known first-output table (`seed = 0` yields `0xE220A8397B1DCDAF` etc.).

- Implement `run_workload(path, page_size, capacity, pages, ops, seed, scenario)`:

      pager = Pager::open(path, page_size, capacity)
      while pager.num_pages() < pages + 1:
          pager.allocate()
      rng = SplitMix64(seed)
      for iteration in 0..ops:
          r = rng.next()
          op = (r >> 62) & 0b11              // 0,1,2,3
          byte_val = (r >> 24) & 0xFF
          sequential_pid = 1 + (iteration % pages)
          random_pid = 1 + (r as u64 % pages)
          pid = match scenario:
              sequential -> sequential_pid
              random     -> random_pid
              mixed      -> if (r >> 60) & 1 then random_pid else sequential_pid
          match op:
              0 | 1 -> write a page of [byte_val; page_size] at pid
              2     -> read pid (discard result)
              3     -> skip
      pager.flush()
      return pager

  Critical: each iteration consumes exactly one `next()` call. This is what keeps the three scenarios comparable for a given seed.

- Build a `pagerctl` CLI in each language with subcommands `init` and `workload`. `workload` runs the function above and prints `sha256_file(path)` to stdout in lowercase hex with no trailing newline. The CLI must accept `<path>` either before or after the `--flags`: the cross-test passes the path first; some contributors will pass it last.

- Write `scripts/cross_test.sh`:
  - build all three binaries (cargo release, go build, cmake+make).
  - for scenarios A (sequential), B (random), C (mixed): run each language, sha256 the resulting file, assert all three match each other and match the baked-in expected hash.
  - spot-check that the first 20 bytes of one file equal the expected header bytes.

- Bake the canonical hashes into the Go and C++ test suites too, so a divergence is caught at `go test` / `ctest` time even without running the shell script.
Acceptance
- `scripts/verify.sh` exits 0; each language reports its tests green.
- `scripts/cross_test.sh` exits 0 with `=== ALL OK ===`.
- The canonical hashes table in `docs/verification.md` matches the hashes hard-coded in:
  - `scripts/cross_test.sh`
  - `src/go/pager_test.go::TestWorkloadMatchesCanonicalHashes`
  - `src/cpp/tests/test_pager11.cc` (canonical hashes block)
Discussion prompts
- What happens to the sha256 of Scenario A if you swap the order of the two multiplies in `SplitMix64`?
- Why does the workload draw exactly one `next()` per iteration, even for the `skip` case? (See `docs/analysis.md`.)
- If we wanted to add a fourth scenario (e.g. "read-heavy"), what would have to change in this lab to keep the cross-test working?