db-11 step 03 — Cross-language byte agreement

Goal

Pin the file format. After this step a workload run in Rust, Go, or C++ produces sha256-identical files for the same inputs. This is what makes the pager a real cross-language contract, not three loosely-related implementations.

Tasks

  1. Implement SplitMix64 exactly:

    next(state):
        state += 0x9E3779B97F4A7C15            // wrapping
        z = state
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E7B5
        z = (z ^ (z >> 27)) * 0x94D049BB133111EB
        return z ^ (z >> 31)
    

    The addition and both multiplies are wrapping u64 arithmetic, and the shifts are logical (unsigned). Test against a table of known first outputs: with seed = 0 the first value returned is 0xE220A8397B1DCDAF.

  2. Implement run_workload(path, page_size, capacity, pages, ops, seed, scenario):

    pager = Pager::open(path, page_size, capacity)
    while pager.num_pages() < pages + 1:
        pager.allocate()
    rng = SplitMix64(seed)
    for iteration in 0..ops:
        r = rng.next()                        // exactly one draw per iteration
        op       = (r >> 62) & 0b11           // 0,1,2,3
        byte_val = (r >> 24) & 0xFF
        sequential_pid = 1 + (iteration % pages)
        random_pid     = 1 + (r % pages)
        pid = match scenario:
            sequential -> sequential_pid
            random     -> random_pid
            mixed      -> if (r >> 60) & 1 then random_pid else sequential_pid
        match op:
            0 | 1 -> write [byte_val; page_size] to page pid
            2     -> read page pid (discard result)
            3     -> skip
    pager.flush()
    return pager
    

    Critical: each iteration consumes exactly one next() call. This is what keeps the three scenarios comparable for a given seed.
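
The per-iteration decoding can be isolated as a pure function, which makes it easy to unit-test in each language without touching the pager. The sketch below is one way to do that in Rust; Scenario, Step and decode_step are hypothetical names, and the function assumes pages > 0.

```rust
/// The three workload scenarios (names are illustrative).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Scenario {
    Sequential,
    Random,
    Mixed,
}

/// Everything one iteration derives from a single RNG draw.
#[derive(Debug, PartialEq)]
pub struct Step {
    pub op: u8,       // 0 or 1 = write, 2 = read, 3 = skip
    pub byte_val: u8, // fill byte for a write
    pub pid: u64,     // target page id, always in 1..=pages
}

/// Decode one draw `r` for the given iteration index. Assumes `pages > 0`.
pub fn decode_step(r: u64, iteration: u64, pages: u64, scenario: Scenario) -> Step {
    let op = ((r >> 62) & 0b11) as u8; // top two bits
    let byte_val = ((r >> 24) & 0xFF) as u8; // bits 24..=31
    let sequential_pid = 1 + (iteration % pages);
    let random_pid = 1 + (r % pages);
    let pid = match scenario {
        Scenario::Sequential => sequential_pid,
        Scenario::Random => random_pid,
        // Bit 60 of the same draw picks between the two page ids.
        Scenario::Mixed => {
            if (r >> 60) & 1 == 1 {
                random_pid
            } else {
                sequential_pid
            }
        }
    };
    Step { op, byte_val, pid }
}
```

Since every field comes from the same `r`, the function never needs a second draw, including for the skip case; the caller still calls next() once per iteration regardless of which op comes out.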

  3. Build a pagerctl CLI in each language with subcommands init and workload. workload runs the function above and prints sha256_file(path) to stdout as lowercase hex with no trailing newline. The CLI must accept <path> either before or after the --flags: the cross-test passes the path first, but some contributors will pass it last.
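
One way to accept <path> in either position is to treat the single non-flag argument as the path wherever it appears. The Rust sketch below assumes every flag has the form `--name value` and that the subcommand has already been stripped from the argument list; parse_args, Args, and the flag names used in the usage note are hypothetical, since this step does not fix the flag spelling.

```rust
use std::collections::HashMap;

/// Parsed command line: one positional path plus `--name value` flags.
#[derive(Debug, PartialEq)]
pub struct Args {
    pub path: String,
    pub flags: HashMap<String, String>,
}

/// Parse the arguments that follow the subcommand.
/// Assumption: every flag takes exactly one value.
pub fn parse_args(argv: &[&str]) -> Result<Args, String> {
    let mut path: Option<String> = None;
    let mut flags = HashMap::new();
    let mut it = argv.iter();
    while let Some(arg) = it.next() {
        if let Some(name) = arg.strip_prefix("--") {
            // The token after a flag is always consumed as that flag's value.
            let value = it
                .next()
                .ok_or_else(|| format!("--{} needs a value", name))?;
            flags.insert(name.to_string(), value.to_string());
        } else if path.is_none() {
            // First bare token is the path, wherever it appears.
            path = Some(arg.to_string());
        } else {
            return Err(format!("unexpected extra argument: {}", arg));
        }
    }
    let path = path.ok_or_else(|| "missing <path>".to_string())?;
    Ok(Args { path, flags })
}
```

With this shape, `parse_args(&["db.bin", "--seed", "42"])` and `parse_args(&["--seed", "42", "db.bin"])` return equal values. When printing the digest, Rust's `print!` (unlike `println!`) emits no trailing newline, which matches the output requirement above.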

  4. Write scripts/cross_test.sh:

    • build all three binaries (cargo release, go build, cmake+make).
    • for scenarios A (sequential), B (random), C (mixed): run each language, sha256 the resulting file, assert all three match each other and match the baked-in expected hash.
    • spot-check that the first 20 bytes of one file equal the expected header bytes.

  5. Bake the canonical hashes into the Go and C++ test suites too, so a divergence is caught at go test / ctest time even without running the shell script.

Acceptance

  • scripts/verify.sh exits 0; each language reports its tests green.
  • scripts/cross_test.sh exits 0 with === ALL OK ===.
  • The canonical hashes table in docs/verification.md matches the hashes hard-coded in:
    • scripts/cross_test.sh
    • src/go/pager_test.go::TestWorkloadMatchesCanonicalHashes
    • src/cpp/tests/test_pager11.cc (canonical hashes block)

Discussion prompts

  • What happens to the sha256 of Scenario A if you swap the order of the two multiplies in SplitMix64?
  • Why does the workload draw exactly one next() per iteration, even for the skip case? (See docs/analysis.md.)
  • If we wanted to add a fourth scenario (e.g. "read-heavy"), what would have to change in this lab to keep the cross-test working?