Step 03 — Cross-Language Byte Identity

Goal

Make Rust, Go, and C++ produce byte-identical dump_plan(plan) ++ dump_result(rows) streams for the same workload, and verify it with SHA-256.

Why

If three implementations of the same spec produce the same bytes on a randomly seeded workload, they agree on every behaviour that workload can observe: plan choice, operator order, row emission order, and value encoding. A single divergent byte is the difference between "we have a spec" and "we have three programs that happen to look similar".

What to do

  1. Freeze the wire format in CONCEPTS.md section 9. Plan tags 0x01..0x05, op codes 0x01..0x06, val tags 0x01..0x02, result magic "DSEQR01" (7 bytes, no terminator).

  2. Implement dump_plan / dump_result in each language. Use only explicit little-endian encoding, never platform-native byte order. C++: assemble each value's bytes with shifts and masks (the equivalent of to_le_bytes) and std::memcpy them into the buffer; never reinterpret_cast an int* to a byte pointer. Go: binary.LittleEndian.PutUint32 / PutUint64. Rust: to_le_bytes().

  3. Implement RunWorkload identically:

    • SplitMix64(seed) with the canonical constants 0x9E3779B97F4A7C15, 0xBF58476D1CE4E7B5, 0x94D049BB133111EB.
    • For each row i in 0..rows: draw r1, r2; insert (IntV(i), Text("n" + (r1 % 1000)), IntV(r2 % 100)).
    • After insertion, apply the scenario's indexes (none / col 2 / cols 2+1).
    • For each op: draw r3, r4, r5; derive kind = (r3 >> 60) & 3, col = r4 % 3. Build the query per kind (0/1 → EQ, 2 → range with op = ((r5>>1)&1) ? Lt : Gt, 3 → projection-only).
    • Plan, execute, append dump_plan ++ dump_result to the rolling output.

  4. CLI: qplan workload --seed S --ops N --rows R --scenario X prints sha256_hex of the rolling output with no trailing newline.

  5. Compare: scripts/cross_test.sh runs both scenarios across all three binaries and asserts the three hashes match each other and the frozen golden hashes.
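
The encoding rule from step 2 can be sketched in Rust as below. This is a minimal illustration, not the project's actual code: the helper names put_u32, put_u64, and put_magic are hypothetical, and the real field layout is whatever CONCEPTS.md section 9 freezes.

```rust
/// Result magic from step 1: exactly 7 bytes, no NUL terminator.
const RESULT_MAGIC: &[u8; 7] = b"DSEQR01";

/// Append a u32 in little-endian order, independent of the host CPU.
/// (Hypothetical helper name.)
fn put_u32(out: &mut Vec<u8>, v: u32) {
    out.extend_from_slice(&v.to_le_bytes());
}

/// Append a u64 in little-endian order. (Hypothetical helper name.)
fn put_u64(out: &mut Vec<u8>, v: u64) {
    out.extend_from_slice(&v.to_le_bytes());
}

/// Append the 7-byte result magic, with no terminator.
/// (Hypothetical helper name.)
fn put_magic(out: &mut Vec<u8>) {
    out.extend_from_slice(RESULT_MAGIC);
}
```

Routing every multi-byte write through helpers like these keeps the byte order decision in one place per language, which makes a stray native-endian write easier to spot in review.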

Acceptance

  • scripts/verify.sh ends with === OK === (unit tests pass in all three languages).
  • scripts/cross_test.sh ends with === ALL OK === (cross-language bytes match; golden hashes match).
  • Anchor tests (test 11 in Go and C++) verify scenario A's SHA-256 at unit-test time, so drift is caught even without running cross_test.sh.

Common pitfalls

  • Trailing newline. println! / fmt.Println / std::cout << std::endl append a newline and so change the bytes on the binary's stdout. Use write_all / Write / fwrite and flush.
  • Magic length. Writing "DSEQR01\0" (8 bytes) instead of 7 shifts every later op boundary. The byte-walkthrough in docs/observation.md is the canonical reference if in doubt.
  • Map iteration order in Go. Use sorted slices for any structure whose iteration order ends up in the wire bytes.
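  • Missing explicit discriminants or #[repr(u8)] on Rust enums. Discriminants default to 0, 1, 2, ... in declaration order, so without explicit values op as u8 yields 0..5 rather than the constants 1..6. #[repr(u8)] additionally pins the in-memory representation to one byte.
  • bool packing. std::vector<bool> is a bit-packed specialization whose elements are not individual bytes, and sizeof(bool) is implementation-defined. Never put a bool in the wire format; promote to std::uint8_t.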
  • SHA-256 final byte ordering. The output is big-endian per word; hex-encoding mistakes swap nibbles. The empty-string known answer (e3b0c442...) catches this immediately.
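
Three of the pitfalls above (enum discriminants, hex encoding, trailing newline) can be guarded against in a few lines of Rust. This is a sketch under stated assumptions: the OpCode enum and its variant names are hypothetical placeholders, the hex output is assumed to be lowercase (matching the known answer quoted above), and the digest bytes are assumed to come from whatever SHA-256 implementation the project uses.

```rust
use std::io::{self, Write};

/// Hypothetical op-code enum; the variant names are placeholders.
/// Explicit discriminants pin the wire values, and #[repr(u8)] pins
/// the in-memory representation to one byte.
#[repr(u8)]
#[derive(Clone, Copy)]
enum OpCode {
    First = 0x01,
    Second = 0x02,
}

/// Hex-encode bytes in the order given, high nibble first.
/// Assumption: lowercase digits.
fn to_hex(bytes: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut s = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        s.push(DIGITS[(b >> 4) as usize] as char);
        s.push(DIGITS[(b & 0x0f) as usize] as char);
    }
    s
}

/// Write the hex digest with no trailing newline, then flush.
fn write_digest<W: Write>(mut w: W, digest: &[u8]) -> io::Result<()> {
    w.write_all(to_hex(digest).as_bytes())?;
    w.flush()
}
```

For example, to_hex(&[0xe3, 0xb0, 0xc4, 0x42]) yields "e3b0c442", the prefix of the empty-string known answer; a nibble swap would produce "3e0b4c24" instead. In a binary, write_digest(io::stdout().lock(), &digest) emits exactly 64 bytes for a 32-byte digest.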