Step 03 — Cross-Language Byte Identity
Goal
Make Rust, Go, and C++ produce byte-identical dump_plan(plan) ++ dump_result(rows) streams for the same workload, and verify it with SHA-256.
Why
If three implementations of the same spec produce the same bytes on a randomly seeded workload, they agree on every observable behaviour that workload exercises: plan choice, operator order, row emission order, value encoding. A single divergent byte is the difference between "we have a spec" and "we have three programs that happen to look similar".
What to do
- Freeze the wire format in CONCEPTS.md section 9. Plan tags 0x01..0x05, op codes 0x01..0x06, val tags 0x01..0x02, result magic "DSEQR01" (7 bytes, no terminator).
- Implement dump_plan / dump_result in each language. Use only little-endian primitives; never platform-native byte order. C++: std::memcpy of to_le_bytes-equivalent expressions; never reinterpret int*. Go: binary.LittleEndian.PutUint32 / PutUint64. Rust: to_le_bytes().
- Implement RunWorkload identically:
  - SplitMix64(seed) with the canonical constants 0x9E3779B97F4A7C15, 0xBF58476D1CE4E7B5, 0x94D049BB133111EB.
  - For each row i in 0..rows: draw r1, r2; insert (IntV(i), Text("n" + (r1 % 1000)), IntV(r2 % 100)).
  - After insertion, apply the scenario's indexes (none / col 2 / cols 2+1).
  - For each op: draw r3, r4, r5; derive kind = (r3 >> 60) & 3, col = r4 % 3. Build the query per kind (0/1 → EQ, 2 → range with op = ((r5>>1)&1) ? Lt : Gt, 3 → projection-only).
  - Plan, execute, append dump_plan ++ dump_result to the rolling output.
- CLI: qplan workload --seed S --ops N --rows R --scenario X prints sha256_hex of the rolling output with no trailing newline.
- Compare: scripts/cross_test.sh runs both scenarios across all three binaries and asserts the three hashes match each other and the frozen golden hashes.
Acceptance
- scripts/verify.sh ends with === OK === (unit tests pass in all three languages).
- scripts/cross_test.sh ends with === ALL OK === (cross-language bytes match; golden hashes match).
- Anchor tests (test 11 in Go and C++) verify scenario A's SHA-256 at unit-test time, so drift is caught even without running cross_test.sh.
Common pitfalls
- Trailing newline from println! / fmt.Println / std::cout << std::endl will change the binary's stdout. Use write_all / Write / fwrite and flush.
- Magic length. Writing "DSEQR01\0" (8 bytes) instead of 7 makes every op-boundary off by one. The byte-walkthrough in docs/observation.md is the canonical reference if in doubt.
- Map iteration order in Go. Use sorted slices for any structure whose iteration order ends up in the wire bytes.
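- Missing explicit discriminants or #[repr(u8)] on Rust enums. Implicit discriminants start at 0, so unless every variant is given its value explicitly, op as u8 will not equal the constants 1..6; #[repr(u8)] additionally pins the in-memory representation to one byte.
- bool packing. std::vector<bool> is a bit-packed specialization whose elements are proxy objects rather than addressable bytes, and sizeof(bool) is implementation-defined; never put a bool in the wire format; promote to std::uint8_t.
- SHA-256 final byte ordering. The digest is the eight 32-bit state words serialized big-endian, and each byte is hex-encoded high nibble first; getting either wrong swaps bytes or nibbles. The empty-string known answer (e3b0c442...) catches this immediately.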