Observation

Frozen golden hashes

Both scenarios produce SHA-256 hashes that are byte-identical across the Rust, Go, and C++ implementations. These are burned into scripts/cross_test.sh and into the Rust/Go/C++ test suites.

ID  Args                                                   sha256
A   --seed 42 --ops 200 --rows 500 --scenario mixed        3918bc6eca225f1c9c004fdcefa6551788282a4a2223fa98b002e8b54eb74a2e
B   --seed 7 --ops 500 --rows 2000 --scenario indexheavy   9313fe694db38912a814abc16600d82f82ead7fc053e813af4bb3978c8fa9abd

If either hash changes, the wire format has drifted — CONCEPTS.md section 9 and all three test suites must be updated in lockstep.

Byte walkthrough — first op of scenario A

Scenario A drives 200 ops against a 500-row table with an index on column 2 (age). The first op draws (r3, r4, r5) from SplitMix64(42). These give kind = (r3 >> 60) & 3 = 0, which maps to EQ, applied to column col = r4 % 3 with the value pick_val_for(col, r5, 500).
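The derivation above can be sketched in Rust. This is a minimal sketch, assuming the generator is the reference SplitMix64 (same constants and shifts) and that r3..r5 are the third through fifth outputs after seeding. SplitMix64, decode_op, and first_op are hypothetical names, and pick_val_for is left out because its definition is not shown here.

```rust
/// Reference SplitMix64 step. Assumption: the project's generator uses
/// these same constants and shifts.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical helper: derive the op kind and target column from two draws.
fn decode_op(r3: u64, r4: u64) -> (u8, u8) {
    let kind = ((r3 >> 60) & 3) as u8; // bits 60 and 61, so 0..=3
    let col = (r4 % 3) as u8; // one of three columns
    (kind, col)
}

/// Hypothetical helper: (kind, col, r5) for the first op of a seeded run.
fn first_op(seed: u64) -> (u8, u8, u64) {
    let mut rng = SplitMix64::new(seed);
    // Assumption: the first two outputs are consumed before the op is drawn.
    let _r1 = rng.next_u64();
    let _r2 = rng.next_u64();
    let r3 = rng.next_u64();
    let r4 = rng.next_u64();
    let r5 = rng.next_u64(); // would feed pick_val_for(col, r5, rows)
    let (kind, col) = decode_op(r3, r4);
    (kind, col, r5)
}
```

Calling first_op(42) returns the (kind, col, r5) triple for scenario A's seed. Whether it reproduces kind = 0 depends on the assumptions above matching the real generator.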

Concretely the first op produces a Plan of:

Pipeline                       0x05 0x01 0x00 0x00 0x00     // 1 child
  IndexScan col=2 op=Eq Int(v) 0x02 0x02 0x00 0x00 0x00     // col_idx=2
                               0x01                          // Op::Eq
                               0x01                          // VTAG_INT
                               <i64 LE v>                    // 8 bytes

The plan dump is 20 bytes in total (5 + 5 + 1 + 1 + 8). The result row stream is:

"DSEQR01"                  0x44 0x53 0x45 0x51 0x52 0x30 0x31
u32 LE row_count           ....
per row: u32 col_count=3 | (tag|val) * 3

The 7-byte magic is deliberate — the length is part of the byte-identity contract.
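The two layouts in this walkthrough can be sketched as a pair of Rust encoders. The tag and opcode values are copied from the listing above. The constant and function names are hypothetical, and the row encoder handles only Int values because no other value tag is shown here.

```rust
// Byte values come from the walkthrough; all names are hypothetical.
const TAG_PIPELINE: u8 = 0x05;
const TAG_INDEX_SCAN: u8 = 0x02;
const OP_EQ: u8 = 0x01;
const VTAG_INT: u8 = 0x01;
const MAGIC: &[u8; 7] = b"DSEQR01";

/// Plan dump for a Pipeline with one IndexScan(col_idx, Eq, Int(v)) child.
fn encode_index_scan_plan(col_idx: u32, v: i64) -> Vec<u8> {
    let mut out = Vec::new();
    out.push(TAG_PIPELINE);
    out.extend_from_slice(&1u32.to_le_bytes()); // child count = 1
    out.push(TAG_INDEX_SCAN);
    out.extend_from_slice(&col_idx.to_le_bytes()); // u32 LE column index
    out.push(OP_EQ);
    out.push(VTAG_INT);
    out.extend_from_slice(&v.to_le_bytes()); // i64 LE value
    out
}

/// Result stream for three-column rows that hold only Int values.
fn encode_int_rows(rows: &[[i64; 3]]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(MAGIC); // 7-byte magic, no terminator
    out.extend_from_slice(&(rows.len() as u32).to_le_bytes()); // row_count
    for row in rows {
        out.extend_from_slice(&3u32.to_le_bytes()); // col_count = 3
        for &val in row {
            out.push(VTAG_INT); // tag
            out.extend_from_slice(&val.to_le_bytes()); // value
        }
    }
    out
}
```

With the fields as listed, encode_index_scan_plan emits 5 + 5 + 1 + 1 + 8 bytes, and each Int-only row adds 4 + 3 * 9 bytes to the stream after the 11-byte header.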

Per-scenario telemetry

scenario A — mixed, 200 ops, 500 rows, 1 index (col=age)
  plan-kind distribution (theoretical, from (r3>>60)&3, each kind ~25 %):
    EQ      ~50 %     (kinds 0 and 1 both map to EQ, ~25 % each)
    range   ~25 %
    project ~25 %
  → planner emits IndexScan when col == 2 and op != Ne (at most ≈ 33 % of ops,
    since col = r4 % 3); otherwise SeqScan + Filter.

scenario B — indexheavy, 500 ops, 2000 rows, 2 indexes (col 1 and col 2)
  index coverage rises from ~33 % of predicates to ~66 %; the planner picks
  IndexScan on the smaller-bucket column ~3 × more often, dropping the
  total emitted-row count by an order of magnitude versus the scanonly scenario.
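The access-path rule and the coverage figures above can be illustrated with a short Rust sketch. It assumes the predicate column is uniform over three columns (col = r % 3) and models only the stated rule: IndexScan when the column is indexed and the op is not Ne. The Op and Access enums and both function names are hypothetical, and Range stands in for whatever range operators the planner actually supports.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    Eq,
    Ne,
    Range, // hypothetical stand-in for the range comparisons
}

#[derive(Debug, PartialEq, Eq)]
enum Access {
    IndexScan,
    SeqScanFilter,
}

/// Stated rule: use the index when the column has one and the op is not Ne.
fn choose_access(col: u32, op: Op, indexed_cols: &[u32]) -> Access {
    if indexed_cols.contains(&col) && op != Op::Ne {
        Access::IndexScan
    } else {
        Access::SeqScanFilter
    }
}

/// Share of the three equally likely predicate columns that have an index.
fn index_coverage(indexed_cols: &[u32]) -> f64 {
    let hits = (0..3u32).filter(|c| indexed_cols.contains(c)).count();
    hits as f64 / 3.0
}
```

Under these assumptions index_coverage(&[2]) is 1/3 for scenario A and index_coverage(&[1, 2]) is 2/3 for scenario B, matching the ~33 % and ~66 % figures. The sketch does not model the smaller-bucket choice between two usable indexes.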

Unit test counts

rust  10 tests   cargo test --release --quiet
go    11 tests   go test ./...
cpp   11 tests   ctest --output-on-failure

All three suites cover the same eight planner behaviours plus dump determinism and SHA-256 known-answer vectors. Test 11 in Go and C++ anchors scenario A's hash directly so any drift fails at go test / ctest time, not just at cross_test.sh time.