Execution — db-21 Wire Format and Workload

This document is the single source of truth for the canonical wire format and the deterministic workload. Anything ambiguous here is a bug; fix the doc, not the implementations.

1. SplitMix64 PRNG

state += 0x9E3779B97F4A7C15
z      = state
z      = (z XOR (z >> 30)) * 0xBF58476D1CE4E5B9
z      = (z XOR (z >> 27)) * 0x94D049BB133111EB
return   z XOR (z >> 31)

All multiplications are unsigned 64-bit, wrapping on overflow. The PRNG is seeded with the user-supplied 64-bit seed. Every op consumes exactly three draws, in order (r1, r2, r3), even when the action uses fewer of them; Delete, for example, ignores r3.
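
As an illustration of the wrapping arithmetic, here is a minimal Python sketch of the generator. Python integers are unbounded, so every add and multiply is masked to 64 bits by hand. The class name SplitMix64 and the helper draw3 are hypothetical names chosen for this example, and the sketch assumes that "seeded" means the initial state equals the seed.

```python
MASK64 = (1 << 64) - 1  # Python ints are unbounded, so wrap manually


class SplitMix64:
    """Illustrative SplitMix64; class and method names are hypothetical."""

    def __init__(self, seed: int) -> None:
        # Assumption: the initial state is the user-supplied seed itself.
        self.state = seed & MASK64

    def next_u64(self) -> int:
        self.state = (self.state + 0x9E3779B97F4A7C15) & MASK64
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return z ^ (z >> 31)

    def draw3(self) -> tuple:
        # Always three draws per op, in the order r1, r2, r3.
        r1 = self.next_u64()
        r2 = self.next_u64()
        r3 = self.next_u64()
        return r1, r2, r3
```

Calling draw3() once per op, regardless of which action is selected, keeps the stream position identical across ports.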

2. Operation Selection

op = (r1 >> 62) & 0b11
op      Action
0, 1    Put(k = "k" + (r2 mod keys), v = u32_be(r3 as u32))
2       Delete(k = "k" + (r2 mod keys))
3       RangeTomb(start = "k" + a, end = "k" + (a + 1 + (r3 mod (keys - a)))) where a = r2 mod keys

In scenario ptonly, op 3 is rewritten to op 0 before the action runs. The three draws still happen.

The value is the low 32 bits of r3, encoded as 4 big-endian bytes. (Big-endian because it produces visually distinct bytes across fixtures; the format is otherwise little-endian.)
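
The selection rules can be sketched as one pure function. This is an illustration, not the reference implementation: the function name decode_op and the tuple return shape are hypothetical, and the sketch assumes the numeric key suffix is rendered as decimal ASCII, which this document does not state explicitly.

```python
import struct


def decode_op(r1: int, r2: int, r3: int, keys: int, scenario: str) -> tuple:
    """Map one (r1, r2, r3) triple to an action tuple (hypothetical shape)."""
    op = (r1 >> 62) & 0b11          # top two bits of r1
    if scenario == "ptonly" and op == 3:
        op = 0                      # RangeTomb is rewritten to Put
    a = r2 % keys
    # Assumption: the key suffix is decimal ASCII, e.g. b"k17".
    key = b"k" + str(a).encode("ascii")
    if op in (0, 1):
        value = struct.pack(">I", r3 & 0xFFFFFFFF)  # low 32 bits, big-endian
        return ("Put", key, value)
    if op == 2:
        return ("Delete", key)
    # keys - a is at least 1 because a < keys, so the modulus is never zero.
    end = a + 1 + (r3 % (keys - a))
    return ("RangeTomb", key, b"k" + str(end).encode("ascii"))
```

For example, decode_op(3 << 62, 5, 7, 8, "withrange") yields a range tombstone from b"k5" to b"k7", while the same triple under "ptonly" yields a Put of b"k5".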

3. Flush and Compact Cadence

  • Every 8 ops (i.e. when (op_idx + 1) % 8 == 0): flush all pending entries and tombstones into a new SST at the newest position.
  • Every 16 ops (i.e. when (op_idx + 1) % 16 == 0): run one compaction pass appropriate to the scenario (size-tiered, universal, or no-op).
  • No residue flush at end. If the loop ends with non-zero pending entries, they are discarded. This is intentional: it keeps the cross-language hashes stable regardless of ops mod 8.
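
The cadence can be checked with a small simulation that records where flushes and compactions fire. This is a sketch under stated assumptions: the function name cadence is hypothetical, the counter tracks ops since the last flush rather than distinct buffered entries, and the flush is assumed to run before the compaction when both fall on the same op (the bullet order above suggests this, but it is not spelled out).

```python
def cadence(ops: int) -> tuple:
    """Return (flush op indices, compaction op indices, ops discarded at end)."""
    flushes = []
    compactions = []
    pending_ops = 0                 # ops applied since the last flush
    for op_idx in range(ops):
        pending_ops += 1
        if (op_idx + 1) % 8 == 0:
            flushes.append(op_idx)  # new SST at the newest position
            pending_ops = 0
        # Assumption: on multiples of 16 the flush above runs first.
        if (op_idx + 1) % 16 == 0:
            compactions.append(op_idx)
    # No residue flush: whatever is still pending is dropped.
    return flushes, compactions, pending_ops
```

With ops = 20 the flushes land on op indices 7 and 15, the single compaction on index 15, and the last 4 ops never reach an SST.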

4. Canonical Wire Format

All integers little-endian. lenpref(b) means u32 LE len(b) ‖ b.

"DSEADV21"               (8 bytes, ASCII, no terminator)
f64 LE ratio             (IEEE 754 bit pattern, not a string)
u32 LE sst_count
for each SST (newest first):
    lenpref(smallest_key)
    lenpref(largest_key)
    u32 LE entry_count
    for each entry:
        u8 kind                 (Put = 1, Delete = 2)
        lenpref(key)
        if kind == Put: lenpref(value)
    u32 LE range_tomb_count
    for each range tombstone:
        lenpref(start_key)
        lenpref(end_key)
    u64 LE bloom_bitmap
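
The layout above can be sketched as a serializer built on Python's struct module. The in-memory shape used here (a list of dicts with the keys shown) and the function signature are hypothetical, chosen only for the example; the caller is assumed to pass the SSTs already ordered newest first and to supply whatever ratio the engine reports.

```python
import struct

MAGIC = b"DSEADV21"
PUT, DELETE = 1, 2


def lenpref(b: bytes) -> bytes:
    """u32 LE length followed by the raw bytes."""
    return struct.pack("<I", len(b)) + b


def dump_state(ratio: float, ssts: list) -> bytes:
    """Serialize the state; `ssts` must already be ordered newest first."""
    out = bytearray(MAGIC)
    out += struct.pack("<d", ratio)            # IEEE 754 bit pattern, LE
    out += struct.pack("<I", len(ssts))
    for sst in ssts:
        out += lenpref(sst["smallest_key"])
        out += lenpref(sst["largest_key"])
        out += struct.pack("<I", len(sst["entries"]))
        for kind, key, value in sst["entries"]:
            out += struct.pack("<B", kind)
            out += lenpref(key)
            if kind == PUT:                    # Delete entries carry no value
                out += lenpref(value)
        out += struct.pack("<I", len(sst["range_tombs"]))
        for start_key, end_key in sst["range_tombs"]:
            out += lenpref(start_key)
            out += lenpref(end_key)
        out += struct.pack("<Q", sst["bloom_bitmap"])
    return bytes(out)
```

An empty state serializes to 20 bytes: the 8-byte magic, the 8-byte ratio, and a zero sst_count.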

5. The Three Canonical Fixtures

Captured from the Rust reference and pinned in scripts/cross_test.sh:

Fixture  seed  ops  keys  scenario          sha256 of dump
A        42    200  32    tieredcompact     fc2fe88978eb2d419a73a7a16fa9ec0695ad9a56cb3a31b0bf85c0a28d7c97d6
B        7     500  64    universalcompact  05b07426e0da8ec2f1f8c81573dc275cd61cab9c19c93dc17c854456e441e7bb
C        99    300  16    withrange         4ad255755dbfbaa40a842766656d0c0dbd6713b6a527ffea5a24fa35964d73e4

If you change anything about the workload or the wire format, these hashes change. That's the contract: the hashes are intentional padlocks on behavioural drift.

6. lsmctl CLI

lsmctl workload --seed S --ops N --keys K --scenario {ptonly|withrange|tieredcompact|universalcompact}

Prints the lowercase hex sha256 of dump_state() followed by a newline. Exit code is 0 on success, 2 on argument errors. All three ports must agree on stdout byte-for-byte for the same arguments.
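
To make the argument and output contract concrete, here is a sketch of how a Python driver could mirror it. This is not the actual lsmctl source: build_parser and format_digest are hypothetical names, and treating all four flags as required is an assumption. It relies on argparse's standard behaviour of exiting with status 2 when argument parsing fails, which matches the exit code described above.

```python
import argparse
import hashlib

SCENARIOS = ("ptonly", "withrange", "tieredcompact", "universalcompact")


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags (hypothetical driver)."""
    parser = argparse.ArgumentParser(prog="lsmctl")
    sub = parser.add_subparsers(dest="command", required=True)
    wl = sub.add_parser("workload")
    # Assumption: every flag is mandatory; the spec does not list defaults.
    wl.add_argument("--seed", type=int, required=True)
    wl.add_argument("--ops", type=int, required=True)
    wl.add_argument("--keys", type=int, required=True)
    wl.add_argument("--scenario", choices=SCENARIOS, required=True)
    return parser


def format_digest(dump: bytes) -> str:
    """Lowercase hex sha256 of the dump, followed by a single newline."""
    return hashlib.sha256(dump).hexdigest() + "\n"
```

Writing format_digest's return value to stdout without any extra decoration is what allows a byte-for-byte comparison between ports.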

7. Reproducing the Hashes

cd db-21-storage-engine-advanced
./scripts/verify.sh     # all unit tests
./scripts/cross_test.sh # cross-language byte equivalence

Expected last line: === ALL OK ===.