Execution — db-21 Wire Format and Workload
This document is the single source of truth for the canonical wire format and the deterministic workload. Anything ambiguous here is a bug; fix the doc, not the implementations.
1. SplitMix64 PRNG
```
state += 0x9E3779B97F4A7C15
z = state
z = (z XOR (z >> 30)) * 0xBF58476D1CE4E5B9
z = (z XOR (z >> 27)) * 0x94D049BB133111EB
return z XOR (z >> 31)
```
All arithmetic is unsigned 64-bit. The state addition and both multiplications wrap on overflow, and >> is a logical right shift. The state is initialized to the user-supplied 64-bit seed. Every op makes exactly three draws, in order (r1, r2, r3), even when it uses only two of them (Delete ignores r3).
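A minimal Python sketch of the generator and the three-draws-per-op rule follows. It assumes Python integers masked to 64 bits emulate wrapping u64 arithmetic. The names SplitMix64, next_u64 and draw3 are hypothetical and are not taken from any of the ports.

```python
MASK64 = (1 << 64) - 1  # emulate wrapping u64 arithmetic with Python ints


class SplitMix64:
    """Hypothetical SplitMix64 port; state starts at the user-supplied seed."""

    def __init__(self, seed: int) -> None:
        self.state = seed & MASK64

    def next_u64(self) -> int:
        # state += golden gamma, wrapping mod 2^64
        self.state = (self.state + 0x9E3779B97F4A7C15) & MASK64
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return z ^ (z >> 31)

    def draw3(self) -> tuple:
        # Always three draws per op, in order (r1, r2, r3), even if the op ignores r3.
        return self.next_u64(), self.next_u64(), self.next_u64()
```

Because every op calls draw3 unconditionally, the state after op i depends only on the seed and i, never on which action ran.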
2. Operation Selection
```
op = (r1 >> 62) & 0b11
```
That is, op is the top two bits of r1.
| op | Action |
|---|---|
| 0, 1 | Put(k = "k" + (r2 mod keys), v = u32_be(r3 as u32)) |
| 2 | Delete(k = "k" + (r2 mod keys)) |
| 3 | RangeTomb(start = "k" + a, end = "k" + (a + 1 + (r3 mod (keys-a)))) where a = r2 mod keys |
In the ptonly scenario, op 3 is rewritten to op 0 (Put) before the action runs. The three draws still happen.
The value bytes are the big-endian 32-bit representation of the low 32 bits of r3 (r3 truncated to u32). Big-endian was chosen because it produces visually distinct bytes across fixtures; the format is otherwise little-endian.
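The operation table can be sketched as a pure function of one draw triple. This is an illustration under stated assumptions. The spec does not say how "k" + n renders n, so the sketch assumes plain decimal with no zero padding (e.g. k13). The function names and the tuple shapes returned are hypothetical.

```python
from typing import Tuple


def key_bytes(i: int) -> bytes:
    # ASSUMPTION: "k" + n renders n in decimal with no padding (e.g. 13 -> b"k13").
    return b"k" + str(i).encode("ascii")


def decode_op(r1: int, r2: int, r3: int, keys: int, scenario: str) -> Tuple:
    """Map one already-drawn (r1, r2, r3) triple to an action tuple (hypothetical shape)."""
    op = (r1 >> 62) & 0b11  # top two bits of r1
    if scenario == "ptonly" and op == 3:
        op = 0  # RangeTomb is rewritten to Put; the draws themselves are unchanged
    if op in (0, 1):
        value = (r3 & 0xFFFFFFFF).to_bytes(4, "big")  # low 32 bits of r3, big-endian
        return ("put", key_bytes(r2 % keys), value)
    if op == 2:
        return ("delete", key_bytes(r2 % keys))
    a = r2 % keys
    end = a + 1 + (r3 % (keys - a))  # keys - a >= 1, so end lies in [a + 1, keys]
    return ("range_tomb", key_bytes(a), key_bytes(end))
```

Note that the RangeTomb end index can equal keys, one past the largest key index that Put or Delete can produce.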
3. Flush and Compact Cadence
- Every 8 ops, i.e. when (op_idx + 1) % 8 == 0 with op_idx counting from 0: flush all pending entries and tombstones into a new SST at the newest position.
- Every 16 ops, i.e. when (op_idx + 1) % 16 == 0: run one compaction pass appropriate to the scenario (size-tiered, universal, or no-op).
- No residual flush at the end. If the loop ends with entries still pending, they are discarded. This is intentional: it keeps the cross-language hashes stable regardless of ops mod 8.
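The cadence can be checked with a small event-schedule sketch that counts ops rather than modelling the memtable. The spec does not say which runs first when both rules fire on the same op; this sketch assumes flush before compaction (the order in which the rules are listed), and that assumption is marked in the code. The function name cadence is hypothetical.

```python
from typing import List, Tuple


def cadence(ops: int) -> Tuple[List[Tuple[str, int]], int]:
    """Return the (event, op_idx) schedule and the number of trailing ops discarded."""
    events: List[Tuple[str, int]] = []
    unflushed_ops = 0
    for op_idx in range(ops):  # op_idx counts from 0
        unflushed_ops += 1  # the op's mutation joins the pending set
        if (op_idx + 1) % 8 == 0:
            events.append(("flush", op_idx))
            unflushed_ops = 0
        if (op_idx + 1) % 16 == 0:
            # ASSUMPTION: flush runs before compaction when both fire on the same op.
            events.append(("compact", op_idx))
    # No residual flush: whatever is still pending here is dropped.
    return events, unflushed_ops
```

For fixture B (500 ops) this gives 62 flushes, 31 compaction passes and 4 discarded trailing ops; fixture A (200 ops) ends exactly on a flush boundary, so nothing is discarded.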
4. Canonical Wire Format
All integers are little-endian. lenpref(b) means u32 LE len(b) ‖ b, i.e. the byte length of b as a u32 LE followed by the bytes of b.
```
"DSEADV21"                 (8 bytes, ASCII, no terminator)
f64 LE ratio               (IEEE 754 bit pattern, not a string)
u32 LE sst_count
for each SST (newest first):
    lenpref(smallest_key)
    lenpref(largest_key)
    u32 LE entry_count
    for each entry:
        u8 kind            (Put = 1, Delete = 2)
        lenpref(key)
        if kind == Put: lenpref(value)
    u32 LE range_tomb_count
    for each range tombstone:
        lenpref(start_key)
        lenpref(end_key)
    u64 LE bloom_bitmap
```
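A minimal encoder for this layout, using Python's struct module with explicit little-endian format codes, is sketched below. The Entry and Sst classes are hypothetical containers, not types from any port. The sketch assumes the caller already supplies SSTs newest first and has computed the ratio and Bloom bitmap; how those are derived is outside this section.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

PUT, DELETE = 1, 2


@dataclass
class Entry:  # hypothetical container
    kind: int  # PUT or DELETE
    key: bytes
    value: Optional[bytes] = None  # required for PUT, absent for DELETE


@dataclass
class Sst:  # hypothetical container
    smallest_key: bytes
    largest_key: bytes
    entries: List[Entry] = field(default_factory=list)
    range_tombs: List[Tuple[bytes, bytes]] = field(default_factory=list)
    bloom_bitmap: int = 0


def lenpref(b: bytes) -> bytes:
    return struct.pack("<I", len(b)) + b  # u32 LE length, then the raw bytes


def encode_dump(ratio: float, ssts_newest_first: List[Sst]) -> bytes:
    out = bytearray(b"DSEADV21")                # magic, no terminator
    out += struct.pack("<d", ratio)             # IEEE 754 binary64, little-endian
    out += struct.pack("<I", len(ssts_newest_first))
    for sst in ssts_newest_first:
        out += lenpref(sst.smallest_key)
        out += lenpref(sst.largest_key)
        out += struct.pack("<I", len(sst.entries))
        for e in sst.entries:
            out += struct.pack("<B", e.kind)
            out += lenpref(e.key)
            if e.kind == PUT:
                if e.value is None:
                    raise ValueError("Put entry without a value")
                out += lenpref(e.value)
        out += struct.pack("<I", len(sst.range_tombs))
        for start, end in sst.range_tombs:
            out += lenpref(start)
            out += lenpref(end)
        out += struct.pack("<Q", sst.bloom_bitmap)  # u64 LE
    return bytes(out)
```

For example, encode_dump(0.5, []) yields the 8-byte magic, the bytes 00 00 00 00 00 00 e0 3f for the ratio, and a zero u32 SST count: 20 bytes in total.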
5. The Three Canonical Fixtures
Captured from the Rust reference and pinned in scripts/cross_test.sh:
| Fixture | seed | ops | keys | scenario | sha256 of dump_state() (hex) |
|---|---|---|---|---|---|
| A | 42 | 200 | 32 | tieredcompact | fc2fe88978eb2d419a73a7a16fa9ec0695ad9a56cb3a31b0bf85c0a28d7c97d6 |
| B | 7 | 500 | 64 | universalcompact | 05b07426e0da8ec2f1f8c81573dc275cd61cab9c19c93dc17c854456e441e7bb |
| C | 99 | 300 | 16 | withrange | 4ad255755dbfbaa40a842766656d0c0dbd6713b6a527ffea5a24fa35964d73e4 |
If you change anything about the workload or the wire format, these hashes change. That's the contract: the hashes are intentional padlocks on behavioural drift.
6. lsmctl CLI
```
lsmctl workload --seed S --ops N --keys K --scenario {ptonly|withrange|tieredcompact|universalcompact}
```
Prints the lowercase hex SHA-256 of dump_state() (64 hex characters) followed by a single newline. Exit code is 0 on success and 2 on argument errors. All three ports must produce byte-for-byte identical stdout for the same arguments.
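A Python port could get the exit-code contract almost for free, because argparse exits with status 2 when argument parsing fails. The sketch below is hypothetical: build_parser, u64 and main are illustrative names, and dump_state_fn stands in for whatever runs the workload and returns the dump_state() bytes. Range-checking the seed as a u64 is an added assumption.

```python
import argparse
import hashlib
import sys
from typing import Callable, Sequence

SCENARIOS = ("ptonly", "withrange", "tieredcompact", "universalcompact")


def u64(text: str) -> int:
    # ASSUMPTION: seeds outside [0, 2^64) count as argument errors.
    value = int(text, 10)
    if not 0 <= value < (1 << 64):
        raise argparse.ArgumentTypeError(f"{text!r} is not a u64")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="lsmctl")
    sub = parser.add_subparsers(dest="command", required=True)
    wl = sub.add_parser("workload")
    wl.add_argument("--seed", type=u64, required=True)
    wl.add_argument("--ops", type=int, required=True)
    wl.add_argument("--keys", type=int, required=True)
    wl.add_argument("--scenario", choices=SCENARIOS, required=True)
    return parser


def main(argv: Sequence[str], dump_state_fn: Callable[[int, int, int, str], bytes]) -> int:
    args = build_parser().parse_args(argv)  # raises SystemExit(2) on bad arguments
    dump = dump_state_fn(args.seed, args.ops, args.keys, args.scenario)
    sys.stdout.write(hashlib.sha256(dump).hexdigest() + "\n")  # lowercase hex + newline
    return 0
```

hexdigest() already returns lowercase hex, so no extra normalization is needed to match the other ports' output.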
7. Reproducing the Hashes
```
cd db-21-storage-engine-advanced
./scripts/verify.sh       # all unit tests
./scripts/cross_test.sh   # cross-language byte equivalence
```
Expected last line of output: === ALL OK ===
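The fixture table can also be checked directly against a built lsmctl. The sketch below is a hypothetical Python analogue of that check, not a description of scripts/cross_test.sh, which remains the pinned source. It assumes an lsmctl binary on PATH (overridable through the binary parameter) and compares its stdout with the pinned hash plus a newline.

```python
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fixture:
    name: str
    seed: int
    ops: int
    keys: int
    scenario: str
    sha256: str  # pinned lowercase hex digest from the fixtures table


FIXTURES = (
    Fixture("A", 42, 200, 32, "tieredcompact", "fc2fe88978eb2d419a73a7a16fa9ec0695ad9a56cb3a31b0bf85c0a28d7c97d6"),
    Fixture("B", 7, 500, 64, "universalcompact", "05b07426e0da8ec2f1f8c81573dc275cd61cab9c19c93dc17c854456e441e7bb"),
    Fixture("C", 99, 300, 16, "withrange", "4ad255755dbfbaa40a842766656d0c0dbd6713b6a527ffea5a24fa35964d73e4"),
)


def lsmctl_argv(fx: Fixture, binary: str = "lsmctl") -> List[str]:
    return [binary, "workload", "--seed", str(fx.seed), "--ops", str(fx.ops),
            "--keys", str(fx.keys), "--scenario", fx.scenario]


def check_fixture(fx: Fixture, binary: str = "lsmctl") -> bool:
    # ASSUMPTION: binary points at a built lsmctl; a non-zero exit raises CalledProcessError.
    result = subprocess.run(lsmctl_argv(fx, binary), capture_output=True, check=True)
    return result.stdout == (fx.sha256 + "\n").encode("ascii")  # byte-for-byte stdout match
```

Comparing raw stdout bytes, rather than a stripped string, also catches a port that emits a missing or extra newline.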