db-15 — Sqlite-shaped engine, end-to-end

Where this sits

This lab is the capstone for the SQLite-style track. Earlier labs (db-10 .. db-14) build the parts in isolation: B-tree (db-10), pager (db-11), SQL frontend (db-12), MVCC transactions (db-13), indexes (db-14). Here we fuse a deliberately small slice of all of them into one engine and prove the slice is reproducible across Rust, Go, and C++ down to the byte.

The goal is not feature parity with real SQLite — that would dwarf the lab. The goal is to exhibit, in code small enough to keep in your head, the join between:

a primary index keyed by integer,
a secondary index keyed by text,
an MVCC tombstone scheme governed by a monotonic transaction id,
a deterministic snapshot wire format that any of the three reference implementations can produce identically.

Data model

A single table:

kv(k INT primary key, v INT, tag TEXT)

Physical row:

Row { v: i64, tag: String, created_at: u64, deleted_at: u64 }

deleted_at == 0 means the row is live; anything else is the txid at which it was tombstoned. Tombstoned rows stay in the primary map (they appear in the snapshot dump so a verifier can audit historical state) but they disappear from the secondary index immediately on delete.

In-memory layout:

primary: ordered map<i64 -> Row> — sorted by key. Holds tombstones.
secondary: ordered map<String -> sorted Vec<i64>> — live rows only. Each list is kept strictly ascending.

A single monotonically-increasing next_txid (starts at 1) governs visibility. Read-only SELECT never bumps it. Write ops bump only when they actually mutate state.

SQL surface

Only four ops, deliberately:

op	semantics	txid bump?
`INSERT(k, v, tag)`	UPSERT — replaces an existing row (even a tombstoned one) with a fresh row whose `created_at` is the new txid.	always
`UPDATE(k, v, tag)`	live-only. If the row is missing or tombstoned, returns false and does not bump txid. Keeps original `created_at`. Maintains the secondary index across tag changes.	only if work was done
`DELETE(k)`	live-only. Marks `deleted_at = txid`, removes the row from the secondary index.	only if work was done
`SELECT_BY_K(k)` / `SELECT_BY_TAG(tag)`	read-only.	never

The semantic gotcha for cross-language identity is the no-op rule on UPDATE and DELETE. If any implementation bumps txid on a missing key, every subsequent created_at / deleted_at will drift and the snapshot diverges.

Snapshot wire format

Magic = "DSESQL15" (8 bytes, ASCII).

magic[8] || next_txid:u64 LE || primary_row_count:u32 LE
for each k in ascending order:
    k:i64 LE
    v:i64 LE
    tag_len:u32 LE
    tag_bytes
    created_at:u64 LE
    deleted_at:u64 LE
secondary_distinct_keys:u32 LE
for each tag in ascending lexicographic order:
    tag_len:u32 LE
    tag_bytes
    key_count:u32 LE
    for each key in ascending order: i64 LE

Three properties this format is built for:

Total order at every level. Both the primary and secondary sections iterate in a sort order that is well-defined regardless of the host hash map (a real bug we hit in early Go drafts: map iteration is randomised, so a for k, v := range without an explicit sort produces a different byte stream on every run).
Tombstones are observable. Including tombstones in the primary dump means the snapshot reflects the visibility scheme, not just the live set — useful when comparing two implementations' MVCC behaviour.
Self-delimiting. Every variable-length string is preceded by its length, so a parser does not have to guess.

Deterministic workload

run_workload(seed, ops, keys, scenario) is the only entry point used in cross-language testing. It draws three 64-bit words per op from a splitmix64 seeded with seed:

r1, r2, r3 = rng.next(), rng.next(), rng.next()
kind = (r1 >> 60) & 0x7
k    = (i64) (r2 % keys)
v    = (i64) (r3 % 10_000)
tag  = "t" + ((r3 >> 32) % 16)
match kind {
    0,1,2 => INSERT(k, v, tag)   // 3/8 of ops
    3,4   => UPDATE(k, v, tag)   // 2/8
    5     => DELETE(k)           // 1/8
    6     => SELECT_BY_K(k)      // 1/8
    7     => SELECT_BY_TAG(tag)  // 1/8
}

Two non-obvious rules:

Reads still draw all three rng words. Even though SELECT_BY_K only needs k, it still draws r3. Skipping the draw would shift the rng stream for every subsequent op and break determinism across scenarios.
tag = "t" + decimal(n). Decimal string formatting, not hex — trivially easy to get wrong in C++ where std::ostringstream << std::hex is the default reflex.

Frozen golden hashes

Captured from the Rust release build. The cross-language test asserts these byte for byte.

Scenario	CLI args	SHA-256
A	`--seed 42 --ops 500 --keys 32 --scenario default`	`e8ccacd39d8535c1ed101f0bc8b7a0799f56468a384da9284d4768cd8b3a3aab`
B	`--seed 7 --ops 2000 --keys 128 --scenario default`	`dd1d6bb7fec1ffc9f71f01e75a58166b04517a669495af2aa2da432d4722db69`

Sources of cross-language divergence

A non-exhaustive checklist that we hit while building the three ports:

Map iteration order. Go map iteration is randomised. Always collect keys then sort.Strings/sort.Slice before any side-effecting iteration that contributes to the wire stream.
Signed vs unsigned k. r2 % keys is unsigned modular arithmetic in all three languages; we then cast to i64. A cast through int on 32-bit platforms would lose bits. C++ uses static_cast<int64_t>, Rust as i64, Go int64(...).
Tag formatting. Use base-10 only. Padding, hex, or uppercase would all change the bytes silently.
Splitmix64 constants. All three implementations use the same triple: 0x9E3779B97F4A7C15, 0xBF58476D1CE4E7B5, 0x94D049BB133111EB. Forgetting the ULL suffix in C++ truncates the constants to 32 bits and produces a different stream.
SHA-256. Rust uses sha2, Go uses crypto/sha256, C++ ships an inline reference implementation in src/cpp/src/sql15.cc. A canonical test vector (SHA256("abc")) is asserted in every test suite to catch a broken implementation before it pollutes a scenario hash.
No trailing newline from the CLI. The shell-level test compares "$RUST_BIN ..." with "$GO_BIN ..." as raw strings; an extra \n from one of them silently fails the equality. Rust uses print!, Go uses fmt.Print, C++ uses std::cout << ... with no << endl.

What this lab does not model

Listed up front so the reader does not look for them:

No on-disk persistence, no WAL, no pager. The "snapshot" is an in-memory byte stream produced on demand.
No concurrent transactions. MVCC visibility rules are implemented, but there is only one writer.
No query planner; SELECT_BY_K and SELECT_BY_TAG are direct map lookups.
No DDL. The schema is hard-coded.

Those are deliberately deferred to db-21 and the capstone (db-23).

Distributed Systems Engineer — Build Databases & Consensus From Scratch