db-15 — Sqlite-shaped engine, end-to-end
Where this sits
This lab is the capstone for the SQLite-style track. Earlier labs (db-10 .. db-14) build the parts in isolation: B-tree (db-10), pager (db-11), SQL frontend (db-12), MVCC transactions (db-13), indexes (db-14). Here we fuse a deliberately small slice of all of them into one engine and prove the slice is reproducible across Rust, Go, and C++ down to the byte.
The goal is not feature parity with real SQLite — that would dwarf the lab. The goal is to exhibit, in code small enough to keep in your head, the join between:
- a primary index keyed by integer,
- a secondary index keyed by text,
- an MVCC tombstone scheme governed by a monotonic transaction id,
- a deterministic snapshot wire format that any of the three reference implementations can produce identically.
Data model
A single table:
kv(k INT primary key, v INT, tag TEXT)
Physical row:
Row { v: i64, tag: String, created_at: u64, deleted_at: u64 }
deleted_at == 0 means the row is live; anything else is the txid at
which it was tombstoned. Tombstoned rows stay in the primary map (they
appear in the snapshot dump so a verifier can audit historical state)
but they disappear from the secondary index immediately on delete.
In-memory layout:
primary: ordered map<i64 -> Row>— sorted by key. Holds tombstones.secondary: ordered map<String -> sorted Vec<i64>>— live rows only. Each list is kept strictly ascending.
A single monotonically-increasing next_txid (starts at 1) governs
visibility. Read-only SELECT never bumps it. Write ops bump only
when they actually mutate state.
SQL surface
Only four ops, deliberately:
| op | semantics | txid bump? |
|---|---|---|
INSERT(k, v, tag) | UPSERT — replaces an existing row (even a tombstoned one) with a fresh row whose created_at is the new txid. | always |
UPDATE(k, v, tag) | live-only. If the row is missing or tombstoned, returns false and does not bump txid. Keeps original created_at. Maintains the secondary index across tag changes. | only if work was done |
DELETE(k) | live-only. Marks deleted_at = txid, removes the row from the secondary index. | only if work was done |
SELECT_BY_K(k) / SELECT_BY_TAG(tag) | read-only. | never |
The semantic gotcha for cross-language identity is the no-op rule on
UPDATE and DELETE. If any implementation bumps txid on a missing
key, every subsequent created_at / deleted_at will drift and the
snapshot diverges.
Snapshot wire format
Magic = "DSESQL15" (8 bytes, ASCII).
magic[8] || next_txid:u64 LE || primary_row_count:u32 LE
for each k in ascending order:
k:i64 LE
v:i64 LE
tag_len:u32 LE
tag_bytes
created_at:u64 LE
deleted_at:u64 LE
secondary_distinct_keys:u32 LE
for each tag in ascending lexicographic order:
tag_len:u32 LE
tag_bytes
key_count:u32 LE
for each key in ascending order: i64 LE
Three properties this format is built for:
- Total order at every level. Both the primary and secondary
sections iterate in a sort order that is well-defined regardless of
the host hash map (a real bug we hit in early Go drafts:
mapiteration is randomised, so afor k, v := rangewithout an explicit sort produces a different byte stream on every run). - Tombstones are observable. Including tombstones in the primary dump means the snapshot reflects the visibility scheme, not just the live set — useful when comparing two implementations' MVCC behaviour.
- Self-delimiting. Every variable-length string is preceded by its length, so a parser does not have to guess.
Deterministic workload
run_workload(seed, ops, keys, scenario) is the only entry point used
in cross-language testing. It draws three 64-bit words per op from a
splitmix64 seeded with seed:
r1, r2, r3 = rng.next(), rng.next(), rng.next()
kind = (r1 >> 60) & 0x7
k = (i64) (r2 % keys)
v = (i64) (r3 % 10_000)
tag = "t" + ((r3 >> 32) % 16)
match kind {
0,1,2 => INSERT(k, v, tag) // 3/8 of ops
3,4 => UPDATE(k, v, tag) // 2/8
5 => DELETE(k) // 1/8
6 => SELECT_BY_K(k) // 1/8
7 => SELECT_BY_TAG(tag) // 1/8
}
Two non-obvious rules:
- Reads still draw all three rng words. Even though
SELECT_BY_Konly needsk, it still drawsr3. Skipping the draw would shift the rng stream for every subsequent op and break determinism across scenarios. tag = "t" + decimal(n). Decimal string formatting, not hex — trivially easy to get wrong in C++ wherestd::ostringstream << std::hexis the default reflex.
Frozen golden hashes
Captured from the Rust release build. The cross-language test asserts these byte for byte.
| Scenario | CLI args | SHA-256 |
|---|---|---|
| A | --seed 42 --ops 500 --keys 32 --scenario default | e8ccacd39d8535c1ed101f0bc8b7a0799f56468a384da9284d4768cd8b3a3aab |
| B | --seed 7 --ops 2000 --keys 128 --scenario default | dd1d6bb7fec1ffc9f71f01e75a58166b04517a669495af2aa2da432d4722db69 |
Sources of cross-language divergence
A non-exhaustive checklist that we hit while building the three ports:
- Map iteration order. Go
mapiteration is randomised. Always collect keys thensort.Strings/sort.Slicebefore any side-effecting iteration that contributes to the wire stream. - Signed vs unsigned k.
r2 % keysis unsigned modular arithmetic in all three languages; we then cast toi64. A cast throughinton 32-bit platforms would lose bits. C++ usesstatic_cast<int64_t>, Rustas i64, Goint64(...). - Tag formatting. Use base-10 only. Padding, hex, or uppercase would all change the bytes silently.
- Splitmix64 constants. All three implementations use the same
triple:
0x9E3779B97F4A7C15,0xBF58476D1CE4E7B5,0x94D049BB133111EB. Forgetting theULLsuffix in C++ truncates the constants to 32 bits and produces a different stream. - SHA-256. Rust uses
sha2, Go usescrypto/sha256, C++ ships an inline reference implementation in src/cpp/src/sql15.cc. A canonical test vector (SHA256("abc")) is asserted in every test suite to catch a broken implementation before it pollutes a scenario hash. - No trailing newline from the CLI. The shell-level test compares
"$RUST_BIN ..."with"$GO_BIN ..."as raw strings; an extra\nfrom one of them silently fails the equality. Rust usesprint!, Go usesfmt.Print, C++ usesstd::cout << ...with no<< endl.
What this lab does not model
Listed up front so the reader does not look for them:
- No on-disk persistence, no WAL, no pager. The "snapshot" is an in-memory byte stream produced on demand.
- No concurrent transactions. MVCC visibility rules are implemented, but there is only one writer.
- No query planner;
SELECT_BY_KandSELECT_BY_TAGare direct map lookups. - No DDL. The schema is hard-coded.
Those are deliberately deferred to db-21 and the capstone (db-23).