analysis
The shape of the problem
We want the smallest engine that still demonstrably integrates the five things the SQLite-track labs build separately: a primary keyed container, a secondary index, an MVCC visibility scheme, a SQL surface, and a reproducible on-the-wire snapshot. "Smallest" here means that any feature we cut must be one that other labs already cover or that later labs will cover (db-21, db-23).
Three forces pull on the design:
- It has to be correct in three languages at once. Cross-language byte identity is the cheap, mechanical proof that the implementations agree. Anything that varies between language runtimes (hash map ordering, string formatting, integer width, signedness on casts) becomes a hazard.
- It has to be small enough to keep in your head. The whole engine is ~400 lines per language. That budget forced us to drop the pager, the on-disk format, and any kind of query planner.
- It has to actually test the integration. A no-op UPDATE that silently bumps the txid would not be caught by the unit tests in any one language; only the cross-language hash comparison would expose it.
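The third force can be made concrete with a minimal Go sketch. It assumes a hypothetical engine whose snapshot serialises the txid counter alongside the rows; the names (engine, update, snapshotHash, bumpOnNoop) and the text encoding are illustrative and are not the lab's actual format. Both runs end with identical rows, so a row-level unit test passes either way, but the hashes disagree.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// engine is a hypothetical, stripped-down stand-in for the lab engine.
type engine struct {
	txid int64
	rows map[int64]string
	// bumpOnNoop models the bug: advance txid even when UPDATE changes nothing.
	bumpOnNoop bool
}

// update sets rows[key] = val and reports whether the row actually changed.
func (e *engine) update(key int64, val string) bool {
	old, ok := e.rows[key]
	if ok && old == val {
		if e.bumpOnNoop {
			e.txid++ // the silent bug
		}
		return false
	}
	e.txid++
	e.rows[key] = val
	return true
}

// snapshotHash hashes the txid and then the rows in ascending key order.
func (e *engine) snapshotHash() string {
	keys := make([]int64, 0, len(e.rows))
	for k := range e.rows {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

	h := sha256.New()
	fmt.Fprintf(h, "txid=%d\n", e.txid)
	for _, k := range keys {
		fmt.Fprintf(h, "%d=%s\n", k, e.rows[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// run applies the same three updates; the last one is a no-op.
func run(bumpOnNoop bool) (map[int64]string, string) {
	e := &engine{rows: map[int64]string{}, bumpOnNoop: bumpOnNoop}
	e.update(1, "a")
	e.update(2, "b")
	e.update(1, "a") // no-op UPDATE
	return e.rows, e.snapshotHash()
}

func main() {
	_, good := run(false)
	_, bad := run(true)
	fmt.Println("hashes equal:", good == bad) // hashes equal: false
}
```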
Why MVCC over locking
A locking implementation would have been smaller, but it would not
have produced a visible artefact for the snapshot. With MVCC we have
the row-level created_at / deleted_at pair as observable state, and
the snapshot dump can carry it. That gives us something to compare.
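As a rough illustration of that observable state, the sketch below models one row version and a visibility check in Go. The field names follow the created_at / deleted_at pair above; the convention that 0 means "not deleted" and the rule "created at or before the snapshot, deleted strictly after it" are assumptions of this sketch, not necessarily the lab's exact rule.

```go
package main

import "fmt"

// version is one MVCC row version.
// Assumption: deletedAt == 0 means the version is still live.
type version struct {
	key       int64
	val       string
	createdAt int64
	deletedAt int64
}

// visible reports whether v can be seen by a reader at snapshot txid snap.
func visible(v version, snap int64) bool {
	if v.createdAt > snap {
		return false // created after the snapshot was taken
	}
	return v.deletedAt == 0 || v.deletedAt > snap
}

func main() {
	// An UPDATE at txid 3 tombstones the old version and creates a new one.
	chain := []version{
		{key: 7, val: "old", createdAt: 1, deletedAt: 3},
		{key: 7, val: "new", createdAt: 3, deletedAt: 0},
	}
	for snap := int64(0); snap <= 3; snap++ {
		for _, v := range chain {
			if visible(v, snap) {
				fmt.Printf("snap=%d sees %q\n", snap, v.val)
			}
		}
	}
	// Prints:
	// snap=1 sees "old"
	// snap=2 sees "old"
	// snap=3 sees "new"
}
```

Because both timestamps live on the row, a snapshot dump that includes them exposes tombstones as well as live rows, which is what gives the cross-language comparison something to compare.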
Why a secondary index
Without one, the snapshot would be just a sorted map dump and the cross-language test would degenerate into "do all three languages sort ints the same way" (trivially yes). The secondary forces us to also sort strings deterministically, which is where Go's randomised map iteration would otherwise bite.
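A minimal sketch of a deterministic secondary dump in Go follows. The index shape (string value to list of primary keys), the function name dumpSecondary, and the output format are hypothetical. The point is only that both the string keys and the primary keys are sorted explicitly before anything is emitted.

```go
package main

import (
	"fmt"
	"sort"
)

// dumpSecondary renders a hypothetical secondary index in a stable order.
func dumpSecondary(idx map[string][]int64) []string {
	vals := make([]string, 0, len(idx))
	for v := range idx { // map iteration order is unspecified in Go
		vals = append(vals, v)
	}
	sort.Strings(vals) // byte-wise order, so "Zebra" sorts before "apple"

	out := make([]string, 0, len(vals))
	for _, v := range vals {
		pks := append([]int64(nil), idx[v]...) // copy: do not mutate the index
		sort.Slice(pks, func(i, j int) bool { return pks[i] < pks[j] })
		out = append(out, fmt.Sprintf("%s -> %v", v, pks))
	}
	return out
}

func main() {
	idx := map[string][]int64{
		"pear":  {9, 2},
		"apple": {5},
		"Zebra": {1},
	}
	for _, line := range dumpSecondary(idx) {
		fmt.Println(line)
	}
	// Prints:
	// Zebra -> [1]
	// apple -> [5]
	// pear -> [2 9]
}
```

Go compares strings byte by byte, so the other implementations would need the same byte-wise ordering (not a locale-aware collation) for the dumps to match.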
Where the test surface actually catches bugs
A pleasant surprise: when something breaks, the unit tests in any one language usually still pass and only the cross-language script fails. That is diagnostic in itself, because it almost always points at one of:
- a missing sort.Strings / sort.Slice in Go,
- a static_cast<int> instead of static_cast<int64_t> in C++,
- an unsuffixed 0x9E3779B97F4A7C15 constant in C++ that gets narrowed to int where it is stored or used (the compiler warns about it, but the warning is buried in a thousand-line build log).
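The two integer-width items share one failure mode: part of the arithmetic runs at 32 bits in one implementation and at 64 bits in the others. The Go sketch below only models that effect and does not reproduce the C++ specifics. The functions mix64 and mix32 are hypothetical; the source does not say how the constant is used, so a single multiply stands in for whatever mixing step the engine performs.

```go
package main

import "fmt"

// golden is the constant quoted above, pinned to an explicit 64-bit type.
const golden uint64 = 0x9E3779B97F4A7C15

// mix64 keeps all arithmetic in uint64; overflow wraps modulo 2^64.
func mix64(x uint64) uint64 {
	return x * golden
}

// mix32 models the narrowing bug: the same step with both operands cut
// to 32 bits. Go rejects uint32(golden) on the constant at compile time,
// so the value is copied into a variable first to force the truncation.
func mix32(x uint64) uint64 {
	g := golden
	return uint64(uint32(x) * uint32(g))
}

func main() {
	fmt.Printf("%#x\n", mix64(1)) // 0x9e3779b97f4a7c15
	fmt.Printf("%#x\n", mix32(1)) // 0x7f4a7c15
}
```

A unit test that only checks "the same key always maps to the same value" passes for either function, which fits the pattern of the bug surfacing only in the cross-language comparison.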
The two frozen scenarios are deliberately sized:
- Scenario A (--ops 500 --keys 32): small enough to debug by re-running with a smaller op count and diffing the intermediate snapshots.
- Scenario B (--ops 2000 --keys 128): large enough to thrash the secondary index and the tombstone code path.