broader ideas

What a "real" SQLite slice would add

If the goal is fidelity rather than pedagogy, the natural next steps, roughly in order of payoff:

A pager backed by pwrite. This is db-11. Once you have a pager the snapshot becomes the file, not an ad-hoc byte stream.
Page-level checksums. Even just XXH3 per page; turns silent corruption into noisy corruption.
A WAL. Append-only journal of operations, replay on open. db-03 does the WAL; the fusion is in db-23.
Schema and DDL. CREATE TABLE, multiple tables, column types. The single-table assumption hides almost all the catalog complexity.
A query planner. Even just a cost-based decision between SELECT_BY_K and SELECT_BY_TAG would be educational. With one table and two indexes the planner is trivial; with joins it explodes.

What this engine could become with concurrency

The MVCC bookkeeping is already there — created_at and deleted_at. What is missing for real read-mostly concurrency:

A reader-visible snapshot timestamp, so SELECT reads consistently as of "the latest committed txid I saw".
Write-set tracking and a commit barrier, so two writers cannot both bump txid without serialising.
Garbage collection of tombstoned rows once no live reader could observe them. The current code holds tombstones forever, which is fine for a benchmark and disastrous for a real system.

The interesting thing is that the snapshot wire format would still work — you would just be dumping a consistent point rather than the literal in-memory state.

What this engine could never be

Without on-disk persistence, this is not a database; it is a test fixture. Adding the pager moves it to "embedded KV with SQL", which is roughly what SQLite is.

It will never be a server. Network protocols, connection management, client-side query plans, authentication — none of that is in scope for any lab in this series, by design.

Useful tangent: cross-language byte identity as a discipline

This lab is a microcosm of a discipline that pays off elsewhere:

gRPC and protobuf rely on a canonical encoding for hash-based signing.
Git's object hashing depends on a canonical layout per object type.
Bitcoin transactions are SHA-256-d in a canonical byte form.

Whenever you find yourself asking "is this implementation correct", producing a canonical byte stream from each implementation and hashing it is one of the cheapest mechanical proofs you can buy.

Distributed Systems Engineer — Build Databases & Consensus From Scratch

broader ideas

What a "real" SQLite slice would add

What this engine could become with concurrency

What this engine could never be

Useful tangent: cross-language byte identity as a discipline