db-10 — Broader Ideas

Where this in-memory B-tree fits in the rest of the track, and which real-world techniques live one or two steps beyond it.

Immediate next labs

db-11 — Pager system. Wraps each node in a fixed-size disk page. Trades the heap-allocated Box<Node> recursion for a PageId-indexed page cache plus a free-list. Introduces the B+-tree layout (values only in leaves; leaves doubly linked for range scans) because internal nodes must fit one to a page.
db-12 — SQL frontend. Parses a small SQL subset (CREATE TABLE, INSERT, SELECT, UPDATE, DELETE), plans it into a B+-tree-backed table, and exposes a REPL.
db-13 — Transactions and MVCC. Versioned B+-tree pages so readers do not block writers. Snapshots are root-page references at a given commit timestamp.
db-14 — Indexes and query optimization. Secondary B-trees whose keys are (indexed_column, primary_key) pairs. Plans index scans, index-only scans, and merge joins.
db-15 — SQLite-complete. Everything above stitched into one executable; the B-tree track's counterpart to db-09.

How this lab's pieces show up in real systems

T = 2 "demo size" B-trees are exactly what every textbook uses as a teaching aid, including the one most engineers learn on. Production engines use T chosen to fit a 4 KiB / 8 KiB page, but the algorithms are unchanged.
Proactive split / rebalance is the standard discipline; the alternative (descend, then walk back up to fix overflows) is textbook for binary search trees but rare in B-trees because it makes concurrency control much harder.
Preorder canonical serialization is the same shape SQLite uses for its page_dump tooling and what RocksDB's sst_dump produces for its SSTables. Every storage engineer needs some byte-exact dump format; here we picked the simplest one that captures structure.
SplitMix64 is the standard hash-mixing primitive used by modern hash tables (Java 8 HashMap, Go's runtime-internal randn, and absl::flat_hash_map's perturbation). Using it for the workload generator means the keys we touch are realistically randomly distributed, not pathologically biased.

Performance experiments worth running later

Plot len() vs serialized size to see the per-key overhead at T = 2. Compare to T = 64 (db-11) to see how internal-node shrinkage from B+-tree leaves changes the breakdown.
Sweep KEY_SPACE from 100 up to 100 000 and watch the insert-delete-insert workload's steady-state size oscillate.
Replace the recursive serialize_tree with an explicit-stack iterative version and measure the wall-time gap. Useful prep before db-22.

What "production-quality" would require beyond this lab

Variable-length keys with prefix compression on the page.
Page-level checksums and a magic byte at offset 0 so a corrupted read fails loudly instead of returning random keys.
Free-list management for reclaimed pages after deletes (db-11).
Concurrent insert/delete protocols: latch coupling, optimistic lock coupling, or Bayer's "B-link tree" right-link technique for no-blocking traversal during split.
Copy-on-write pages so readers see a consistent snapshot during writes (LMDB-style).
A persistent "wal" record per page mutation so the tree can be replayed on recovery (db-03 / db-11).

None of these change the shape of the in-memory algorithms — they add policies on top of the same proactive-split / proactive- rebalance kernel built here.

Distributed Systems Engineer — Build Databases & Consensus From Scratch

db-10 — Broader Ideas

Immediate next labs

How this lab's pieces show up in real systems

Performance experiments worth running later

What "production-quality" would require beyond this lab