Phases & Labs
This curriculum has 5 phases and 23 labs. Phases build on each other, but within Phase 4 (consensus) you can do Raft → Paxos → ZAB in any order after the foundations in db-16.
Legend: ✅ complete · 🟡 scaffolded · ⬜ planned
Phase 1 — Storage Primitives & Foundations
Before you can build a database, you need to understand the medium it lives on.
| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| db-01 | Storage Primitives | ✅ | Pages, byte order, mmap vs pread, alignment, HDD/SSD/NVMe latency |
| db-02 | Data Structures for Storage | 🟡 | Skip lists, hash tables, when in-memory vs on-disk structures differ |
| db-03 | Write-Ahead Log | 🟡 | WAL framing, CRC32, fsync semantics, group commit |
| db-04 | Bloom Filters & Hashing | 🟡 | FPR math, xxHash vs Murmur, cuckoo & xor filter alternatives |
Phase 2 — LevelDB / LSM-Tree
Build a production-shape LSM-tree key-value store, the way Google built LevelDB and Meta forked it into RocksDB.
| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| db-05 | LSM MemTable | 🟡 | Skip-list MemTable, immutable MemTable, flush trigger |
| db-06 | SSTable Format | 🟡 | Data/index/filter blocks, restart points, footer |
| db-07 | LSM Compaction | 🟡 | Level vs size-tiered vs universal, write amplification |
| db-08 | Block Cache & Iterators | 🟡 | LRU, MergingIterator, snapshot via sequence numbers |
| db-09 | LevelDB Complete | 🟡 | Open/close, WriteBatch, recovery, YCSB benchmark |
Phase 3 — SQLite / B-Tree
Build a B+-tree storage engine, a pager, a SQL parser, a bytecode VM, and a transaction manager.
| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| db-10 | B-Tree Fundamentals | 🟡 | B-Tree vs B+-Tree, page layout, splits & merges |
| db-11 | Pager System | 🟡 | Page cache, rollback journal vs WAL mode, checkpointing |
| db-12 | SQL Frontend | 🟡 | Tokenizer, parser, AST, VDBE bytecode VM |
| db-13 | Transactions & MVCC | 🟡 | ACID, isolation levels, SQLite locks, MVCC vs 2PL |
| db-14 | Indexes & Query Planning | 🟡 | Secondary indexes, cost-based planner, ART, BRIN |
| db-15 | SQLite Complete | 🟡 | JOINs, aggregation, TPC-H subset benchmark |
Phase 4 — Consensus Algorithms
The three canonical consensus families — implemented, tested, and compared.
| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| db-16 | Distributed Fundamentals | 🟡 | CAP, FLP, linearizability, vector clocks, HLC |
| db-17 | Raft | 🟡 | Election, AppendEntries, snapshotting, ReadIndex |
| db-18 | Paxos | 🟡 | Single-decree, Multi-Paxos, Flexible Paxos |
| db-19 | ZAB | 🟡 | Epochs, zxids, primary-backup vs leader-based |
| db-20 | Distributed KV Store | 🟡 | Raft + LevelDB backend, linearizable reads, sharding |
Phase 5 — Advanced Storage & Capstone
| Lab | Title | Status | Key Concepts |
|---|---|---|---|
| db-21 | Advanced Storage | 🟡 | io_uring, O_DIRECT, columnar layout, WiscKey |
| db-22 | Performance & Benchmarking | 🟡 | YCSB A–F, flamegraphs, NUMA, perf counters |
| db-23 | Capstone Distributed DB | 🟡 | SQL → planner → LevelDB → Raft; 2PC over Raft groups |
Suggested Pace
- Full-time learner: ~2 labs per week ⇒ ~12 weeks end-to-end.
- Side-project learner: ~1 lab every 1–2 weeks ⇒ ~6 months.
- Reading-only path: skim
CONCEPTS.md+docs/analysis.mdper lab ⇒ ~1 week for the entire curriculum.
Recommended Progression
Phase 1 (must do all 4 in order)
│
├─→ Phase 2 (LevelDB) ──┐
│ │
└─→ Phase 3 (SQLite) ────┤
↓
Phase 4 (Consensus)
↓
Phase 5 (Capstone)
Phase 2 and Phase 3 are independent — pick the storage style that excites you first. Phase 4 only references Phase 1 fundamentals, so you can detour into consensus early if you want. Phase 5's capstone assumes all four prior phases.