Phases & Labs

This curriculum has 5 phases and 23 labs. Phases build on each other, but within Phase 4 (consensus) you can do Raft, Paxos, and ZAB (db-17 to db-19) in any order after the foundations in db-16.

Legend: ✅ complete · 🟡 scaffolded · ⬜ planned


Phase 1 — Storage Primitives & Foundations

Before you can build a database, you need to understand the medium it lives on.

| Lab | Title | Status | Key Concepts |
|-------|-------|--------|--------------|
| db-01 | Storage Primitives | | Pages, byte order, mmap vs pread, alignment, HDD/SSD/NVMe latency |
| db-02 | Data Structures for Storage | 🟡 | Skip lists, hash tables, when in-memory vs on-disk structures differ |
| db-03 | Write-Ahead Log | 🟡 | WAL framing, CRC32, fsync semantics, group commit |
| db-04 | Bloom Filters & Hashing | 🟡 | FPR math, xxHash vs Murmur, cuckoo & xor filter alternatives |

Phase 2 — LevelDB / LSM-Tree

Build a production-shape LSM-tree key-value store, following the design of Google's LevelDB and of RocksDB, the fork that Facebook (now Meta) built from it.

| Lab | Title | Status | Key Concepts |
|-------|-------|--------|--------------|
| db-05 | LSM MemTable | 🟡 | Skip-list MemTable, immutable MemTable, flush trigger |
| db-06 | SSTable Format | 🟡 | Data/index/filter blocks, restart points, footer |
| db-07 | LSM Compaction | 🟡 | Level vs size-tiered vs universal, write amplification |
| db-08 | Block Cache & Iterators | 🟡 | LRU, MergingIterator, snapshot via sequence numbers |
| db-09 | LevelDB Complete | 🟡 | Open/close, WriteBatch, recovery, YCSB benchmark |

Phase 3 — SQLite / B-Tree

Build a B+-tree storage engine, a pager, a SQL parser, a bytecode VM, and a transaction manager.

| Lab | Title | Status | Key Concepts |
|-------|-------|--------|--------------|
| db-10 | B-Tree Fundamentals | 🟡 | B-Tree vs B+-Tree, page layout, splits & merges |
| db-11 | Pager System | 🟡 | Page cache, rollback journal vs WAL mode, checkpointing |
| db-12 | SQL Frontend | 🟡 | Tokenizer, parser, AST, VDBE bytecode VM |
| db-13 | Transactions & MVCC | 🟡 | ACID, isolation levels, SQLite locks, MVCC vs 2PL |
| db-14 | Indexes & Query Planning | 🟡 | Secondary indexes, cost-based planner, ART, BRIN |
| db-15 | SQLite Complete | 🟡 | JOINs, aggregation, TPC-H subset benchmark |

Phase 4 — Consensus Algorithms

The three canonical consensus families — implemented, tested, and compared.

| Lab | Title | Status | Key Concepts |
|-------|-------|--------|--------------|
| db-16 | Distributed Fundamentals | 🟡 | CAP, FLP, linearizability, vector clocks, HLC |
| db-17 | Raft | 🟡 | Election, AppendEntries, snapshotting, ReadIndex |
| db-18 | Paxos | 🟡 | Single-decree, Multi-Paxos, Flexible Paxos |
| db-19 | ZAB | 🟡 | Epochs, zxids, primary-backup vs leader-based |
| db-20 | Distributed KV Store | 🟡 | Raft + LevelDB backend, linearizable reads, sharding |

Phase 5 — Advanced Storage & Capstone

| Lab | Title | Status | Key Concepts |
|-------|-------|--------|--------------|
| db-21 | Advanced Storage | 🟡 | io_uring, O_DIRECT, columnar layout, WiscKey |
| db-22 | Performance & Benchmarking | 🟡 | YCSB A–F, flamegraphs, NUMA, perf counters |
| db-23 | Capstone Distributed DB | 🟡 | SQL → planner → LevelDB → Raft; 2PC over Raft groups |

Suggested Pace

  • Full-time learner: ~2 labs per week ⇒ ~12 weeks end-to-end.
  • Side-project learner: ~1 lab every 1–2 weeks ⇒ ~6–11 months.
  • Reading-only path: skim CONCEPTS.md + docs/analysis.md per lab ⇒ ~1 week for the entire curriculum.
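
To sanity-check these estimates or adapt them to your own pace, the arithmetic can be sketched in a few lines of Python. This is only an illustration: `weeks_needed` is a hypothetical helper, and it assumes a perfectly steady pace with no breaks.

```python
import math

# 4 + 5 + 6 + 5 + 3 labs across the five phases.
TOTAL_LABS = 23


def weeks_needed(total_labs: int, labs_per_week: float) -> int:
    """Whole weeks needed to finish total_labs at a steady pace."""
    return math.ceil(total_labs / labs_per_week)


full_time = weeks_needed(TOTAL_LABS, 2)    # 2 labs per week
side_fast = weeks_needed(TOTAL_LABS, 1)    # 1 lab per week
side_slow = weeks_needed(TOTAL_LABS, 0.5)  # 1 lab every 2 weeks

print(full_time, side_fast, side_slow)  # prints: 12 23 46
```

Swap in your own `labs_per_week` value to get a personal estimate.
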

Phase 1 (do all 4 labs in order)
   │
   ├─→ Phase 2 (LevelDB)  ──┐
   │                        │
   └─→ Phase 3 (SQLite) ────┤
                            ↓
                         Phase 4 (Consensus)
                            ↓
                         Phase 5 (Capstone)

Phase 2 and Phase 3 are independent, so pick the storage style that excites you first. The consensus algorithm labs in Phase 4 (db-16 to db-19) reference only Phase 1 fundamentals, so you can detour into consensus early if you want; db-20 builds on the LevelDB backend from Phase 2. Phase 5's capstone assumes all four prior phases.
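
If you want to check a custom study order against the diagram above, one possible sketch is to encode its arrows as a prerequisite map and validate the order against it. The names `PREREQS` and `is_valid_order` are illustrative, not part of the curriculum. The map follows the diagram's strict path as drawn, so the optional early detour into consensus is not modelled.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each phase maps to the phases that must be finished first,
# transcribed from the arrows in the diagram.
PREREQS = {
    "Phase 1": set(),
    "Phase 2": {"Phase 1"},
    "Phase 3": {"Phase 1"},
    "Phase 4": {"Phase 2", "Phase 3"},
    "Phase 5": {"Phase 4"},
}


def is_valid_order(order):
    """Return True if every phase comes after all of its prerequisites."""
    seen = set()
    for phase in order:
        if not PREREQS[phase] <= seen:
            return False
        seen.add(phase)
    return len(seen) == len(PREREQS)


# One valid order; Phase 2 and Phase 3 may appear in either order.
one_order = list(TopologicalSorter(PREREQS).static_order())
print(one_order)

# Doing SQLite before LevelDB is fine; skipping ahead to consensus is not.
print(is_valid_order(["Phase 1", "Phase 3", "Phase 2", "Phase 4", "Phase 5"]))
print(is_valid_order(["Phase 1", "Phase 4", "Phase 2", "Phase 3", "Phase 5"]))
```

To model the early consensus detour, change the `"Phase 4"` entry to depend only on `"Phase 1"`.
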