db-07 references
Foundational
- O'Neil, P. et al. The Log-Structured Merge-Tree (LSM-Tree). Acta Informatica, 1996. The original. Read sections 3–4 for the merge/rolling-merge mechanism.
- Chang, F. et al. Bigtable: A Distributed Storage System for Structured Data. OSDI 2006. Section 5.3 ("compactions") frames minor vs. major compactions on top of SSTables.
Engineering, read these
- LevelDB: https://github.com/google/leveldb/blob/main/db/version_set.cc
See
Compaction::IsBaseLevelForKeyfor the tombstone-purge rule andPickCompactionfor the leveled-policy choice. - LevelDB design notes: https://github.com/google/leveldb/blob/main/doc/impl.md
- RocksDB compaction overview: https://github.com/facebook/rocksdb/wiki/Compaction
- Pebble (CockroachDB's Go LSM) compaction notes: https://github.com/cockroachdb/pebble/blob/master/docs/range_deletions.md Pebble's range-tombstone treatment is what you graduate to after this lab.
- Universal (tiered) vs leveled, with numbers: https://github.com/facebook/rocksdb/wiki/Universal-Compaction
Curriculum companions
- mini-LSM Chapter 2.6 "Compaction Strategies": https://skyzh.github.io/mini-lsm/week2-06-task-types.html
- Designing Data-Intensive Applications, Ch. 3 — strikes the right level of abstraction for explaining write amp / read amp trade-offs.
Algorithm
- K-way merge with a min-heap: any algorithms textbook. The pattern here is identical to "merge K sorted lists" with an extra rule for duplicate keys.