References — db-05 LSM MemTable
Primary sources
-
O'Neil, Cheng, Gawlick, O'Neil (1996). The Log-Structured Merge-Tree (LSM-Tree). Acta Informatica 33(4). The original paper. https://www.cs.umb.edu/~poneil/lsmtree.pdf
-
Sanjay Ghemawat & Jeff Dean. LevelDB. The reference open-source LSM. See
db/memtable.{h,cc}anddb/skiplist.h. https://github.com/google/leveldb -
Facebook RocksDB Wiki — MemTable. https://github.com/facebook/rocksdb/wiki/MemTable Covers SkipList, HashSkipList, HashLinkList, and Vector MemTable factories.
Skip lists
-
William Pugh (1990). Skip Lists: A Probabilistic Alternative to Balanced Trees. CACM 33(6): 668–676. https://homepage.cs.uiowa.edu/~ghosh/skip.pdf
-
LevelDB's skiplist implementation — concurrent single-writer/many-reader, lock-free reads via memory-order acquire/release. https://github.com/google/leveldb/blob/main/db/skiplist.h
Alternative data structures
- Bw-tree (Microsoft, 2013): lock-free B+ tree variant used in Hekaton.
- Adaptive Radix Tree (ART, 2013): compact, cache-friendly trie used by HyPer and DuckDB. https://db.in.tum.de/~leis/papers/ART.pdf
- Masstree (Mao, Kohler, Morris, 2012): trie-of-B+trees, very fast for variable length keys.
Tombstones and snapshot reads
-
RocksDB DeleteRange. Tombstones over key ranges, important for prefix deletes. https://github.com/facebook/rocksdb/wiki/DeleteRange
-
LevelDB sequence numbers. Each MemTable entry is internally tagged with a 56-bit sequence and 8-bit type byte; this lab simplifies to just the type byte. See
db/dbformat.hkValueTypeForSeek.
Real-world tunings
- Cassandra: uses
Memtablewith off-heap allocators; flushed to SSTables on size, time, or commit-log pressure. - HBase:
MemStoreper column family; configurable viahbase.hregion.memstore.flush.size. - InfluxDB IOx & TimescaleDB: apply LSM ideas to time-series, with time-bucketed MemTables.
Further reading
- Mark Callaghan's blog (RocksDB engineering): https://smalldatum.blogspot.com/
- Chen Luo & Michael Carey (2020). LSM-based Storage Techniques: A Survey. VLDB J. https://arxiv.org/abs/1812.07527