References — db-02 Data Structures for Storage

Skip lists

  • Pugh, W. (1990). "Skip Lists: A Probabilistic Alternative to Balanced Trees." CACM 33(6). The original paper; six pages, very readable.
    https://www.cs.umd.edu/~pugh/galileo/papers/CACM_Skiplist_1990.pdf
  • LevelDB MemTable source — skip list with a single allocator arena.
    https://github.com/google/leveldb/blob/main/db/skiplist.h
  • RocksDB InlineSkipList — production skip list with per-node tail allocation.
    https://github.com/facebook/rocksdb/blob/main/memtable/inlineskiplist.h
  • Redis t_zset.c — skip list with per-node span field for O(log n) rank queries.
    https://github.com/redis/redis/blob/unstable/src/t_zset.c

Hash tables

  • Celis, P., Larson, P.-Å., Munro, J. I. (1985). "Robin Hood Hashing." FOCS '85. Original paper.
  • Pedro Celis's thesis on Robin Hood hashing (1986). The probe-distance analysis is here.
    https://cs.uwaterloo.ca/research/tr/1986/CS-86-14.pdf
  • Emmanuel Goossaert's deep dive — accessible and runnable.
    https://codecapsule.com/2013/11/11/robin-hood-hashing/
  • Google Abseil flat_hash_map — SIMD probing on top of open addressing.
    https://abseil.io/about/design/swisstables
  • Rust hashbrown — port of Abseil's SwissTable; the implementation behind std::collections::HashMap.
    https://github.com/rust-lang/hashbrown

Cost models & cache

  • Drepper, U. (2007). "What Every Programmer Should Know About Memory."
    https://akkadia.org/drepper/cpumemory.pdf
  • Jeff Dean's "Numbers Every Programmer Should Know" — the canonical latency hierarchy.
    https://gist.github.com/jboner/2841832
  • Pavlo, A. "Database Storage I" (CMU 15-445). The disk-vs-RAM cost-model lecture.
    https://15445.courses.cs.cmu.edu/fall2023/slides/03-storage1.pdf

Tree structures (preview for db-10)

  • Bayer, R., McCreight, E. (1972). "Organization and Maintenance of Large Ordered Indices." Acta Informatica. The original B-Tree paper.
  • Comer, D. (1979). "The Ubiquitous B-Tree." ACM Computing Surveys. The canonical survey.

Background

  • Sedgewick & Wayne, Algorithms 4th ed.
  • Cormen, Leiserson, Rivest, Stein, Introduction to Algorithms (CLRS) 3rd ed. — Ch. 11 (hash tables), Ch. 17 (amortized analysis).