References — Bloom Filters and Hashing

Foundational papers

  • Burton H. Bloom, Space/Time Trade-offs in Hash Coding with Allowable Errors, CACM 1970. The original 2-page paper. https://dl.acm.org/doi/10.1145/362686.362692
  • Adam Kirsch & Michael Mitzenmacher, Less Hashing, Same Performance: Building a Better Bloom Filter, ESA 2006. https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf
  • Bin Fan, David G. Andersen, Michael Kaminsky, Michael D. Mitzenmacher, Cuckoo Filter: Practically Better Than Bloom, CoNEXT 2014. https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf
  • Thomas M. Graf & Daniel Lemire, Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters, JEA 2020. https://arxiv.org/abs/1912.08258
  • Peter C. Dillinger & Stefan Walzer, Ribbon Filter: Practically Smaller Than Bloom and Xor, 2021. https://arxiv.org/abs/2103.02515

Production code to read

  • LevelDB filter policy: https://github.com/google/leveldb/blob/main/util/bloom.cc
  • RocksDB filter blocks: https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter
  • RocksDB ribbon filter implementation: https://github.com/facebook/rocksdb/blob/main/util/ribbon_impl.h

Survey & blog posts

  • Daniel Lemire, "All about Bloom filters" series: https://lemire.me/blog/tag/bloom-filter/
  • Jeff Dean's classic numbers-every-programmer-should-know — useful when sizing filters against disk-seek and RAM costs.
  • Hadron, "How RocksDB sizes filters": https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide

Hash functions

  • Fowler–Noll–Vo (FNV) reference: http://www.isthe.com/chongo/tech/comp/fnv/
  • SplitMix64 (Vigna & Steele's high-quality mixer): https://prng.di.unimi.it/splitmix64.c
  • Austin Appleby, MurmurHash3: https://github.com/aappleby/smhasher/wiki/MurmurHash3
  • xxHash by Yann Collet: https://github.com/Cyan4973/xxHash