References — db-22
Primary sources on benchmarking
-
Brendan Gregg. Systems Performance: Enterprise and the Cloud, 2nd ed., Addison-Wesley, 2020. The canonical modern reference. Chapter 12 ("Benchmarking") is required reading; the "active benchmarking" methodology and the catalog of common mistakes (cold-cache effects, the wrong saturation point, the wrong unit) frame the entire lab.
-
Brendan Gregg. BPF Performance Tools, Addison-Wesley, 2019. Less directly relevant here but the right book if you want to observe what your benchmark is actually doing on a Linux box.
-
Gil Tene. "How NOT to Measure Latency." Strange Loop 2015. The "coordinated omission" talk. Even on an in-memory benchmark like this one, the principle generalizes: the metric you report has to match the question the user is asking. We intentionally report ops_per_sec, not p99 latency, because a single-threaded synchronous loop does not have an interesting tail.
-
Bryant & O'Hallaron. Computer Systems: A Programmer's Perspective, 3rd ed., Pearson, 2015. Chapter 5 ("Optimizing Program Performance") and Chapter 9 ("Virtual Memory") supply the "always measure one level deeper" instinct used throughout the docs.
Determinism, RNGs, and reproducible benchmarks
-
Sebastiano Vigna. "An experimental exploration of Marsaglia's xorshift generators, scrambled." ACM TOMS, 2016 (arXiv preprint 2014). SplitMix64 and friends. Justification for using SplitMix64 here: its arithmetic is trivially portable (64-bit wrapping adds, multiplies, shifts, and XORs), so its output is well defined and byte-identical across languages.
-
Guy Steele, Doug Lea, Christine Flood. "Fast Splittable Pseudorandom Number Generators." OOPSLA 2014. The paper that introduced SplitMix.
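The portability claim above is easiest to see in code. The following is a minimal sketch of the widely circulated SplitMix64 step, not the lab's own implementation; the function names and the tuple-returning interface are hypothetical. Python integers are unbounded, so the sketch masks to 64 bits to reproduce the wraparound that Rust, Go, and C++ get from their native unsigned 64-bit types.

```python
MASK64 = (1 << 64) - 1  # emulate uint64 wraparound with an explicit mask


def splitmix64(state):
    """Advance the generator once and return (new_state, output)."""
    state = (state + 0x9E3779B97F4A7C15) & MASK64
    z = state
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return state, z ^ (z >> 31)


def first_outputs(seed, n):
    """Return the first n outputs for a seed (hypothetical helper)."""
    outputs = []
    state = seed & MASK64
    for _ in range(n):
        state, value = splitmix64(state)
        outputs.append(value)
    return outputs
```

Because the whole generator is three constants and a handful of integer operations, a port to another language can be checked by comparing the first few outputs for a fixed seed.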
Microbenchmarking pitfalls (per-language)
-
Andrey Akinshin. Pro .NET Benchmarking, Apress, 2019. Despite the .NET framing, chapters 1–4 are language-agnostic gold: warm-up, steady state, the dead-code-elimination trap, JIT vs AOT timing.
-
Aleksey Shipilëv. "JMH samples" and his "Nanotrusting the Nanotime" blog post. Java-specific, but the lessons are universal, particularly the discussion of System.nanoTime resolution traps, which apply equally to std::chrono::steady_clock and Go's time.Now().
-
Rust: criterion documentation, especially the section on outlier detection.
-
Go: the testing package's Benchmark docs and Dave Cheney's "Five things that make Go fast".
-
C++: Google Benchmark and Chandler Carruth's CppCon talk "Tuning C++".
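The timer-resolution trap mentioned in the entries above can be probed directly. This is a rough sketch using Python's time.perf_counter_ns; the function name and sample count are assumptions for illustration. It reports the smallest nonzero gap seen between two consecutive clock reads, which bundles the clock's granularity with the cost of the call itself, so treat the result as a lower bound on what a single timed interval can resolve.

```python
import time


def observed_tick_ns(samples=1000):
    """Smallest nonzero gap between consecutive clock reads, in nanoseconds."""
    smallest = None
    for _ in range(samples):
        start = time.perf_counter_ns()
        now = time.perf_counter_ns()
        while now == start:  # spin until the clock visibly advances
            now = time.perf_counter_ns()
        delta = now - start
        if smallest is None or delta < smallest:
            smallest = delta
    return smallest


if __name__ == "__main__":
    print(f"smallest observed tick: {observed_tick_ns()} ns")
```

If the operation under test takes about as long as this tick, time a batch of iterations and divide rather than timing each call individually.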
Cross-language byte-equality engineering
-
The Cap'n Proto encoding spec. A worked example of a wire format designed for cross-language stability. We do not use Cap'n Proto here, but its constraints (fixed-width little-endian, no sentinel ordering ambiguity, no implicit string normalization) are the same constraints we impose on dump_snapshot.
-
Go issue #7986: map iteration is intentionally randomized. Read the issue and the surrounding discussion; this is the canonical worked example of why a portable wire format must never iterate a hash map without an explicit sort.
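The sort-before-serialize rule can be sketched as follows. The layout here (an entry count followed by key/value pairs, all unsigned 64-bit little-endian) and the name dump_snapshot_sketch are assumptions for illustration; the lab's actual dump_snapshot format is not specified in this list. Python dicts iterate in insertion order rather than random order, but insertion order still differs between implementations that build the table differently, so the same rule applies.

```python
import struct


def dump_snapshot_sketch(table):
    """Serialize {int: int} as a count plus key-sorted pairs, little-endian u64."""
    out = bytearray(struct.pack("<Q", len(table)))  # fixed-width entry count
    for key in sorted(table):  # explicit sort: never trust map iteration order
        out += struct.pack("<QQ", key, table[key])
    return bytes(out)


if __name__ == "__main__":
    a = {3: 30, 1: 10, 2: 20}
    b = {1: 10, 2: 20, 3: 30}
    print(dump_snapshot_sketch(a) == dump_snapshot_sketch(b))
```

Two tables with the same contents produce the same bytes regardless of how they were built, which is the property a cross-language comparison depends on.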
Background reading on what "fast" means
-
Latency Numbers Every Programmer Should Know (the Peter Norvig / Jeff Dean table). Internalize the ratios. The point of the bench harness is to put your numbers somewhere on this chart.
-
Ulrich Drepper. "What Every Programmer Should Know About Memory." Long and old but still the right tour of the memory hierarchy your bench is actually hitting.