References — db-22

Primary sources on benchmarking

Brendan Gregg. Systems Performance: Enterprise and the Cloud, 2nd ed., Addison-Wesley, 2020. The canonical modern reference. Chapter 12 ("Benchmarking") is required reading; the "active benchmarking" methodology and the catalog of common mistakes (cold-cache effects, the wrong saturation point, the wrong unit) frame the entire lab.
Brendan Gregg. BPF Performance Tools, Addison-Wesley, 2019. Less directly relevant here, but the right book if you want to observe what your benchmark is actually doing on a Linux box.
Gil Tene. "How NOT to Measure Latency." Strange Loop 2015. The "coordinated omission" talk. Even on an in-memory benchmark like this one, the principle generalizes: the metric you report has to match the question the user is asking. We intentionally report ops_per_sec, not p99 latency, because a single-threaded synchronous loop does not have an interesting tail.
Randal E. Bryant, David R. O'Hallaron. Computer Systems: A Programmer's Perspective, 3rd ed., Pearson, 2015. Chapter 5 ("Optimizing Program Performance") and Chapter 9 ("Virtual Memory") supply the "always measure one level deeper" instinct used throughout the docs.
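To make the ops_per_sec choice in the Tene entry concrete, here is a minimal Python sketch of a single-threaded synchronous measurement loop. It is not the lab's harness: measure_ops_per_sec, op, and warmup_ops are hypothetical names, and op is assumed to return an int so that results can be folded into a sink.

```python
import time


def measure_ops_per_sec(op, n_ops, warmup_ops=1000):
    """Return (ops_per_sec, sink) for n_ops back-to-back calls to op(i).

    `op` is a hypothetical callable taking the iteration index and
    returning an int; it stands in for whatever the benchmark exercises.
    """
    # Warm-up: run the operation untimed so caches and allocators settle.
    for i in range(warmup_ops):
        op(i)

    sink = 0
    start = time.perf_counter_ns()
    for i in range(n_ops):
        # Fold every result into a sink so the work has an observable use.
        sink ^= op(i)
    elapsed_ns = time.perf_counter_ns() - start

    # Guard against a zero reading on very short runs.
    return n_ops * 1e9 / max(elapsed_ns, 1), sink
```

Because each call starts only after the previous one returns, there is no request queue and therefore no queueing delay for a latency histogram to hide; throughput is the number that answers the question being asked.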

Sebastiano Vigna. "An experimental exploration of Marsaglia's xorshift generators, scrambled." ACM TOMS 42(4), 2016 (arXiv preprint 2014). SplitMix64 and friends. Justification for using SplitMix64 here: its arithmetic is trivially portable (64-bit adds, shifts, XORs, and multiplies), so every language produces byte-identical output from the same seed.
Guy Steele, Doug Lea, Christine Flood. "Fast Splittable Pseudorandom Number Generators." OOPSLA 2014. The paper that introduced SplitMix.
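As an illustration of the "trivially portable arithmetic" claim, the following is a minimal Python sketch of SplitMix64. The constants and shift amounts are those of the widely circulated public-domain reference version; the class name and method name are assumptions made for this example, and the explicit mask emulates the uint64 wraparound that C++, Rust, and Go provide natively.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit wraparound


class SplitMix64:
    """Sketch of SplitMix64; names here are illustrative, not the lab's API."""

    def __init__(self, seed: int) -> None:
        self.state = seed & MASK64

    def next_u64(self) -> int:
        # Advance the state by the fixed odd increment (the "gamma").
        self.state = (self.state + 0x9E3779B97F4A7C15) & MASK64
        z = self.state
        # Two xor-shift-multiply rounds, then a final xor-shift.
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return z ^ (z >> 31)
```

Nothing here depends on floating point, signed overflow, or platform word size, which is why ports in different languages can be checked against each other value by value.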

Andrey Akinshin. Pro .NET Benchmarking, Apress, 2019. Despite the .NET framing, chapters 1–4 are language-agnostic gold: warm-up, steady state, the dead-code-elimination trap, JIT vs AOT timing.
Aleksey Shipilëv. "JMH samples" and his "Nanotrusting the Nanotime" blog post. Java-specific, but the lessons are universal, particularly the discussion of System.nanoTime resolution traps, which apply equally to std::chrono::steady_clock and Go's time.Now().
Rust: criterion documentation, especially the section on outlier detection.
Go: the testing package's Benchmark docs and Dave Cheney's "Five things that make Go fast".
C++: Google Benchmark and Chandler Carruth's CppCon talk "Tuning C++".
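The timer resolution traps mentioned in the Shipilëv entry are easy to observe directly. Below is a rough Python sketch (the function name is hypothetical) that spins until the clock visibly advances and reports the smallest step seen. The result is the clock's granularity plus call overhead as seen from this process, not its nominal resolution.

```python
import time


def observed_timer_granularity_ns(samples: int = 10_000) -> int:
    """Smallest nonzero difference seen between consecutive clock reads."""
    smallest = None
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()
        # The clock is monotonic, so spin until it moves forward.
        while t1 == t0:
            t1 = time.perf_counter_ns()
        delta = t1 - t0
        if smallest is None or delta < smallest:
            smallest = delta
    return smallest


if __name__ == "__main__":
    print(observed_timer_granularity_ns(), "ns")
```

If a single operation takes about as long as the number this prints, time a batch of operations and divide, instead of timing each call on its own.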

The Cap'n Proto encoding spec. A worked example of a wire format designed for cross-language stability. We do not use Cap'n Proto here, but its constraints (fixed-width little-endian, no sentinel ordering ambiguity, no implicit string normalization) are the same constraints we impose on dump_snapshot.
Go issue #7986: map iteration is intentionally randomized. Read the issue and the surrounding discussion; this is the canonical worked example of why a portable wire format must never iterate a hash map without an explicit sort.
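The two entries above boil down to a simple discipline: sort keys by their raw bytes, then write fixed-width little-endian fields. Here is a minimal Python sketch of that idea. The byte layout is invented for illustration and is not the actual dump_snapshot format; encode_snapshot is a hypothetical name.

```python
import struct
from typing import Mapping


def encode_snapshot(table: Mapping[bytes, bytes]) -> bytes:
    """Encode a key/value table so that equal contents give equal bytes.

    Assumed layout (illustrative only):
      u64 entry count, then per entry: u32 key length, key bytes,
      u32 value length, value bytes. All integers are little-endian.
    """
    out = bytearray()
    out += struct.pack("<Q", len(table))
    # Sort by raw bytes: never rely on the container's iteration order.
    for key in sorted(table):
        value = table[key]
        out += struct.pack("<I", len(key))
        out += key  # raw bytes, no string normalization
        out += struct.pack("<I", len(value))
        out += value
    return bytes(out)
```

Two tables with the same contents but different insertion histories encode to the same bytes, which is the property a cross-language snapshot comparison depends on.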

Latency Numbers Every Programmer Should Know (the Peter Norvig / Jeff Dean table). Internalize the ratios. The point of the bench harness is to put your numbers somewhere on this chart.
Ulrich Drepper. "What Every Programmer Should Know About Memory." 2007. Long and old, but still the right tour of the memory hierarchy your bench is actually hitting.