db-22 — Performance and Benchmarking
Why this lab exists
Benchmarks lie. They mostly lie because a benchmark answers a different question than the one you thought you were asking. db-22 is a small, self-contained system whose only purpose is to be measured: a keyed in-memory counter store driven by a deterministic synthetic workload. We freeze a wire format and a workload, hash the resulting state across three implementations (Rust, Go, C++), and use the resulting binary identity as the load-bearing definition of "the same program."
Once correctness is cross-language identical, we can compare throughput of the three implementations on the same hardware honestly — and we can talk about what does and does not constitute a fair comparison.
The data structure: CounterStore
A CounterStore is a mapping i64 -> u64 (counters) plus a single
total_ops: u64 running counter. Three operations:
incr(k, by):total_ops += 1,counters[k] += by(entry created with valuebyif missing).decr(k, by):total_ops += 1. Ifkis missing the call has no further effect (total_opswas already incremented). Otherwise:- if
current <= by, the entry is removed (saturating decrement); - else
counters[k] = current - by.
- if
get(k) -> Option<u64>: live lookup, returnsNoneif absent.
There are no tombstones. Removed counters leave no trace in the snapshot. This is intentional and matters: it makes the snapshot a pure function of the final live state, not of the history of operations.
The semantic that total_ops is bumped on every call (including no-op
decr on missing) is the simplest invariant and is the one against which
all golden hashes were computed. Changing it would change every hash.
Wire format: dump_snapshot
The snapshot is a function CounterStore -> bytes whose output must be
byte-identical across all three implementations.
offset size field
------ ---- ---------------------------------------------
0 8 magic "DSEBENCH" (ASCII)
8 8 total_ops (u64 little-endian)
16 4 distinct_keys (u32 little-endian)
20+ 16 one row per key, ascending by key:
- 8 bytes: key (i64 little-endian)
- 8 bytes: count (u64 little-endian)
Ordering is the only subtle bit. Rust uses BTreeMap, whose iteration is
naturally ascending. Go uses a plain map and explicitly sorts the keys
before emitting. C++ uses std::map, also ascending. All three converge
on the same byte sequence.
The workload: deterministic by construction
We need a workload that:
- Is identical across languages.
- Exercises a mix of insert / mutate / delete to produce a non-trivial end state.
- Can be scaled in both
opsandkeys.
We use SplitMix64 for randomness. It is small, fast, has trivially
portable arithmetic (just u64 adds, shifts, multiplies, and xors), and
needs no library. The constants and step function are well-known:
state += 0x9E3779B97F4A7C15
z = state
z = (z ^ (z >> 30)) * 0xBF58476D1CE4E7B5
z = (z ^ (z >> 27)) * 0x94D049BB133111EB
return z ^ (z >> 31)
Each workload iteration draws exactly three u64 words. Drawing the
same number every iteration is what keeps the RNG stream identical across
languages even when one branch is short and another is long.
r1, r2, r3 = rng.next(), rng.next(), rng.next()
kind = (r1 >> 62) & 0x3 # 0,1,2 → Incr ; 3 → Decr (3:1 ratio)
k = (r2 % keys) as i64
by = (r3 % 100) + 1 # 1..=100
Three-to-one incr:decr means the counter map grows in expectation, but
with keys small relative to ops the map fills up and the decrement
path actually deletes entries — both code paths get exercised in any
non-trivial run.
Two frozen scenarios
| Scenario | seed | ops | keys | SHA-256 of snapshot |
|---|---|---|---|---|
| A | 42 | 500 | 32 | 4b72eab6cbc773ac9584104c5923a5139b34ab466052bdb8ceacb087c06a9015 |
| B | 7 | 5000 | 256 | 5c35e7b1507834fda4960246640e6fb0b194b75b9593bec87159eafcbc3876a1 |
scripts/cross_test.sh builds all three binaries and asserts that the
hashes match each other and these golden values.
Common cross-language divergence sources (and how we avoid them)
- Map iteration order. We never iterate
HashMap. We sort keys explicitly (Go) or useBTreeMap/std::map(Rust, C++). - Integer promotion in shifts.
u64-only arithmetic. No mixed signed/unsigned shifts. %semantics for negative operands.r2isu64; modulus and cast toi64happen exactly once and in the same order.size_twidth. We only putu32/u64on the wire, neversize_tdirectly.- Trailing whitespace / newlines in CLI output.
hashprints the hex with no trailing newline.benchwrites its line to stderr so it can never pollute stdout that a script might be hashing.
Bench methodology
benchctl bench runs a short warm-up (ops/10 + 1) to pull pages and
populate caches, then a single timed pass over ops calls. It prints
ops, keys, elapsed_us, ops_per_sec, and distinct (the number
of live counters at the end) to stderr.
This is intentionally crude — the workload is a single thread doing
in-memory map operations. It is good enough for "is the Rust build
twice as fast as the Go build?" type questions; it is not a
microbenchmark replacement for criterion / go test -bench / Google
Benchmark. The references in references.md cover the deeper rabbit
hole.
What you actually learn from this lab
- Why a benchmark needs to fix a deterministic workload before it fixes a metric.
- Why "the same program in two languages" needs a binary equality test, not a "looks the same" code review.
- Why bench harnesses must warm up, isolate stdout/stderr, and avoid hidden allocations inside the timed region.
- Why
HashMapiteration order is a footgun for portable wire formats.