Observation — db-22

Cross-language hash check

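All three implementations produce identical snapshot bytes. Each line below is that language's SHA-256 digest for the scenario, and "match + golden ok" means the three digests agree with each other and with the frozen golden value: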

=== scenario A ===
rust: 4b72eab6cbc773ac9584104c5923a5139b34ab466052bdb8ceacb087c06a9015
go  : 4b72eab6cbc773ac9584104c5923a5139b34ab466052bdb8ceacb087c06a9015
cpp : 4b72eab6cbc773ac9584104c5923a5139b34ab466052bdb8ceacb087c06a9015
match + golden ok
=== scenario B ===
rust: 5c35e7b1507834fda4960246640e6fb0b194b75b9593bec87159eafcbc3876a1
go  : 5c35e7b1507834fda4960246640e6fb0b194b75b9593bec87159eafcbc3876a1
cpp : 5c35e7b1507834fda4960246640e6fb0b194b75b9593bec87159eafcbc3876a1
match + golden ok
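The "match + golden ok" verdict reads as two separate checks: the three digests agree with each other, and the shared digest equals the frozen golden value. Below is a minimal sketch of that logic. It assumes hex-string digests as printed above. check_scenario and Verdict are hypothetical names, not the harness's actual code.

```rust
/// Possible outcomes of a cross-language check for one scenario.
/// (Hypothetical type; the real harness only prints a one-line verdict.)
#[derive(Debug, PartialEq)]
enum Verdict {
    /// All languages agree and the shared digest equals the golden value.
    MatchAndGolden,
    /// All languages agree, but the golden differs (stale golden or format change).
    MatchGoldenDiffers,
    /// Languages whose digest differs from the first entry's digest.
    Mismatch(Vec<String>),
}

/// Compare (language, hex digest) pairs against each other and against `golden`.
fn check_scenario(digests: &[(&str, &str)], golden: &str) -> Verdict {
    let (_, reference) = digests.first().expect("need at least one digest");
    let odd: Vec<String> = digests
        .iter()
        .filter(|(_, d)| !d.eq_ignore_ascii_case(reference))
        .map(|(lang, _)| lang.to_string())
        .collect();
    if !odd.is_empty() {
        Verdict::Mismatch(odd)
    } else if reference.eq_ignore_ascii_case(golden) {
        Verdict::MatchAndGolden
    } else {
        Verdict::MatchGoldenDiffers
    }
}

fn main() {
    let a = "4b72eab6cbc773ac9584104c5923a5139b34ab466052bdb8ceacb087c06a9015";
    let verdict = check_scenario(&[("rust", a), ("go", a), ("cpp", a)], a);
    if verdict == Verdict::MatchAndGolden {
        println!("match + golden ok");
    } else {
        println!("{verdict:?}");
    }
}
```

Splitting "languages disagree" from "golden is stale" matters in practice. The first points at one implementation. The second points at an intentional wire-format change that needs the golden refreshed.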

Throughput probe (single representative run)

ops=100000 keys=1024 elapsed_us=7242 ops_per_sec=13806910 distinct=1024

That is about 13.8 million ops/sec for the Rust release build, running single-threaded on one core with no contention on an Apple Silicon laptop. Here distinct=1024 means all 1024 keys are present in the map at the end of the run: the increment-heavy mix means decrements rarely empty a slot at this key cardinality.
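The probe's inner loop is not reproduced in this note. Below is a minimal sketch of a workload with the same shape: three SplitMix64 draws per op feeding a BTreeMap<i64, u64>. Several details are illustrative assumptions, not the bench's actual mix: the 3-to-1 increment/decrement split, step sizes of 1 to 8, removal of a key only when its count reaches zero, and the name run_workload. The SplitMix64 constants are the standard published ones.

```rust
use std::collections::BTreeMap;

/// SplitMix64 with the standard published constants.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9e37_79b9_7f4a_7c15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xbf58_476d_1ce4_e5b9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94d0_49bb_1331_11eb);
        z ^ (z >> 31)
    }
}

/// Hypothetical increment-heavy workload: three draws per op (key, op kind, step).
fn run_workload(seed: u64, ops: u64, keys: u64) -> BTreeMap<i64, u64> {
    let mut rng = SplitMix64::new(seed);
    let mut map: BTreeMap<i64, u64> = BTreeMap::new();
    for _ in 0..ops {
        let key = (rng.next_u64() % keys) as i64; // draw 1: which slot
        let is_increment = rng.next_u64() % 4 != 0; // draw 2: assumed 3-in-4 increments
        let step = 1 + rng.next_u64() % 8; // draw 3: assumed step size 1..=8
        if is_increment {
            *map.entry(key).or_insert(0) += step;
        } else {
            // Decrement saturates at zero; the slot is removed only when it empties.
            let emptied = match map.get_mut(&key) {
                Some(v) => {
                    *v = v.saturating_sub(step);
                    *v == 0
                }
                None => false,
            };
            if emptied {
                map.remove(&key);
            }
        }
    }
    map
}

fn main() {
    let map = run_workload(42, 100_000, 1024);
    println!("distinct={}", map.len());
}
```

With this assumed mix, each key sees roughly 98 ops (100000 / 1024), three quarters of them increments. Counts therefore drift upward, and decrements rarely drive a slot back to zero.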

Read this as roughly 72 nanoseconds per op (10^9 / 13,806,910 ≈ 72.4 ns), shared among three SplitMix64 draws, a couple of map lookups, and the per-iteration loop overhead. That is in the right ballpark for an in-memory BTreeMap<i64, u64> workload.
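The per-op figure is the reciprocal of the reported rate. A small sketch that parses the probe line above and derives nanoseconds per op two ways. It assumes the space-separated key=value format stays as shown. parse_probe and the two ns_per_op helpers are hypothetical.

```rust
use std::collections::HashMap;

/// Parse a space-separated `key=value` probe line into numeric fields.
/// Tokens without `=` or with non-numeric values are skipped.
fn parse_probe(line: &str) -> HashMap<&str, f64> {
    line.split_whitespace()
        .filter_map(|tok| tok.split_once('='))
        .filter_map(|(k, v)| v.parse::<f64>().ok().map(|n| (k, n)))
        .collect()
}

/// Nanoseconds per op from the reported throughput.
fn ns_per_op(probe: &HashMap<&str, f64>) -> f64 {
    1e9 / probe["ops_per_sec"]
}

/// Nanoseconds per op recomputed from the raw counters.
fn ns_per_op_from_elapsed(probe: &HashMap<&str, f64>) -> f64 {
    probe["elapsed_us"] * 1e3 / probe["ops"]
}

fn main() {
    let line = "ops=100000 keys=1024 elapsed_us=7242 ops_per_sec=13806910 distinct=1024";
    let probe = parse_probe(line);
    println!("{:.2} ns/op (from ops_per_sec)", ns_per_op(&probe));
    println!("{:.2} ns/op (from elapsed_us / ops)", ns_per_op_from_elapsed(&probe));
}
```

Both come out at about 72.4 ns/op and agree to within roughly 0.01%. The small gap is consistent with elapsed_us being truncated to whole microseconds, though the log does not say how it is rounded.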

What we are not measuring (and why that matters)

  • No allocator pressure beyond the initial map growth. The map reaches steady state once roughly keys distinct entries (1024 here) have been inserted, and the rest of the run is in-place mutation.
  • No I/O, no syscalls, no real memory pressure. The whole working set fits in L2.
  • No latency distribution. We report a single throughput number. For a single-threaded synchronous loop, p99 latency would just be a rephrasing of throughput plus a small jitter from the OS scheduler.
  • No cross-language throughput numbers in this doc. You can collect them yourself with benchctl bench, but be honest about what you've measured: one machine, one moment, one workload.
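A back-of-the-envelope check on the L2 claim: the raw key/value payload is keys × (8 + 8) bytes. This sketch ignores BTreeMap's own node overhead (node headers, child pointers, partly filled nodes), so the real footprint is somewhat larger. payload_bytes is a hypothetical helper.

```rust
use std::mem::size_of;

/// Raw key/value bytes for a fully populated BTreeMap<i64, u64>,
/// excluding the map's own node overhead.
fn payload_bytes(keys: usize) -> usize {
    keys * (size_of::<i64>() + size_of::<u64>())
}

fn main() {
    let bytes = payload_bytes(1024);
    println!("{} bytes = {} KiB of key/value payload", bytes, bytes / 1024);
}
```

At keys=1024 that is 16 KiB of payload, far below the L2 capacity of current laptop cores.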

Why the bench number is stable but not authoritative

The bench subcommand runs a small warm-up pass of ops/10 + 1 operations before the timed pass. At ops=100000 that is 10,001 warm-up operations, which is enough to pull every map slot and the K256 SHA-256 round constants into cache. Without the warm-up, the timed pass runs about 30% slower; with it, timed-pass results repeat to within a few percent from run to run.
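Below is a minimal sketch of the warm-up-then-time pattern. It assumes one closure call per op and std::time::Instant as the clock. bench and warmup_ops are hypothetical names, and the stand-in op in main is not the real workload mix.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Warm-up size used by the bench subcommand: ops/10 + 1.
fn warmup_ops(ops: u64) -> u64 {
    ops / 10 + 1
}

/// Run `op` for an untimed warm-up pass, then time `ops` more calls.
/// Returns the timed duration and the resulting ops/sec.
fn bench<F: FnMut(u64)>(ops: u64, mut op: F) -> (Duration, f64) {
    for i in 0..warmup_ops(ops) {
        op(i); // untimed: touches the working set so it is cache-resident
    }
    let start = Instant::now();
    for i in 0..ops {
        op(i);
    }
    let elapsed = start.elapsed();
    (elapsed, ops as f64 / elapsed.as_secs_f64())
}

fn main() {
    let keys = 1024u64;
    let mut map: BTreeMap<i64, u64> = BTreeMap::new();
    let (elapsed, rate) = bench(100_000, |i| {
        // Stand-in op: scatter i over the key space and bump that counter.
        let key = (i.wrapping_mul(0x9e37_79b9_7f4a_7c15) % keys) as i64;
        *map.entry(key).or_insert(0) += 1;
    });
    println!(
        "elapsed_us={} ops_per_sec={:.0} distinct={}",
        elapsed.as_micros(),
        rate,
        map.len()
    );
}
```

The warm-up runs against the same map, so the timed pass starts with the map already populated and cache-resident. The sketch has the same limits as the real harness: no pinning, no turbo control, and no counters.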

This is still a crude harness. We are not collecting CPU counters, we are not pinning to a CPU, we are not disabling turbo, we are not controlling for thermal state. Use these numbers for ordering ("did this change make it faster or slower?") and not for absolute claims ("Rust does N nanoseconds per op on this machine").

Sanity checks that fire if you break things

  • scenario_a_frozen / scenario_b_frozen — any change to wire format, mixing rule, or RNG step breaks both of these immediately.
  • splitmix64_known — guards against accidental constant-swap in the SplitMix64 mixing function.
  • sha256_vectors — guards against accidental damage to the SHA implementation in any language.
  • snapshot_layout_two_keys — pins the exact byte layout of a trivial 2-key snapshot, so a wire-format change shows a tightly localized failure (not just "scenario A differs").
  • workload_determinism — same seed/ops/keys gives the same bytes on two consecutive runs.
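The determinism and constant-swap checks can be sketched against SplitMix64 alone. The real tests compare against frozen golden values that this note does not reproduce. This sketch therefore checks properties that need no stored constants:

- the same seed yields the same stream;
- different seeds yield different streams;
- swapping the two mixing multipliers changes the stream.

splitmix64_stream and step are hypothetical names. The constants are the standard SplitMix64 ones.

```rust
/// Standard SplitMix64 constants: state increment and the two mixing multipliers.
const GAMMA: u64 = 0x9e37_79b9_7f4a_7c15;
const MIX1: u64 = 0xbf58_476d_1ce4_e5b9;
const MIX2: u64 = 0x94d0_49bb_1331_11eb;

/// One SplitMix64 step with the multipliers passed in, so a test can
/// simulate an accidental swap.
fn step(state: &mut u64, m1: u64, m2: u64) -> u64 {
    *state = state.wrapping_add(GAMMA);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(m1);
    z = (z ^ (z >> 27)).wrapping_mul(m2);
    z ^ (z >> 31)
}

/// First `n` outputs for `seed`.
fn splitmix64_stream(seed: u64, n: usize, m1: u64, m2: u64) -> Vec<u64> {
    let mut state = seed;
    (0..n).map(|_| step(&mut state, m1, m2)).collect()
}

fn main() {
    // Determinism: identical inputs give an identical stream.
    let a = splitmix64_stream(7, 16, MIX1, MIX2);
    let b = splitmix64_stream(7, 16, MIX1, MIX2);
    assert_eq!(a, b, "same seed must give the same stream");

    // Constant-swap guard: exchanging the multipliers must change the output.
    let swapped = splitmix64_stream(7, 16, MIX2, MIX1);
    assert_ne!(a, swapped, "swapped multipliers went undetected");

    println!("determinism and swap checks passed");
}
```

A swapped mixer still produces a plausible-looking stream. The inequality check above only catches the swap because it compares against an unswapped stream computed in the same run. A frozen known-answer value, of the kind splitmix64_known pins, is the stronger guard.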