Broader Ideas — db-22
The lab as it stands is a deliberately minimal harness. These are extensions that would build naturally on top of it.
A. Percentile-aware bench harness
Replace the single-pass timer with a per-operation timing loop that
collects a histogram (HDR-style) of per-op latencies, so that bench
reports p50 / p90 / p99 / p99.9 in addition to throughput. This is
where Gil Tene's "How NOT to Measure Latency" talk earns its keep:
even on a synchronous single-threaded loop, a GC pause in Go or a
page fault in C++ will move the tail dramatically.
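Trap to avoid: the cost of taking a timestamp per op (time.Now() /
std::chrono::steady_clock::now()) is itself on the order of 30 ns on
many machines, which is comparable to one workload op. You'd need to
time batches of ops and divide by the batch size. The histogram then
holds per-batch means, so a single slow op is averaged across its
batch; keep batches small enough that the tail stays visible.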
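A minimal sketch of the batch-and-divide idea, using only the Rust
standard library. The names time_batches and percentile are
hypothetical and not part of the lab, and a sorted vector with
nearest-rank lookup stands in for a real HDR histogram.

```rust
use std::time::Instant;

/// Times `batches` batches of `batch_size` calls to `op` and returns the
/// mean per-op latency of each batch in nanoseconds. Only two clock reads
/// happen per batch, so their cost is spread over `batch_size` ops.
pub fn time_batches<F: FnMut()>(mut op: F, batches: usize, batch_size: u32) -> Vec<f64> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut samples = Vec::with_capacity(batches);
    for _ in 0..batches {
        let start = Instant::now();
        for _ in 0..batch_size {
            op();
        }
        let nanos = start.elapsed().as_nanos() as f64;
        samples.push(nanos / f64::from(batch_size));
    }
    samples
}

/// Nearest-rank percentile over a copy of the samples.
/// `p` is expected in (0, 100]. Returns None when there are no samples.
pub fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let rank = (p / 100.0 * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.clamp(1, sorted.len()) - 1])
}
```

In a real harness the closure would call into CounterStore, and its
result would probably need to pass through std::hint::black_box so the
optimizer cannot delete the work being timed.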
B. Allocator pressure scenario
Add a third scenario whose workload is deliberately allocator-heavy:
short-lived strings as values (move from u64 to String), or a
churn pattern that constantly creates and removes keys so the map
keeps allocating and freeing nodes. The cross-language throughput
delta for this scenario would likely be much larger than for the
existing one, and the results would speak to the quality of each
language's default allocator.
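One possible shape for the churn pattern, as a sketch only. The
function churn and its sliding-window policy are assumptions made for
illustration; the lab's actual workload generator is not shown here.

```rust
use std::collections::BTreeMap;

/// Hypothetical churn workload: every step inserts a freshly allocated
/// String under a new key, and once more than `live` keys exist the
/// oldest key is removed. After warm-up each step both allocates and frees.
pub fn churn(steps: u64, live: u64) -> BTreeMap<u64, String> {
    let mut map = BTreeMap::new();
    for i in 0..steps {
        // format! allocates a new heap buffer for every value.
        map.insert(i, format!("value-{}", i));
        if i >= live {
            // Dropping the removed String returns its buffer to the allocator.
            map.remove(&(i - live));
        }
    }
    map
}
```

The window size controls how many values are alive at once, so it can
be tuned separately from the total number of allocations.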
C. Multi-threaded variant
Wrap CounterStore in a sync primitive and run N workers. The point
is not to demonstrate scaling (Mutex<BTreeMap<…>> won't scale) but
to demonstrate the difference between coarse locking, sharded
locking, and lock-free updates. Each language has different idioms here
(parking_lot vs std::sync, sync.Map vs atomic, std::shared_mutex
vs std::atomic), and the cross-language comparison becomes a
language-features comparison.
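A rough sketch of the coarse versus sharded distinction with
std::sync only. ShardedStore, its incr method, and run_workers are
hypothetical names, not the lab's CounterStore API; a store built with
one shard behaves like the coarse Mutex<BTreeMap<u64, u64>> case.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical store: each shard is its own mutex-protected map.
/// With `shard_count == 1` this degenerates to a single coarse lock.
pub struct ShardedStore {
    shards: Vec<Mutex<BTreeMap<u64, u64>>>,
}

impl ShardedStore {
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count)
            .map(|_| Mutex::new(BTreeMap::new()))
            .collect();
        ShardedStore { shards }
    }

    /// Saturating increment; only the shard that owns `k` is locked.
    pub fn incr(&self, k: u64, by: u64) {
        let idx = (k % self.shards.len() as u64) as usize;
        let mut shard = self.shards[idx].lock().unwrap();
        let v = shard.entry(k).or_insert(0);
        *v = v.saturating_add(by);
    }

    /// Sum of all counters, locking one shard at a time.
    pub fn total(&self) -> u64 {
        self.shards
            .iter()
            .map(|s| s.lock().unwrap().values().sum::<u64>())
            .sum()
    }
}

/// Spawns `workers` threads that each perform `ops_per_worker` increments
/// of 1 over `keys` distinct keys, then waits for all of them.
pub fn run_workers(store: Arc<ShardedStore>, workers: u64, ops_per_worker: u64, keys: u64) {
    assert!(keys > 0, "need at least one key");
    let handles: Vec<_> = (0..workers)
        .map(|w| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                for i in 0..ops_per_worker {
                    store.incr((w + i) % keys, 1);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```

Timing run_workers with 1 shard and then with 8 or 16 would give the
coarse-versus-sharded comparison; a lock-free variant would need a
different data structure and is left out of this sketch.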
D. Snapshot replay / log shipping
Right now dump_snapshot produces bytes that are only used for hashing.
Add a restore_snapshot and a small "log" of operations (just the
sequence of (op, k, by) triples), and you have a tiny replicated
store. Connect three nodes via a deterministic schedule and you have a
toy version of db-23.
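A sketch of what restore plus replay could look like. Everything about
the byte layout here is an assumption: the snapshot is taken to be a
sequence of 16-byte records (key, then value, each a little-endian
u64) in ascending key order, and the Op enum, apply, and replay are
hypothetical names. The lab's real dump_snapshot format may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical operation kinds for the (op, k, by) log triples.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Op {
    Incr,
    Decr,
}

pub type Store = BTreeMap<u64, u64>;

/// Applies one operation with saturating arithmetic.
pub fn apply(store: &mut Store, op: Op, k: u64, by: u64) {
    let v = store.entry(k).or_insert(0);
    *v = match op {
        Op::Incr => v.saturating_add(by),
        Op::Decr => v.saturating_sub(by),
    };
}

/// Assumed layout: key then value, both little-endian u64, in key order.
pub fn dump_snapshot(store: &Store) -> Vec<u8> {
    let mut out = Vec::with_capacity(store.len() * 16);
    for (k, v) in store {
        out.extend_from_slice(&k.to_le_bytes());
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Inverse of dump_snapshot; returns None if the input is not a whole
/// number of 16-byte records.
pub fn restore_snapshot(bytes: &[u8]) -> Option<Store> {
    if bytes.len() % 16 != 0 {
        return None;
    }
    let mut store = Store::new();
    for rec in bytes.chunks_exact(16) {
        let mut kb = [0u8; 8];
        let mut vb = [0u8; 8];
        kb.copy_from_slice(&rec[..8]);
        vb.copy_from_slice(&rec[8..]);
        store.insert(u64::from_le_bytes(kb), u64::from_le_bytes(vb));
    }
    Some(store)
}

/// Replays a log of (op, k, by) triples in order.
pub fn replay(store: &mut Store, log: &[(Op, u64, u64)]) {
    for &(op, k, by) in log {
        apply(store, op, k, by);
    }
}
```

A replica would then be restore_snapshot on the last shipped snapshot
followed by replay of the log suffix recorded since that snapshot.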
E. Energy and non-time metrics
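On Apple Silicon, powermetrics --samplers cpu_power (which needs root)
reports CPU power, from which you can derive energy per op as average
power times elapsed time divided by op count. The reading covers the
whole package rather than one process, so measure on an otherwise idle
machine. The relative energy of the Rust / Go / C++ implementations
on the same workload is a more honest "which is more efficient" claim
than throughput, because it folds in the power cost of stalls, branch
mispredictions, and memory traffic rather than only the time they take.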
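The arithmetic is small enough to write down. This sketch assumes you
have already averaged the power samples into milliwatts yourself, both
during the run and on the idle machine; energy_per_op_nj is a
hypothetical helper and does not parse any tool output.

```rust
/// Energy per operation in nanojoules.
/// `avg_power_mw` is the mean power during the run and `idle_power_mw`
/// the mean power of the idle machine, both in milliwatts.
/// Returns None when there is nothing meaningful to divide by.
pub fn energy_per_op_nj(
    avg_power_mw: f64,
    idle_power_mw: f64,
    elapsed_s: f64,
    ops: u64,
) -> Option<f64> {
    if ops == 0 || elapsed_s <= 0.0 {
        return None;
    }
    // Subtract the idle baseline so only the workload's share is counted.
    let net_mw = (avg_power_mw - idle_power_mw).max(0.0);
    // milliwatts * seconds = millijoules; 1 mJ = 1e6 nJ.
    let millijoules = net_mw * elapsed_s;
    Some(millijoules * 1.0e6 / ops as f64)
}
```

For example, 4500 mW under load against a 500 mW idle baseline, over
2 s and 100 million ops, works out to 80 nJ per op.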
F. Comparison with off-the-shelf benchmark frameworks
Run the same workload under criterion (Rust), go test -bench, and
Google Benchmark (C++). Compare:
- Their reported throughput vs ours.
- Their reported variance.
- The shape of their output.
The lab's homegrown harness will look crude in comparison, and that's the point — the exercise of measuring the difference is more educational than the difference itself.
G. Worst-case scenario discovery
Use coverage-guided fuzzing on the workload generator (with the saturating-decrement invariant as the asserted property) to find a seed/ops/keys combination that minimizes throughput or maximizes memory pressure. This connects perf work to the fuzz/property-test discipline used in db-13 and db-15.
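As a stand-in that needs no fuzzing toolchain, the search can be
sketched as a plain seeded sweep. This is deliberately not
coverage-guided: it tries each seed, asserts the invariant, and keeps
the seed with the highest score. XorShift64, run_case, worst_seed, and
the choice of live-key count as a memory-pressure score are all
assumptions for illustration.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Tiny xorshift64 generator so that a seed fully determines a run.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift must not start from the all-zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Runs `ops` random increments and decrements over `keys` keys and
/// asserts that a decrement never increases a value (it never wraps).
/// Returns the number of live keys as a crude memory-pressure score.
pub fn run_case(seed: u64, ops: u64, keys: u64) -> usize {
    assert!(keys > 0, "need at least one key");
    let mut rng = XorShift64::new(seed);
    let mut store: BTreeMap<u64, u64> = BTreeMap::new();
    for _ in 0..ops {
        let k = rng.next_u64() % keys;
        let by = rng.next_u64() % 1000;
        let decrement = rng.next_u64() % 2 == 0;
        let slot = store.entry(k).or_insert(0);
        let before = *slot;
        if decrement {
            *slot = before.saturating_sub(by);
            assert!(*slot <= before, "decrement wrapped for seed {}", seed);
        } else {
            *slot = before.saturating_add(by);
        }
    }
    store.len()
}

/// Returns the (seed, score) pair with the highest score among `seeds`,
/// or None if the range is empty.
pub fn worst_seed(seeds: Range<u64>, ops: u64, keys: u64) -> Option<(u64, usize)> {
    seeds
        .map(|s| (s, run_case(s, ops, keys)))
        .max_by_key(|&(_, score)| score)
}
```

A coverage-guided fuzzer would replace the seed sweep with mutated
inputs and feedback from instrumentation, but the asserted property
and the scoring function could stay the same.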
H. Cross-architecture verification
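Run the existing scripts/cross_test.sh under qemu-user-static for
aarch64 / x86_64 / riscv64 and confirm the hashes still match. They
should, because the wire format is little-endian and the arithmetic is
all 64-bit, but the only way to be sure is to actually do it. Note
that all three targets are little-endian, so this checks toolchain and
ABI differences but not byte order; adding a big-endian target such as
s390x would actually exercise the little-endian wire format.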
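The property being verified comes down to always using explicit
little-endian conversions and never native-endian ones. A small
illustration, with hypothetical helper names:

```rust
/// Encodes a value in a fixed little-endian layout, whatever the host is.
pub fn encode_le(v: u64) -> [u8; 8] {
    v.to_le_bytes()
}

/// Decodes a value written by encode_le on any host.
pub fn decode_le(bytes: [u8; 8]) -> u64 {
    u64::from_le_bytes(bytes)
}

/// True when the compilation target is little-endian.
pub fn host_is_little_endian() -> bool {
    cfg!(target_endian = "little")
}
```

On a little-endian host to_ne_bytes happens to give the same bytes as
to_le_bytes, which is why a native-endian slip would only show up as a
hash mismatch on a big-endian target.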
I. Cache-aware redesign of CounterStore
std::map and BTreeMap are node-based trees (a red-black tree and a
B-tree respectively) whose iteration chases pointers between nodes;
the sorted Go slice is the exception. A flat sorted array with binary
search would be slower for insert (elements after the insertion point
must shift) but dramatically faster for the iteration step (which is
the critical path in dump_snapshot). For a workload that touches each
key only a handful of times before snapshotting, the array would be
worth measuring.
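A sketch of the flat layout, assuming u64 keys and values and
saturating arithmetic. FlatStore and its method names are
hypothetical and only mirror what a CounterStore might expose.

```rust
/// Hypothetical flat store: (key, value) pairs kept sorted by key in one Vec.
pub struct FlatStore {
    entries: Vec<(u64, u64)>,
}

impl FlatStore {
    pub fn new() -> Self {
        FlatStore { entries: Vec::new() }
    }

    /// Binary search for `k`; Ok(i) is its index, Err(i) the insertion point.
    fn find(&self, k: u64) -> Result<usize, usize> {
        self.entries.binary_search_by_key(&k, |&(key, _)| key)
    }

    /// Saturating increment. Inserting a new key shifts the tail: O(n).
    pub fn incr(&mut self, k: u64, by: u64) {
        match self.find(k) {
            Ok(i) => self.entries[i].1 = self.entries[i].1.saturating_add(by),
            Err(i) => self.entries.insert(i, (k, by)),
        }
    }

    /// Saturating decrement. Missing keys are left absent in this sketch.
    pub fn decr(&mut self, k: u64, by: u64) {
        if let Ok(i) = self.find(k) {
            self.entries[i].1 = self.entries[i].1.saturating_sub(by);
        }
    }

    pub fn get(&self, k: u64) -> Option<u64> {
        self.find(k).ok().map(|i| self.entries[i].1)
    }

    /// Contiguous, key-ordered view: the snapshot walk is a linear scan.
    pub fn as_slice(&self) -> &[(u64, u64)] {
        &self.entries
    }
}
```

Iteration order matches a tree map's, so a snapshot built from
as_slice should hash identically as long as the encoding is the same.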
J. The "ten percent rule"
A small operational rule we picked up doing this lab: any perf change worth claiming must move the bench number by more than ten percent. Below that, run-to-run noise on a laptop dominates. Above that, you can usually attribute the change to a specific code path. The harness is deliberately not precise enough to defend a 2% claim, and that's a feature.
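The rule is easy to encode as a guard in a comparison script. A
minimal sketch, with hypothetical names and the threshold taken
directly from the rule above:

```rust
/// Changes at or below this fraction are treated as run-to-run noise.
pub const NOISE_FLOOR: f64 = 0.10;

/// Signed relative change of `candidate` against `baseline`
/// (for example, ops per second). Returns None for unusable inputs.
pub fn relative_change(baseline: f64, candidate: f64) -> Option<f64> {
    if !baseline.is_finite() || baseline <= 0.0 || !candidate.is_finite() {
        return None;
    }
    Some((candidate - baseline) / baseline)
}

/// True only when the change exceeds the noise floor in either direction.
pub fn is_claimable(baseline: f64, candidate: f64) -> bool {
    match relative_change(baseline, candidate) {
        Some(delta) => delta.abs() > NOISE_FLOOR,
        None => false,
    }
}
```

With a baseline of 100 units, 115 and 80 would count as claimable
changes while 102 would not.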