Observation — db-21
1. The Three Hashes
A seed=42 ops=200 keys=32 tieredcompact fc2fe88978eb2d419a73a7a16fa9ec0695ad9a56cb3a31b0bf85c0a28d7c97d6
B seed=7 ops=500 keys=64 universalcompact 05b07426e0da8ec2f1f8c81573dc275cd61cab9c19c93dc17c854456e441e7bb
C seed=99 ops=300 keys=16 withrange 4ad255755dbfbaa40a842766656d0c0dbd6713b6a527ffea5a24fa35964d73e4
All three languages produce all three hashes on the first run after each clean build. This was not a happy accident — it required keeping every sneaky source of nondeterminism out of the merge step:
- HashSet iteration order doesn't leak (we sort out_entries by key after the merge, and we never serialise the seen set).
- Map ordering doesn't leak (Go uses a map[string]struct{} for dedupe but never iterates it; entries come out of a slice).
- Floating-point comparison doesn't leak (the ratio is 0.5 exactly, which is a representable f64; Σ size ≤ ratio · size is integer-vs-rational with no rounding ambiguity at this scale).
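The first two points can be sketched in Rust as follows. This is a minimal illustration, not the code in lsmctl.rs: the function name merge_entries and the (String, String) entry shape are assumptions, and only out_entries and seen are names taken from the notes above.

```rust
use std::collections::HashSet;

/// Hypothetical merge step: `entries` is in write order (oldest first).
/// The last write for each key wins, and the output is sorted by key.
pub fn merge_entries(entries: &[(String, String)]) -> Vec<(String, String)> {
    // `seen` is only probed for membership. It is never iterated or
    // serialised, so its unspecified iteration order cannot reach the output.
    let mut seen: HashSet<&str> = HashSet::new();
    let mut out_entries: Vec<(String, String)> = Vec::new();

    // Walk newest-to-oldest so the first time we meet a key is its last write.
    for (key, value) in entries.iter().rev() {
        if seen.insert(key.as_str()) {
            out_entries.push((key.clone(), value.clone()));
        }
    }

    // Sorting by key (byte-wise String ordering) makes the result independent
    // of the traversal order used above. Keys are unique at this point, so
    // sort stability does not matter.
    out_entries.sort_by(|a, b| a.0.cmp(&b.0));
    out_entries
}
```

Calling merge_entries on the writes b=1, a=2, b=3 should yield a=2, b=3 in that order. The output order depends only on the keys, which is the property the hashes rely on.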
2. What Bit Us During Development
- Two-pass size-tiered. An early draft computed prefix_sum once to pick chosen, then recomputed it inside the merge call. The two passes drifted under refactoring. Fixed by collapsing to a single pass that updates prefix_sum inline.
- Go math.Float64bits. Initial Go draft tried to avoid the math import by writing a wrapper chain (float64bits → float64bitsFallback → math_Float64bits). The chain was broken (no math import to define the leaf). Lesson: don't fight the standard library for ceremony.
- C++ std::optional<std::string> for Get. Worth the friction versus a sentinel value: a Put of the empty string is distinguishable from absent, which is testable in dedup_keeps_last.
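The single-pass fix from the first item might look roughly like the sketch below. The selection rule is an assumption made for illustration (a run joins the chosen prefix while the sum of the runs before it is at most ratio times its own size); the real rule may differ. The function name pick_prefix is hypothetical; prefix_sum and chosen are the names used above.

```rust
/// Hypothetical single-pass selection for size-tiered compaction.
/// Returns (chosen, prefix_sum): how many leading runs to merge and
/// their total size, computed together so the caller never recomputes.
pub fn pick_prefix(sizes: &[u64], ratio: f64) -> (usize, u64) {
    let mut prefix_sum: u64 = 0;
    let mut chosen: usize = 0;

    for (i, &size) in sizes.iter().enumerate() {
        // Assumed rule: stop once the runs already chosen outweigh
        // ratio * size of the next run. With ratio = 0.5 and small
        // integer sizes, both sides are exactly representable as f64.
        if i > 0 && (prefix_sum as f64) > ratio * (size as f64) {
            break;
        }
        // prefix_sum is updated inline, in the same pass that decides
        // `chosen`, so the two values cannot drift apart.
        prefix_sum += size;
        chosen = i + 1;
    }

    (chosen, prefix_sum)
}
```

Returning both values from one loop means the merge call can take prefix_sum as an argument instead of deriving it a second time, which is where the original drift came from.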
3. What We Didn't Observe (and why that's good)
- No platform endianness surprises. macOS arm64 produced the same hashes the canonical fixtures pin. The explicit LE encoding in every put-int helper means we'd survive a big-endian port too.
- No f64 rounding drift. The ratio is 0.5 and the sizes are small integers; nothing forces denormals or transcendental math.
- No SHA-256 mismatch. The Rust port uses an inline impl in lsmctl.rs; the Go port uses crypto/sha256; the C++ port uses the 64-line public-domain reference at the bottom of adv.cc. Three independent SHA-256 implementations agreeing on three hashes is the cheapest possible end-to-end test.
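For the endianness point, a put-int helper with explicit LE encoding could be as small as the sketch below. The helper names (put_u32_le, put_u64_le, put_bytes) and the length-prefixed layout are assumptions for illustration, not the actual serialisation format.

```rust
/// Append `v` as 4 little-endian bytes, regardless of host byte order.
pub fn put_u32_le(buf: &mut Vec<u8>, v: u32) {
    buf.extend_from_slice(&v.to_le_bytes());
}

/// Append `v` as 8 little-endian bytes, regardless of host byte order.
pub fn put_u64_le(buf: &mut Vec<u8>, v: u64) {
    buf.extend_from_slice(&v.to_le_bytes());
}

/// Append a byte string with a u32 little-endian length prefix
/// (assumed layout; lengths above u32::MAX are not handled here).
pub fn put_bytes(buf: &mut Vec<u8>, bytes: &[u8]) {
    put_u32_le(buf, bytes.len() as u32);
    buf.extend_from_slice(bytes);
}
```

Because to_le_bytes fixes the byte order in the encoder rather than relying on the host's native layout, the bytes fed to SHA-256 are the same on little-endian and big-endian machines.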
4. Resource Profile
Cold build times: cargo build --release takes ~5s, go build ~1s, and
cmake --build ~3s. cross_test.sh from cold runs in ~10s including all three
builds. It needs no external network, no Docker, and no system packages
beyond a working C++20 toolchain, Go ≥ 1.22, and Rust stable.