Observation — db-21

1. The Three Hashes

A  seed=42  ops=200  keys=32  tieredcompact     fc2fe88978eb2d419a73a7a16fa9ec0695ad9a56cb3a31b0bf85c0a28d7c97d6
B  seed=7   ops=500  keys=64  universalcompact  05b07426e0da8ec2f1f8c81573dc275cd61cab9c19c93dc17c854456e441e7bb
C  seed=99  ops=300  keys=16  withrange         4ad255755dbfbaa40a842766656d0c0dbd6713b6a527ffea5a24fa35964d73e4

All three languages produce all three hashes on the first run after each clean build. This was not a happy accident — it required keeping every sneaky source of nondeterminism out of the merge step:

  • HashSet iteration order doesn't leak (we sort out_entries by key after the merge, and we never serialise the seen set).
  • Map ordering doesn't leak (Go uses a map[string]struct{} for dedupe but never iterates it; entries come out of a slice).
  • Floating-point comparison doesn't leak (the ratio is 0.5 exactly, which is a representable f64; Σ size ≤ ratio · size is integer-vs-rational with no rounding ambiguity at this scale).

2. What Bit Us During Development

  1. Two-pass size-tiered. An early draft computed prefix_sum once to pick chosen, then recomputed it inside the merge call. The two passes drifted under refactoring. Fixed by collapsing to a single pass that updates prefix_sum inline.

  2. Go math.Float64bits. Initial Go draft tried to avoid the math import by writing a wrapper chain (float64bitsfloat64bitsFallbackmath_Float64bits). The chain was broken (no math import to define the leaf). Lesson: don't fight the standard library for ceremony.

  3. C++ std::optional<std::string> for Get. Worth the friction versus a sentinel value: a Put of the empty string is distinguishable from absent, which is testable in dedup_keeps_last.

3. What We Didn't Observe (and why that's good)

  • No platform endianness surprises. macOS arm64 produced the same hashes the canonical fixtures pin. The explicit LE encoding in every put-int helper means we'd survive a big-endian port too.
  • No f64 rounding drift. The ratio is 0.5 and the sizes are small integers; nothing forces denormals or transcendental math.
  • No SHA-256 mismatch. The Rust port uses an inline impl in lsmctl.rs; the Go port uses crypto/sha256; the C++ port uses the 64-line public-domain reference at the bottom of adv.cc. Three independent SHA-256 implementations agreeing on three hashes is the cheapest possible end-to-end test.

4. Resource Profile

Each cargo build --release takes ~5s cold. go build ~1s. cmake --build ~3s. cross_test.sh from cold runs in ~10s including all three builds. No external network, no Docker, no system packages beyond a working C++20 toolchain, Go ≥ 1.22, and Rust stable.