db-08 — Observation
What we measured (functional)
- 11 Rust unit tests pass (
cargo test): 4 LRU + 6 merger + 1 serializer. - 11 Go unit tests pass (
go test ./...): 4 LRU + 6 merger + 1 serializer. - 2 C++ ctest binaries pass (
test_lrucovers 4 cases,test_itercovers 7). - Cross-language sha256 match in both modes (see execution.md).
Anatomy of the output stream
For the canonical input (newer = bulk 50 + put key10=NEW-10 + del key5;
older = bulk 100 + put key50=OLD-50) the entry count is 100 with
drop=false and 99 with drop=true. Total byte-count for the output
stream:
drop=false: 1874 bytesdrop=true: 1865 bytes (delta 9 = exactly one tombstone frame)
Each value entry costs 4 (key_len) + len(key) + 1 (type) + 4 (val_len) + len(val) bytes. For our scenario, most keys are keyN (3-5 bytes) with
values valN (4-5 bytes), making the per-entry frame ~17-19 bytes.
Hit/miss behavior under repeated workloads
The lru_basic_hit_miss test demonstrates the basic counters: one Get on
a present key bumps hits to 1; one Get on an absent key bumps misses
to 1. The lru_evicts_lru_on_capacity test confirms that the eviction
counter increments exactly once when a fourth insert into a 3-slot cache
forces the LRU node out.
Tournament dynamics
With K = 2 sources in the cross-test, the heap has at most 2 entries; with
K = 7 (memtable + L0 file + 5 L1 files in a realistic LSM), the heap has at
most 7 entries regardless of the millions of entries flowing through. Heap
operations are O(log K) per Next(), so even at K = 1000 the per-entry
cost is ~10 comparisons.
Determinism
The serialize_is_deterministic_and_sized test in all three languages
constructs the same (key, entry) stream twice and confirms identical
serialized bytes. This is what the cross-test relies on — if any language
becomes non-deterministic (e.g., picks the wrong duplicate on a tie, or
serializes value lengths in big-endian), the sha256 mismatch will surface
immediately.
What surprised me
- The C++
std::priority_queueis min-heap-by-default only if you pass an explicitstd::greater-style comparator. Forgetting this gives a max-heap that emits keys in reverse order. - Rust's
BinaryHeapis max-heap-by-default; we wrap inReverse((key, src))to flip it, which also automatically gives the correct tie-break onsrc ascendingbecauseReverse(a) < Reverse(b)iffa > band the derived tupleOrdcompares lexicographically. - Go's
container/heaprequires you to writeLessyourself, so the tie-break is explicit and self-documenting.
What did not surprise me
The hit/miss counts came out exactly as expected on first run for all three languages. The K-way merge produced a sorted stream on first run for Rust and Go.