db-08 — Observation

What we measured (functional)

  • 11 Rust unit tests pass (cargo test): 4 LRU + 6 merger + 1 serializer.
  • 11 Go unit tests pass (go test ./...): 4 LRU + 6 merger + 1 serializer.
  • 2 C++ ctest binaries pass (test_lru covers 4 cases, test_iter covers 7).
  • Cross-language sha256 match in both modes (see execution.md).

Anatomy of the output stream

For the canonical input (newer = bulk 50 + put key10=NEW-10 + del key5; older = bulk 100 + put key50=OLD-50) the entry count is 100 with drop=false and 99 with drop=true. Total byte-count for the output stream:

  • drop=false: 1874 bytes
  • drop=true : 1865 bytes (delta 9 = exactly one tombstone frame)

Each value entry costs 4 (key_len) + len(key) + 1 (type) + 4 (val_len) + len(val) bytes. For our scenario, most keys are keyN (3-5 bytes) with values valN (4-5 bytes), making the per-entry frame ~17-19 bytes.

Hit/miss behavior under repeated workloads

The lru_basic_hit_miss test demonstrates the basic counters: one Get on a present key bumps hits to 1; one Get on an absent key bumps misses to 1. The lru_evicts_lru_on_capacity test confirms that the eviction counter increments exactly once when a fourth insert into a 3-slot cache forces the LRU node out.

Tournament dynamics

With K = 2 sources in the cross-test, the heap has at most 2 entries; with K = 7 (memtable + L0 file + 5 L1 files in a realistic LSM), the heap has at most 7 entries regardless of the millions of entries flowing through. Heap operations are O(log K) per Next(), so even at K = 1000 the per-entry cost is ~10 comparisons.

Determinism

The serialize_is_deterministic_and_sized test in all three languages constructs the same (key, entry) stream twice and confirms identical serialized bytes. This is what the cross-test relies on — if any language becomes non-deterministic (e.g., picks the wrong duplicate on a tie, or serializes value lengths in big-endian), the sha256 mismatch will surface immediately.

What surprised me

  • The C++ std::priority_queue is min-heap-by-default only if you pass an explicit std::greater-style comparator. Forgetting this gives a max-heap that emits keys in reverse order.
  • Rust's BinaryHeap is max-heap-by-default; we wrap in Reverse((key, src)) to flip it, which also automatically gives the correct tie-break on src ascending because Reverse(a) < Reverse(b) iff a > b and the derived tuple Ord compares lexicographically.
  • Go's container/heap requires you to write Less yourself, so the tie-break is explicit and self-documenting.

What did not surprise me

The hit/miss counts came out exactly as expected on first run for all three languages. The K-way merge produced a sorted stream on first run for Rust and Go.