db-08 — Analysis

Problem statement

We need two read-path primitives that the rest of the LSM stack assumes exist:

  1. A bounded in-memory cache that lets us amortize the cost of decoding SSTable blocks across many lookups, with predictable memory usage and O(1) operations.
  2. A streaming K-way merging iterator that exposes N pre-sorted sources as a single sorted, deduplicated stream — newest-wins on tie — without buffering all entries in memory.

Both must be small, dependency-light, and byte-deterministic when serialized (so the cross-language cross-test can detect any divergence).

Constraints

  • Determinism. Given identical inputs, the merge stream's serialized bytes must be identical across Rust, Go, and C++. This is the cross-test's only gate.
  • Bounded memory. The cache must cap at a user-supplied entry count; the iterator must use O(K) working set regardless of the number of entries.
  • No backtracking. The iterator is streaming: it must work on inputs that arrive lazily.
  • Newest-wins is strict. Source index 0 always wins. There are no timestamps, generations, or sequence numbers — that complexity is deferred to db-13 (MVCC).

Decisions

  • Cache eviction policy: LRU. Simple, predictable, well-understood. Not the best policy for all workloads (LRU-K, ARC, and CLOCK-Pro all beat it on scan-heavy workloads), but the correct teaching baseline.
  • Cache capacity unit: entries. Production systems bound by bytes; we use entries to keep the data structure (rather than the accounting) the focus.
  • Heap element shape: (key, source_index). Small and cheap to compare. Pulling the full entry into the heap would inflate comparator cost and force copies.
  • No timestamps / sequence numbers. Newest-wins is by source index alone.
  • Tombstone drop is opt-in. Callers pass drop_tombstones=true only when they have proven (via compaction rules — see db-07) that no older source could resurrect the deleted key.

Trade-offs

ChoiceProsCons
LRU (vs LRU-K, ARC, CLOCK)O(1) ops, simple to reasonScan-pollutes — one big scan can flush hot entries
Doubly-linked list (vs VecDeque)O(1) arbitrary removalHeavier per-node memory (two pointers)
Heap of (key, src) (vs entry)Cheap compares, no copiesIndirection back to source vector on every pop
Entry-bounded cap (vs byte)Simple, no per-block sizingMemory usage depends on block-size distribution
Drain-on-tie eagerlyCaller never sees duplicatesSlight extra work even when caller would dedupe

Risks

  • Heap ordering bug on tie. If the (key, src) comparator forgets to break ties on src ascending, the merger silently emits the older value on key collisions. The "newest-wins" test catches this on a 2-entry input.
  • Cache eviction at boundary. Inserting into a full cache and then immediately calling Get on the just-evicted key must miss, not hit.
  • Iterator reentrancy. Calling Next after end-of-stream must keep returning end-of-stream, not panic.
  • Cross-language drift on serialization. Endianness or length-prefix width mismatches would invalidate the sha256. We pin to "u32 LE length + bytes + u8 type [+ u32 LE val_len + val]".

Out of scope

  • Compression (RocksDB caches decompressed blocks; some configs cache both).
  • Pin/unpin handle protocol for zero-copy reads.
  • Snapshot/sequence-number-aware iteration (deferred to db-13).
  • Range deletes / range tombstones (deferred to db-21).
  • Block-cache statistics beyond hit/miss/evict.