db-07 Analysis

Surface area

The lab exposes one library function and one CLI:

compact(inputs: ordered list of SSTable bytes, drop_tombstones: bool) -> SSTable bytes

inputs[0] is the newest. Empty input list returns an empty SSTable (36 bytes, identical to SstWriter::new().finish() from db-06).

CLI:

compact [--drop-tombstones] OUT.sst IN1.sst IN2.sst ...

State machine of the merge

The merger holds K cursors, one per input. Each cursor is a sequence of (key, entry) pairs in sorted key order, produced by iterating the input SSTable's blocks in order.

A min-heap holds at most K entries, each (current_key, source_index). source_index is the position in inputs (smaller = newer).

State transitions:

init:    push each non-empty cursor's first (key, src) into heap
step:    pop top (key=k, src=i)
         take entry from cursor i, advance cursor i
         for every other cursor j whose current key == k: advance cursor j
         re-push any cursor that still has items (only those that advanced past k)
         emit (k, entry) unless (entry is Tombstone AND drop_tombstones)
done:    when heap is empty

The "advance every cursor whose current key == k" rule is what makes the merge deduplicating. It is the only subtle bit. Forget it and SstWriter rejects the output with Unsorted because the same key reappears.

Containers per language

  • Rust: BinaryHeap<Reverse<(Vec<u8>, usize)>> — pop smallest by key, ties broken by source index (smaller = newer = wins). Cursors are IntoIter over pre-materialized Vec<(Vec<u8>, Entry)> from SstReader::entries().
  • Go: container/heap with a struct slice. Same ordering. Cursors are index counters into []Entry.
  • C++: std::priority_queue with custom comparator that flips to min-heap. Cursors are std::vector<...>::const_iterator pairs.

Materializing all entries up front is wasteful for huge SSTables but is fine for this lab and keeps the three implementations symmetric. A streaming reader is the next step (db-08 block-cache and iterators).

What's intentionally not optimized

  • We materialize entries instead of streaming blocks. This avoids needing a block-by-block iterator API on db-06's SstReader, which would couple the two labs more tightly than the curriculum wants at this stage.
  • We use a single output SSTable. Output splitting is one if-statement in the emit step (flush + start new SstWriter when size exceeds N). Doing it here would force a "list of outputs" API that doesn't matter for byte-identity.
  • We do not parallelize. K-way merge is trivially serial; partitioning is a policy concern that belongs above this layer.

What could break the cross-language byte-identity

  1. Tiebreaker inconsistency between heap implementations. Pin it: for two equal keys, the cursor with the smaller source index wins. All three implementations must agree on this exactly.
  2. Comparing keys as language-native strings (UTF-8 ordering). All three must compare as byte slices (Vec<u8> / []byte / std::vector<uint8_t>).
  3. Forgetting to advance non-winning duplicates. Output will contain repeats; SstWriter from db-06 will reject with Unsorted. Good — fail loud.
  4. Different block-target sizes. We always use the db-06 default (4096) so the output is a single block for the canonical scenario.

Verification plan in one line

Build two distinct memtables (newer + older), promote each to an SSTable using db-06, run compact [newer, older] in all three languages, then assert sha256 equality and spot-check that newest-wins applied correctly.