db-07 Execution

Library API (per language, same shape)

fn compact(inputs: &[SstReader], drop_tombstones: bool) -> Vec<u8>
  • inputs[0] is newest.
  • Returns the bytes of a db-06 SSTable.
  • Empty inputs → 36-byte empty SSTable.

CLI

compact [--drop-tombstones] OUT.sst IN1.sst IN2.sst ...
  • IN1 is newest.
  • Output OUT.sst is byte-identical across rust/go/cpp for the same arguments.

Algorithm (pseudocode)

function compact(inputs, drop_tombstones):
  cursors = [iter(input) for input in inputs]   # each iter yields (key, entry) in key order
  heap = empty min-heap                          # entries: (key, src)
  for i, c in enumerate(cursors):
    k = c.peek()
    if k is not None: heap.push((k, i))

  out = SstWriter()
  while heap not empty:
    (k, i_win) = heap.pop()
    entry = cursors[i_win].next()                # consume winner
    if cursors[i_win].peek() is not None:
      heap.push((cursors[i_win].peek(), i_win))

    # Drain all older duplicates of the same key
    while heap not empty and heap.peek().key == k:
      (_, j) = heap.pop()
      cursors[j].next()
      if cursors[j].peek() is not None:
        heap.push((cursors[j].peek(), j))

    if entry.is_tombstone and drop_tombstones:
      continue
    out.add(k, entry)

  return out.finish()

Heap ordering: lexicographic on key; tiebreak by source index ascending (smaller index = newer = wins on equal keys).

How to wire it (per language)

LangCursorHeap
Ruststd::vec::IntoIter<(Vec<u8>, Entry)>BinaryHeap<Reverse<(Vec<u8>, usize)>>
Goindex into []struct{Key,Entry}container/heap with Less honoring (key,src)
C++pair of vector<...>::const_iteratorpriority_queue with greater-than comparator

All three "peek a cursor's current key" is cursors[i].keys[idx_i] (or equivalent) — there is no I/O during peek; entries are materialized once.