db-07 Execution
Library API (per language, same shape)
fn compact(inputs: &[SstReader], drop_tombstones: bool) -> Vec<u8>
inputs[0]is newest.- Returns the bytes of a db-06 SSTable.
- Empty
inputs→ 36-byte empty SSTable.
CLI
compact [--drop-tombstones] OUT.sst IN1.sst IN2.sst ...
- IN1 is newest.
- Output OUT.sst is byte-identical across rust/go/cpp for the same arguments.
Algorithm (pseudocode)
function compact(inputs, drop_tombstones):
cursors = [iter(input) for input in inputs] # each iter yields (key, entry) in key order
heap = empty min-heap # entries: (key, src)
for i, c in enumerate(cursors):
k = c.peek()
if k is not None: heap.push((k, i))
out = SstWriter()
while heap not empty:
(k, i_win) = heap.pop()
entry = cursors[i_win].next() # consume winner
if cursors[i_win].peek() is not None:
heap.push((cursors[i_win].peek(), i_win))
# Drain all older duplicates of the same key
while heap not empty and heap.peek().key == k:
(_, j) = heap.pop()
cursors[j].next()
if cursors[j].peek() is not None:
heap.push((cursors[j].peek(), j))
if entry.is_tombstone and drop_tombstones:
continue
out.add(k, entry)
return out.finish()
Heap ordering: lexicographic on key; tiebreak by source index ascending (smaller index = newer = wins on equal keys).
How to wire it (per language)
| Lang | Cursor | Heap |
|---|---|---|
| Rust | std::vec::IntoIter<(Vec<u8>, Entry)> | BinaryHeap<Reverse<(Vec<u8>, usize)>> |
| Go | index into []struct{Key,Entry} | container/heap with Less honoring (key,src) |
| C++ | pair of vector<...>::const_iterator | priority_queue with greater-than comparator |
All three "peek a cursor's current key" is cursors[i].keys[idx_i] (or
equivalent) — there is no I/O during peek; entries are materialized once.