db-07 Broader Ideas

What you'd add next, in order of payoff

  1. Output splitting. Add compact_to_files(inputs, drop, target_bytes) -> Vec<Vec<u8>>. Implementation: switch to a fresh SstWriter once the in-flight writer exceeds target_bytes. Finalize only at a key boundary (between two emitted entries), never partway through the entries for one logical key; otherwise two output files cover the same key, and readers that rely on disjoint per-file key ranges will see overlaps.

  2. Streaming block iterator on SstReader. db-06's entries() materializes everything; the compaction loop should pull one entry at a time per cursor. This is db-08 territory (block cache + iterators).

  3. Range tombstones. A "delete all keys in [lo, hi)" record. Compaction has to track a set of active range tombstones during the merge and apply them to subsequent entries. Pebble's range-deletions doc is the reference.

  4. Snapshot-aware tombstone purging. "Drop tombstones at bottom" becomes "drop tombstones older than the oldest live snapshot". Compaction takes a sequence-number floor and drops anything below it that has been superseded.

  5. Leveled policy. A scheduler that picks N input files to compact based on per-level byte budgets and overlap. In LevelDB this is where VersionSet::PickCompaction and Compaction::IsBaseLevelForKey live.

  6. Subcompactions. Splitting one logical compaction into K parallel ones by key range. This requires that the index of each input lets you cheaply find the byte range covering a key span; a partitioned index helps.

  7. Compaction throttling. When compaction can't keep up, foreground writes must slow down and eventually stall. RocksDB exposes level0_slowdown_writes_trigger and level0_stop_writes_trigger for this. Without it, a write burst lets level-0 files pile up without bound, and read amplification grows with them.

  8. Universal/tiered compaction. A different scheduler; same merge mechanism. Worth implementing once leveled is in to feel the trade-off.

  9. Per-key sequence numbers. Every write is stamped with a monotonically increasing seqnum that is stored alongside its key; compaction keeps the highest-seqnum entry for each key. This makes the merge correct under concurrent writes and snapshots. Required for MVCC (db-13).

  10. Compaction filter callbacks. RocksDB lets the user inspect/transform every key during compaction (garbage collection of TTL'd values, schema migration). It's just a hook in the emit step.
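
A minimal sketch of the key-boundary rule from item 1, under stated assumptions: Entry and encode are hypothetical stand-ins (the lab's real SstWriter and entry type are not shown here), the input is already merged and sorted by key, and a key may carry several versions. The splitter only closes a file when the next entry starts a different key.

```rust
/// Hypothetical entry type for this sketch (not the lab's actual type).
/// `value == None` marks a tombstone.
#[derive(Clone, Debug, PartialEq)]
struct Entry {
    key: Vec<u8>,
    seq: u64,
    value: Option<Vec<u8>>,
}

/// Toy stand-in for SstWriter: appends one length-prefixed record.
fn encode(e: &Entry, out: &mut Vec<u8>) {
    out.extend_from_slice(&(e.key.len() as u32).to_le_bytes());
    out.extend_from_slice(&e.key);
    out.extend_from_slice(&e.seq.to_le_bytes());
    match &e.value {
        Some(v) => {
            out.push(1); // tag: live value follows
            out.extend_from_slice(&(v.len() as u32).to_le_bytes());
            out.extend_from_slice(v);
        }
        None => out.push(0), // tag: tombstone, no payload
    }
}

/// Split a merged, key-sorted stream into buffers of roughly `target_bytes`.
/// A buffer is closed only when the next entry starts a new key, so all
/// entries for one key land in the same output and key ranges stay disjoint.
fn split_sorted(entries: &[Entry], target_bytes: usize) -> Vec<Vec<u8>> {
    let mut files = Vec::new();
    let mut current = Vec::new();
    let mut last_key: Option<&[u8]> = None;
    for e in entries {
        let at_key_boundary = last_key != Some(e.key.as_slice());
        if at_key_boundary && !current.is_empty() && current.len() > target_bytes {
            files.push(std::mem::take(&mut current));
        }
        encode(e, &mut current);
        last_key = Some(e.key.as_slice());
    }
    if !current.is_empty() {
        files.push(current);
    }
    files
}
```

Because the size check runs before an entry is written and only at key boundaries, an output can overshoot target_bytes by the size of one key's entries. That is the price of never splitting a key; a real writer would more likely check its estimated on-disk size rather than a raw buffer length.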
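
Items 4 and 9 combine into one drop rule, sketched below under assumptions: Versioned is a hypothetical entry type, the input is sorted by key ascending and then seqnum descending, floor is the seqnum of the oldest live snapshot, and bottom_level means no older data for these keys exists outside the inputs. This simplified version keeps every entry above the floor; a fuller implementation would also collapse versions that fall between two adjacent snapshots.

```rust
/// Hypothetical versioned entry for this sketch; `value == None` is a tombstone.
#[derive(Clone, Debug, PartialEq)]
struct Versioned {
    key: Vec<u8>,
    seq: u64,
    value: Option<Vec<u8>>,
}

/// `input` must be sorted by key ascending, then seq descending.
/// Rules applied per key:
///   * seq > floor: keep, a newer snapshot or reader may still need it.
///   * newest entry with seq <= floor: keep, it is what the oldest snapshot
///     sees, unless it is a tombstone and we are at the bottom level.
///   * anything older than that: superseded, drop.
fn purge(input: &[Versioned], floor: u64, bottom_level: bool) -> Vec<Versioned> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < input.len() {
        // Find the run [i, j) of entries that share one key.
        let mut j = i;
        while j < input.len() && input[j].key == input[i].key {
            j += 1;
        }
        let mut seen_visible = false;
        for e in &input[i..j] {
            if e.seq > floor {
                out.push(e.clone());
            } else if !seen_visible {
                seen_visible = true;
                let droppable_tombstone = bottom_level && e.value.is_none();
                if !droppable_tombstone {
                    out.push(e.clone());
                }
            }
            // Otherwise the entry is shadowed for every live snapshot.
        }
        i = j;
    }
    out
}
```

For example, with floor = 6 at the bottom level, a key with versions at seqnums 9, 5, and 2 keeps 9 and 5 and drops 2, while a key whose newest entry is a tombstone at seqnum 4 disappears entirely.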

What this lab deliberately leaves unpolished for later

  • No async I/O. The merge is CPU-bound on materialized vectors.
  • No CRCs on blocks. Bad bytes in an input produce corrupt output silently.
  • No fsync / atomic rename. The CLI writes the output and the script renames.
  • No metrics. Production engines export bytes-read, bytes-written, files-in, files-out, duration per compaction.

These are intentional. The point of this lab is the merge, not the operational surface.