db-07 Broader Ideas
What you'd add next, in order of payoff
- Output splitting. Add `compact_to_files(inputs, drop, target_bytes) -> Vec<Vec<u8>>`. Implementation: switch to a new `SstWriter` when the in-flight writer exceeds `target_bytes`. You must finalize at a key boundary (between two emitted entries), never inside a logical key; otherwise readers that depend on per-file key ranges will see overlaps.
- Streaming block iterator on `SstReader`. db-06's `entries()` materializes everything; the compaction loop should pull one entry at a time per cursor. This is db-08 territory (block cache + iterators).
- Range tombstones. A "delete all keys in [lo, hi)" record. Compaction has to track the set of active range tombstones during the merge and apply them to subsequent entries. Pebble's range-deletions doc is the reference.
- Snapshot-aware tombstone purging. "Drop tombstones at bottom" becomes "drop tombstones older than the oldest live snapshot". Compaction takes a sequence-number floor and drops anything below it that has been superseded.
- Leveled policy. A scheduler that picks N input files to compact based on per-level byte budgets and overlap. This is where `VersionSet::PickCompaction` and `Compaction::IsBaseLevelForKey` live in LevelDB.
- Subcompactions. Splitting one logical compaction into K parallel ones by key range. Requires that the index of each input lets you cheaply find the byte range covering a key span; a partitioned index helps.
- Compaction throttling. When compaction can't keep up, foreground writes must stall. RocksDB exposes `level0_slowdown_writes_trigger` and `level0_stop_writes_trigger`. Without this, write bursts cause unbounded read amplification.
- Universal/tiered compaction. A different scheduler; same merge mechanism. Worth implementing once leveled is in to feel the trade-off.
- Per-key sequence numbers. Every write gets a monotonically increasing seqnum; compaction keeps the highest-seqnum entry for each key. This makes the merge correct under concurrent writes and snapshots. Required for MVCC (db-13).
- Compaction filter callbacks. RocksDB lets the user inspect/transform every key during compaction (garbage collection of TTL'd values, schema migration). It's just a hook in the emit step.
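For the output-splitting item, the key-boundary rule can be sketched as follows. This is a minimal illustration, not the lab's code: `SketchWriter` is a hypothetical stand-in for `SstWriter` that only buffers length-prefixed records, and `split_outputs` is an assumed helper covering just the splitting step of `compact_to_files`, operating on a stream that has already been merged and sorted.

```rust
/// Hypothetical stand-in for the lab's SstWriter. It only buffers
/// length-prefixed records so the sketch stays self-contained.
struct SketchWriter {
    buf: Vec<u8>,
}

impl SketchWriter {
    fn new() -> Self {
        SketchWriter { buf: Vec::new() }
    }

    fn add(&mut self, key: &[u8], value: &[u8]) {
        self.buf.extend_from_slice(&(key.len() as u32).to_le_bytes());
        self.buf.extend_from_slice(key);
        self.buf.extend_from_slice(&(value.len() as u32).to_le_bytes());
        self.buf.extend_from_slice(value);
    }

    fn len(&self) -> usize {
        self.buf.len()
    }

    fn finish(self) -> Vec<u8> {
        self.buf
    }
}

/// Splits an already merged, key-sorted stream into files of roughly
/// `target_bytes`. The size check runs only between entries, and a new
/// file starts only when the key changes, so no key straddles two files.
fn split_outputs(merged: &[(Vec<u8>, Vec<u8>)], target_bytes: usize) -> Vec<Vec<u8>> {
    let mut files = Vec::new();
    let mut writer = SketchWriter::new();
    let mut last_key: Option<&[u8]> = None;

    for (key, value) in merged {
        let key_changed = last_key != Some(key.as_slice());
        if key_changed && writer.len() > 0 && writer.len() >= target_bytes {
            // Key boundary reached and the current file is full: rotate.
            files.push(std::mem::replace(&mut writer, SketchWriter::new()).finish());
        }
        writer.add(key, value);
        last_key = Some(key.as_slice());
    }
    if writer.len() > 0 {
        files.push(writer.finish());
    }
    files
}
```

Because the check happens before an entry is added, a file can overshoot `target_bytes` by up to one key's worth of entries. That is the price of never cutting inside a key.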
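The range-tombstone item describes tracking an active set during the merge. A rough sketch of that bookkeeping, under several assumptions: point entries arrive sorted by key, tombstones arrive sorted by `lo`, every record carries a sequence number, and a tombstone hides only entries with a lower sequence number. `RangeTombstone` and `apply_range_tombstones` are illustrative names, not part of the lab.

```rust
/// Illustrative record for "delete all keys in [lo, hi)" written at `seq`.
struct RangeTombstone {
    lo: Vec<u8>,
    hi: Vec<u8>,
    seq: u64,
}

/// Filters key-sorted point entries `(key, seq, value)` against tombstones
/// sorted by `lo`. Assumption: a tombstone hides entries with a lower seq.
fn apply_range_tombstones(
    points: &[(Vec<u8>, u64, Vec<u8>)],
    tombstones: &[RangeTombstone],
) -> Vec<(Vec<u8>, u64, Vec<u8>)> {
    let mut out = Vec::new();
    let mut next = 0; // index of the first tombstone not yet activated
    let mut active: Vec<&RangeTombstone> = Vec::new();

    for (key, seq, value) in points {
        // Activate every tombstone whose range has started at this key.
        while next < tombstones.len() && tombstones[next].lo.as_slice() <= key.as_slice() {
            active.push(&tombstones[next]);
            next += 1;
        }
        // Retire tombstones whose exclusive upper bound is at or before the key.
        active.retain(|t| t.hi.as_slice() > key.as_slice());

        let covered = active.iter().any(|t| t.seq > *seq);
        if !covered {
            out.push((key.clone(), *seq, value.clone()));
        }
    }
    out
}
```

A fuller version would also re-emit the tombstones themselves unless the output is the bottom level, since older data in lower levels may still need to be hidden.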
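The snapshot-aware purging and per-key sequence number items combine into one rule in the emit step. The sketch below is one possible reading, assuming the merged stream is sorted by key ascending and then by seqnum descending, and that `floor` is the seqnum of the oldest live snapshot. `Entry` and `gc_versions` are hypothetical names; the sketch conservatively keeps every version above the floor.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    key: Vec<u8>,
    seq: u64,
    value: Option<Vec<u8>>, // None marks a tombstone
}

/// Input: merged entries sorted by key ascending, then seq descending.
/// Versions above `floor` are all kept, since some snapshot may need them.
/// At or below `floor`, only the newest version per key is visible to any
/// reader, so older ones are superseded and dropped. If that newest version
/// is a tombstone and this is the bottom level, it is dropped as well.
fn gc_versions(sorted: &[Entry], floor: u64, bottom_level: bool) -> Vec<Entry> {
    let mut out = Vec::new();
    let mut current_key: Option<&[u8]> = None;
    let mut floor_version_seen = false;

    for e in sorted {
        if current_key != Some(e.key.as_slice()) {
            current_key = Some(e.key.as_slice());
            floor_version_seen = false;
        }
        if e.seq > floor {
            out.push(e.clone());
            continue;
        }
        if floor_version_seen {
            continue; // superseded by a newer version at or below the floor
        }
        floor_version_seen = true;
        if e.value.is_none() && bottom_level {
            continue; // nothing older remains for this tombstone to hide
        }
        out.push(e.clone());
    }
    out
}
```

With no live snapshots, `floor` can be set to the latest seqnum, and the rule collapses to "keep the highest-seqnum entry per key, drop tombstones at the bottom".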
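For the throttling item, the decision a write path makes can be reduced to a small function. This is a simplified sketch of the idea behind the two RocksDB triggers, not RocksDB's actual stall logic; `WriteGate` and `write_gate` are hypothetical names, and the thresholds are passed in rather than assumed.

```rust
#[derive(Debug, PartialEq)]
enum WriteGate {
    Proceed,  // compaction is keeping up
    Slowdown, // delay each foreground write a little
    Stop,     // block writes until compaction drains level 0
}

/// Maps the current number of level-0 files to a write-path decision.
/// `slowdown_trigger` and `stop_trigger` play the roles of
/// level0_slowdown_writes_trigger and level0_stop_writes_trigger.
fn write_gate(l0_files: usize, slowdown_trigger: usize, stop_trigger: usize) -> WriteGate {
    if l0_files >= stop_trigger {
        WriteGate::Stop
    } else if l0_files >= slowdown_trigger {
        WriteGate::Slowdown
    } else {
        WriteGate::Proceed
    }
}
```

Since every level-0 file may overlap every other, a point read has to consult all of them. Capping the file count is what bounds read amplification.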
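The compaction-filter item says the callback is just a hook in the emit step. A minimal sketch of such a hook, assuming the merge has already produced one `(key, value)` pair per key: `FilterDecision` and `emit_with_filter` are illustrative names loosely modeled on the keep/remove/change-value choices a filter typically has, not an existing API.

```rust
/// What the user callback may ask the emit step to do with an entry.
enum FilterDecision {
    Keep,
    Remove,
    ChangeValue(Vec<u8>),
}

/// Runs the user-supplied filter on every merged entry right before it
/// would be written to the output.
fn emit_with_filter<F>(merged: Vec<(Vec<u8>, Vec<u8>)>, mut filter: F) -> Vec<(Vec<u8>, Vec<u8>)>
where
    F: FnMut(&[u8], &[u8]) -> FilterDecision,
{
    let mut out = Vec::new();
    for (key, value) in merged {
        match filter(key.as_slice(), value.as_slice()) {
            FilterDecision::Keep => out.push((key, value)),
            FilterDecision::Remove => {} // entry is silently dropped
            FilterDecision::ChangeValue(new_value) => out.push((key, new_value)),
        }
    }
    out
}
```

A TTL garbage collector would return `Remove` for expired values, and a schema migration would return `ChangeValue` with the re-encoded bytes. The merge loop itself does not change.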
What this lab deliberately leaves un-clean for later
- No async I/O. The merge is CPU-bound on materialized vectors.
- No CRCs on blocks. Bad bytes in an input produce corrupt output silently.
- No fsync / atomic rename. The CLI writes the output and the script renames.
- No metrics. Production engines export bytes-read, bytes-written, files-in, files-out, duration per compaction.
These are intentional. The point of this lab is the merge, not the operational surface.