db-07 Step 2 — Tombstone purging and the bottom-level rule
A tombstone in an SSTable says: "this key was deleted; do not return any older version of it". Tombstones cost space and slow down reads (you still have to walk past them). Eventually you want to drop them.
The rule
A tombstone for key k is safe to drop during a compaction if and only if
there is no older version of k anywhere in the database that the tombstone
could be hiding.
Equivalently: if this compaction is over the bottom-most level and the tombstone's input is part of it, you can drop the tombstone.
For non-bottom compactions, keep all tombstones. A deeper level still has data the tombstone is suppressing.
API
compact(inputs, drop_tombstones: bool) -> bytes
The flag is the caller's promise. We do not inspect it; we trust it. In a
real engine, the scheduler sets drop_tombstones = (target_level == bottom).
Implementation: one if-statement
In the emit loop, after picking the winner (k, entry):
if entry.type == Tombstone and drop_tombstones:
continue # skip; do not write to output
out.add(k, entry)
That's the entire change versus the basic merge.
What's still wrong (and why it's OK for this lab)
The "drop tombstones at bottom" rule is a snapshot-unaware simplification. A correct engine keeps a tombstone alive until every read snapshot older than the tombstone's sequence number has been released. Implementing that requires per-entry sequence numbers, which we add in db-13.
For this lab, "bottom" means "the caller swears nothing older exists". That is enough to demonstrate the mechanism and to write a meaningful cross-test.
Test scenarios this enables
| Test | Inputs (newest first) | drop | Expected output |
|---|---|---|---|
| Tombstone wins over older value | A=[(k,T)], B=[(k,V,"x")] | false | [(k,T)] |
| Tombstone dropped at bottom | same | true | [] (empty SSTable, 36 bytes) |
| Tombstone for non-existent key kept | A=[(k,T)] | false | [(k,T)] |
| Tombstone for non-existent dropped | same | true | [] |
| Mixed values + tombstones, mid-level | A=[(a,V),(b,T)], B=[(a,V_old),(c,V)] | false | [(a,A.V),(b,T),(c,V)] |
| Same inputs at bottom | same | true | [(a,A.V),(c,V)] |
These are V4 in the verification table and the drop_tombstones=true arm of
the cross-test.