db-07 Step 2 — Tombstone purging and the bottom-level rule

A tombstone in an SSTable says: "this key was deleted; do not return any older version of it". Tombstones cost space and slow down reads (you still have to walk past them). Eventually you want to drop them.

The rule

A tombstone for key k is safe to drop during a compaction if and only if there is no older version of k anywhere in the database that the tombstone could be hiding.

Equivalently: if this compaction is over the bottom-most level and the tombstone's input is part of it, you can drop the tombstone.

For non-bottom compactions, keep all tombstones. A deeper level still has data the tombstone is suppressing.

API

compact(inputs, drop_tombstones: bool) -> bytes

The flag is the caller's promise. We do not inspect it; we trust it. In a real engine, the scheduler sets drop_tombstones = (target_level == bottom).

Implementation: one if-statement

In the emit loop, after picking the winner (k, entry):

if entry.type == Tombstone and drop_tombstones:
    continue   # skip; do not write to output
out.add(k, entry)

That's the entire change versus the basic merge.

What's still wrong (and why it's OK for this lab)

The "drop tombstones at bottom" rule is a snapshot-unaware simplification. A correct engine keeps a tombstone alive until every read snapshot older than the tombstone's sequence number has been released. Implementing that requires per-entry sequence numbers, which we add in db-13.

For this lab, "bottom" means "the caller swears nothing older exists". That is enough to demonstrate the mechanism and to write a meaningful cross-test.

Test scenarios this enables

TestInputs (newest first)dropExpected output
Tombstone wins over older valueA=[(k,T)], B=[(k,V,"x")]false[(k,T)]
Tombstone dropped at bottomsametrue[] (empty SSTable, 36 bytes)
Tombstone for non-existent key keptA=[(k,T)]false[(k,T)]
Tombstone for non-existent droppedsametrue[]
Mixed values + tombstones, mid-levelA=[(a,V),(b,T)], B=[(a,V_old),(c,V)]false[(a,A.V),(b,T),(c,V)]
Same inputs at bottomsametrue[(a,A.V),(c,V)]

These are V4 in the verification table and the drop_tombstones=true arm of the cross-test.