db-08 — Execution
Build order
- Rust first: drives the canonical data-shape decisions (the
Entryenum from db-06, the byte format ofSerializeStream). - Go second: ports the same algorithm with native data structures
(
container/list,container/heap). - C++ third: same algorithm with
std::list+std::priority_queue.
After all three pass their own unit tests, we run scripts/cross_test.sh
which builds canonical input SSTables via db-05 + db-06 and asserts that all
three merge_iter binaries produce the same sha256.
Per-language layout
Rust (src/rust)
Cargo.tomlpulls in db-06's sstable crate bypath = "../../../db-06-sstable-format/src/rust".src/lib.rsdefinesBlockCache,MergingIterator,SerializeStream, and re-exportssstable::Entry. The cache uses aHashMapof slot indices plus aVec<Node>arena with embedded prev/next indices and a free-list — an arena-based intrusive list, which beatsLinkedList<T>on allocator pressure.src/bin/merge_iter.rsis the cross-test CLI.
Go (src/go)
go.modis modulegithub.com/10xdev/dse/db08withreplacedirectives pointing to db-05 and db-06 on disk.lru.gousescontainer/list.Listandmap[BlockKey]*list.Element.iter.gousescontainer/heapwith a[]heapItembacking slice.cmd/merge_iter/main.gois the CLI.
C++ (src/cpp)
CMakeLists.txtdirectly compiles db-06'ssstable.ccinto oursstable_librather thanadd_subdirectorying db-06 — that would leak db-06'sadd_testregistrations into ourctest.lru.{h,cc}usesstd::list<Node>+std::unordered_map;Getuseslist_.splice(begin, list_, it->second)for O(1) MRU promotion.iter.{h,cc}usesstd::priority_queue<HeapEntry, std::vector, Greater>.src/merge_iter_bin.ccis the CLI.
Verification
scripts/verify.shbuilds + tests all three languages.scripts/cross_test.shbuilds db-05 + db-06 input pipelines, generates the samenewer.sst/older.sstused by db-07, runsmerge_iterin all three languages, and asserts sha256 byte-identity for bothdrop_tombstones=falseanddrop_tombstones=true. It also spot-checks that "NEW-10", "OLD-50", "val99" appear in the stream and that the key5 tombstone framing (040000006b65793501) appears exactly when expected.
Reproducible cross-test sha256 (this lab's truth)
drop=false → f693c483ef39dfef8e6285e29f9051a57e60bf2c4ba7b45bbf552c7932687fd1 (1874 bytes)
drop=true → ec71c56c89f451d33e58697af2d7bce985069078e1c599cc42062dfbba6e250e (1865 bytes)
The 9-byte difference is exactly the framing of one tombstone entry:
u32_le(4) + "key5" + u8(1) = 4 + 4 + 1 = 9 bytes.
What you should be able to do after this lab
- Sketch an LRU on a whiteboard in under three minutes and explain why both the hash map and the list are necessary.
- Explain why a K-way merge uses a heap and not nested merges, and quote the
O(N log K)comparison bound. - Identify, in any storage codebase, where the "newest-wins on tie" rule is enforced and where the "drain duplicates" step happens.
- Argue when it is safe to drop a tombstone during iteration vs when it is not.