Step 01 — Sorted map + Entry type
Build the in-memory MemTable: a sorted associative container from byte-key to an
Entry that is either Value(bytes) or Tombstone. Implement put, delete,
get, iter, and len / size_bytes in all three languages with the same
iteration order (byte-lex).
Why this first
The MemTable's iteration order is the contract that the on-disk format and the
SSTable flush both depend on. If three languages disagree on order, every later step
falls apart. So this step is a one-language-after-the-other implementation of the
same BTreeMap-equivalent, with a shared unit test that inserts a permutation and
checks the order.
Entry type
#![allow(unused)] fn main() { // Rust #[derive(Clone, Debug, PartialEq, Eq)] pub enum Entry { Value(Vec<u8>), Tombstone, } }
// Go
type EntryType uint8
const (
EntryValue EntryType = 0
EntryTombstone EntryType = 1
)
type Entry struct {
Type EntryType
Value []byte // empty if Tombstone
}
// C++
namespace dse::memtable {
enum class EntryType : std::uint8_t { Value = 0, Tombstone = 1 };
struct Entry {
EntryType type;
std::vector<std::uint8_t> value; // empty if Tombstone
};
}
The type-byte numbering (0 = Value, 1 = Tombstone) is part of the on-disk
contract — don't reorder it.
Container choice
#![allow(unused)] fn main() { // Rust — Vec<u8>'s Ord is byte-lex; BTreeMap iterates in key order use std::collections::BTreeMap; pub struct MemTable { map: BTreeMap<Vec<u8>, Entry>, bytes: usize, } }
// Go — unordered map; sort keys on iteration / encode
type MemTable struct {
m map[string]Entry
bytes int
}
func (t *MemTable) sortedKeys() []string {
keys := make([]string, 0, len(t.m))
for k := range t.m {
keys = append(keys, k)
}
sort.Strings(keys) // byte-lex on string is the same as on []byte
return keys
}
// C++ — std::map's comparator is operator< on vector<uint8_t>, which is lex
class MemTable {
std::map<std::vector<std::uint8_t>, Entry> map_;
std::size_t bytes_ = 0;
};
put / delete
#![allow(unused)] fn main() { pub fn put(&mut self, key: &[u8], value: &[u8]) { self.bytes -= self.entry_bytes(key); self.map.insert(key.to_vec(), Entry::Value(value.to_vec())); self.bytes += self.entry_bytes(key); } pub fn delete(&mut self, key: &[u8]) { self.bytes -= self.entry_bytes(key); self.map.insert(key.to_vec(), Entry::Tombstone); self.bytes += self.entry_bytes(key); } fn entry_bytes(&self, key: &[u8]) -> usize { match self.map.get(key) { None => 0, Some(Entry::Value(v)) => 9 + key.len() + v.len(), Some(Entry::Tombstone) => 9 + key.len(), } } }
Go and C++ use the same accounting trick: subtract the old entry's contribution, update, add the new contribution.
iter
#![allow(unused)] fn main() { pub fn iter(&self) -> impl Iterator<Item = (&[u8], &Entry)> { self.map.iter().map(|(k, e)| (k.as_slice(), e)) } }
func (t *MemTable) Iter() []KeyEntry {
out := make([]KeyEntry, 0, len(t.m))
for _, k := range t.sortedKeys() {
out = append(out, KeyEntry{Key: []byte(k), Entry: t.m[k]})
}
return out
}
const std::map<std::vector<std::uint8_t>, Entry>& Iter() const noexcept {
return map_;
}
Test — order determinism
Insert this permutation in each language and assert iteration yields the keys in the canonical sorted order:
inputs (insert order): ["b", "a", "", "\x00\x00", "ab", "\x00"]
expected iter order: ["", "\x00", "\x00\x00", "a", "ab", "b"]
This catches:
- Go forgetting to sort.
- C++ using
std::map<std::string, ...>(where'\0'ends the string and breaks comparisons on binary keys). - Anyone using a hash map.
What to verify before moving on
putthengetreturns the value just written.deletethengetreturnsTombstone(not absent).- Overwriting a key keeps
len()at 1. - The permutation test above passes.
size_bytes()increases by exactly9 + klen + vlenfor each new key and stays flat when overwriting.