Step 01 — Sorted map + Entry type

Build the in-memory MemTable: a sorted associative container from byte-key to an Entry that is either Value(bytes) or Tombstone. Implement put, delete, get, iter, and len / size_bytes in all three languages with the same iteration order (byte-lex).

Why this first

The MemTable's iteration order is the contract that the on-disk format and the SSTable flush both depend on. If three languages disagree on order, every later step falls apart. So this step is a one-language-after-the-other implementation of the same BTreeMap-equivalent, with a shared unit test that inserts a permutation and checks the order.

Entry type

#![allow(unused)]
fn main() {
// Rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Entry {
    Value(Vec<u8>),
    Tombstone,
}
}
// Go
type EntryType uint8
const (
    EntryValue EntryType = 0
    EntryTombstone EntryType = 1
)
type Entry struct {
    Type  EntryType
    Value []byte // empty if Tombstone
}
// C++
namespace dse::memtable {
    enum class EntryType : std::uint8_t { Value = 0, Tombstone = 1 };
    struct Entry {
        EntryType type;
        std::vector<std::uint8_t> value; // empty if Tombstone
    };
}

The type-byte numbering (0 = Value, 1 = Tombstone) is part of the on-disk contract — don't reorder it.

Container choice

#![allow(unused)]
fn main() {
// Rust — Vec<u8>'s Ord is byte-lex; BTreeMap iterates in key order
use std::collections::BTreeMap;
pub struct MemTable {
    map: BTreeMap<Vec<u8>, Entry>,
    bytes: usize,
}
}
// Go — unordered map; sort keys on iteration / encode
type MemTable struct {
    m     map[string]Entry
    bytes int
}

func (t *MemTable) sortedKeys() []string {
    keys := make([]string, 0, len(t.m))
    for k := range t.m {
        keys = append(keys, k)
    }
    sort.Strings(keys) // byte-lex on string is the same as on []byte
    return keys
}
// C++ — std::map's comparator is operator< on vector<uint8_t>, which is lex
class MemTable {
    std::map<std::vector<std::uint8_t>, Entry> map_;
    std::size_t bytes_ = 0;
};

put / delete

#![allow(unused)]
fn main() {
pub fn put(&mut self, key: &[u8], value: &[u8]) {
    self.bytes -= self.entry_bytes(key);
    self.map.insert(key.to_vec(), Entry::Value(value.to_vec()));
    self.bytes += self.entry_bytes(key);
}

pub fn delete(&mut self, key: &[u8]) {
    self.bytes -= self.entry_bytes(key);
    self.map.insert(key.to_vec(), Entry::Tombstone);
    self.bytes += self.entry_bytes(key);
}

fn entry_bytes(&self, key: &[u8]) -> usize {
    match self.map.get(key) {
        None => 0,
        Some(Entry::Value(v)) => 9 + key.len() + v.len(),
        Some(Entry::Tombstone) => 9 + key.len(),
    }
}
}

Go and C++ use the same accounting trick: subtract the old entry's contribution, update, add the new contribution.

iter

#![allow(unused)]
fn main() {
pub fn iter(&self) -> impl Iterator<Item = (&[u8], &Entry)> {
    self.map.iter().map(|(k, e)| (k.as_slice(), e))
}
}
func (t *MemTable) Iter() []KeyEntry {
    out := make([]KeyEntry, 0, len(t.m))
    for _, k := range t.sortedKeys() {
        out = append(out, KeyEntry{Key: []byte(k), Entry: t.m[k]})
    }
    return out
}
const std::map<std::vector<std::uint8_t>, Entry>& Iter() const noexcept {
    return map_;
}

Test — order determinism

Insert this permutation in each language and assert iteration yields the keys in the canonical sorted order:

inputs (insert order):  ["b", "a", "", "\x00\x00", "ab", "\x00"]
expected iter order:    ["", "\x00", "\x00\x00", "a", "ab", "b"]

This catches:

  • Go forgetting to sort.
  • C++ using std::map<std::string, ...> (where '\0' ends the string and breaks comparisons on binary keys).
  • Anyone using a hash map.

What to verify before moving on

  • put then get returns the value just written.
  • delete then get returns Tombstone (not absent).
  • Overwriting a key keeps len() at 1.
  • The permutation test above passes.
  • size_bytes() increases by exactly 9 + klen + vlen for each new key and stays flat when overwriting.