db-10 step 01 — Tree shape and get / scan

Goal

Define the in-memory B-tree's node representation and implement the two read-only operations: point lookup get(k) and ordered scan() -> Vec<(k,v)>. No mutation yet; this step is about pinning the data structure.

Tasks

Declare constants T = 2, MIN_KEYS = T - 1, MAX_KEYS = 2T - 1.
Define a Node containing:
- is_leaf: bool
- keys: Vec<(Vec<u8>, Vec<u8>)> — sorted by key
- children: Vec<Box<Node>> — empty for leaves; for internal nodes, children.len() == keys.len() + 1
Define BTree { root: Box<Node> }. new() produces an empty leaf root.
Implement get(&self, key: &[u8]) -> Option<Vec<u8>> by descending from the root: at each node, find the first key >= target; if equal, return its value; if leaf, return None; else recurse into children[i].
Implement scan(&self) -> Vec<(Vec<u8>, Vec<u8>)> as the standard in-order traversal: for each i in 0..keys.len(), recurse into children[i], push keys[i]; finally recurse into the last child.
Implement len() and is_empty() as helpers.

Acceptance

Inline unit tests:

get_on_empty_returns_none — BTree::new().get(b"k") == None.
manual_build_get_returns_value — manually construct a 3-key leaf, get returns the right value for each key and None for misses.
scan_is_sorted — manually construct an internal node with two leaf children; scan() returns the merged sorted sequence.

All three green in Rust, Go, and C++.

Discussion prompts

Why does get use linear scan over keys rather than binary search? For T = 2 the answer is obvious; for T = 256 is it still?
Why is is_leaf stored on each node rather than inferred from children.is_empty()?
What goes wrong if scan recurses into the last child before pushing the last key?

Distributed Systems Engineer — Build Databases & Consensus From Scratch

db-10 step 01 — Tree shape and get / scan

Goal

Tasks

Acceptance

Discussion prompts