db-10 step 01 — Tree shape and get / scan

Goal

Define the in-memory B-tree's node representation and implement the two read-only operations: point lookup get(k) and ordered scan() -> Vec<(k,v)>. No mutation yet; this step is about pinning the data structure.

Tasks

  1. Declare constants T = 2, MIN_KEYS = T - 1, MAX_KEYS = 2T - 1.
  2. Define a Node containing:
    • is_leaf: bool
    • keys: Vec<(Vec<u8>, Vec<u8>)> — sorted by key
    • children: Vec<Box<Node>> — empty for leaves; for internal nodes, children.len() == keys.len() + 1
  3. Define BTree { root: Box<Node> }. new() produces an empty leaf root.
  4. Implement get(&self, key: &[u8]) -> Option<Vec<u8>> by descending from the root: at each node, find the first key >= target; if equal, return its value; if leaf, return None; else recurse into children[i].
  5. Implement scan(&self) -> Vec<(Vec<u8>, Vec<u8>)> as the standard in-order traversal: for each i in 0..keys.len(), recurse into children[i], push keys[i]; finally recurse into the last child.
  6. Implement len() and is_empty() as helpers.

Acceptance

Inline unit tests:

  • get_on_empty_returns_noneBTree::new().get(b"k") == None.
  • manual_build_get_returns_value — manually construct a 3-key leaf, get returns the right value for each key and None for misses.
  • scan_is_sorted — manually construct an internal node with two leaf children; scan() returns the merged sorted sequence.

All three green in Rust, Go, and C++.

Discussion prompts

  • Why does get use linear scan over keys rather than binary search? For T = 2 the answer is obvious; for T = 256 is it still?
  • Why is is_leaf stored on each node rather than inferred from children.is_empty()?
  • What goes wrong if scan recurses into the last child before pushing the last key?