Step 01 — Secondary Index
Goal
Build the Table structure: rows, schema, and a BTreeMap-equivalent
secondary index per indexed column.
Why
A secondary index is the smallest, most universal unit of query optimisation. Once you can map column-value → row-ids in sorted order, equality and range predicates become dictionary operations, and the planner has a real choice to make.
What to do
-
Pick a tagged-union
Valuetype:Int(i64)orText(bytes). Implement a frozen total order:Int < Textacross kinds; natural order within each kind. This is the byte-identity contract. -
Define
Column { name, type }andRow = Vec<Value>. -
Implement
Table::new(schema),Table::insert(row),Table::create_index(col_idx). The index is keyed byValueand maps to an ascendingVec<row_id>. -
Decide the physical form per language:
- Rust:
BTreeMap<Value, Vec<u32>>. - C++ :
std::map<Value, std::vector<std::uint32_t>>. - Go : sorted slice of
(Value, []uint32)+ binary search. Never iterate a Gomapfor cross-language tests — the order is randomised.
- Rust:
-
insertmust update every existing index before moving the row.create_indexmust iterate rows in order (so each bucket's row-id list is naturally ascending without per-bucket sort).
Acceptance
- Inserting the same rows in the same order produces the same index contents byte-for-byte across languages.
- An
EQlookup on an indexed column returns rows in ascending row-id order. - A range lookup walks the index in ascending key, ascending row-id within each bucket.
Common pitfalls
- Storing the wrong column in the bucket (off-by-one on
col_idx). - Copying the
Valuelazily and then losing the bytes when the row moves — clone before insert. - Letting Go's
range over mapleak into any code path that touches the index — every iteration must be over the sorted slice.