Step 01 — Secondary Index

Goal

Build the Table structure: rows, schema, and a BTreeMap-equivalent secondary index per indexed column.

A secondary index is the smallest, most universal unit of query optimisation. Once you can map column-value → row-ids in sorted order, equality and range predicates become dictionary operations, and the planner has a real choice to make.

What to do

Pick a tagged-union Value type: Int(i64) or Text(bytes). Implement a frozen total order: Int < Text across kinds; natural order within each kind. This is the byte-identity contract.
Define Column { name, type } and Row = Vec<Value>.
Implement Table::new(schema), Table::insert(row), Table::create_index(col_idx). The index is keyed by Value and maps to an ascending Vec<row_id>.
Decide the physical form per language:
- Rust: BTreeMap<Value, Vec<u32>>.
- C++ : std::map<Value, std::vector<std::uint32_t>>.
- Go : sorted slice of (Value, []uint32) + binary search. Never iterate a Go map for cross-language tests — the order is randomised.
insert must update every existing index before moving the row. create_index must iterate rows in order (so each bucket's row-id list is naturally ascending without per-bucket sort).

Acceptance

Inserting the same rows in the same order produces the same index contents byte-for-byte across languages.
An EQ lookup on an indexed column returns rows in ascending row-id order.
A range lookup walks the index in ascending key, ascending row-id within each bucket.

Common pitfalls

Storing the wrong column in the bucket (off-by-one on col_idx).
Copying the Value lazily and then losing the bytes when the row moves — clone before insert.
Letting Go's range over map leak into any code path that touches the index — every iteration must be over the sorted slice.

Distributed Systems Engineer — Build Databases & Consensus From Scratch

Step 01 — Secondary Index

Goal

Why

What to do

Acceptance

Common pitfalls