Step 01 — Secondary Index

Goal

Build the Table structure: rows, schema, and a BTreeMap-equivalent secondary index per indexed column.

Why

A secondary index is the smallest, most universal unit of query optimisation. Once you can map column-value → row-ids in sorted order, equality and range predicates become dictionary operations, and the planner has a real choice to make.

What to do

  1. Pick a tagged-union Value type: Int(i64) or Text(bytes). Implement a frozen total order: Int < Text across kinds; natural order within each kind. This is the byte-identity contract.

  2. Define Column { name, type } and Row = Vec<Value>.

  3. Implement Table::new(schema), Table::insert(row), Table::create_index(col_idx). The index is keyed by Value and maps to an ascending Vec<row_id>.

  4. Decide the physical form per language:

    • Rust: BTreeMap<Value, Vec<u32>>.
    • C++ : std::map<Value, std::vector<std::uint32_t>>.
    • Go : sorted slice of (Value, []uint32) + binary search. Never iterate a Go map for cross-language tests — the order is randomised.
  5. insert must update every existing index before moving the row. create_index must iterate rows in order (so each bucket's row-id list is naturally ascending without per-bucket sort).

Acceptance

  • Inserting the same rows in the same order produces the same index contents byte-for-byte across languages.
  • An EQ lookup on an indexed column returns rows in ascending row-id order.
  • A range lookup walks the index in ascending key, ascending row-id within each bucket.

Common pitfalls

  • Storing the wrong column in the bucket (off-by-one on col_idx).
  • Copying the Value lazily and then losing the bytes when the row moves — clone before insert.
  • Letting Go's range over map leak into any code path that touches the index — every iteration must be over the sorted slice.