Step 02 — Rule-Based Planner + Executor

Goal

Turn a Query { projections, predicates } into a Plan { Pipeline[scan, *Filter, Project?] }, then execute it.

Once a table has indexes, every predicate has at least two physical implementations: scan the whole heap and filter, or jump into the index. Picking the right one is what a planner does. Even a 10-line rule-based planner outperforms "always SeqScan" by orders of magnitude on selective queries, and it teaches the same vocabulary (selectivity, access path, predicate pushdown) you need for cost-based work later.

What to do

Estimate per predicate (return None if not indexable):
- Op::Ne → not indexable.
- No index on the column → not indexable.
- Op::Eq → rows / distinct_keys.
- Op::Lt / Le / Gt / Ge → (rows + 2) / 3.
Pick the lowest estimate (strict <, so earlier predicate wins ties — determinism matters).
Build the Plan:
- First child: IndexScan{col, op, val} if a predicate was chosen, else SeqScan{table_id: 0}.
- Remaining predicates become Filter nodes in input order.
- If projections is non-empty, append Project{cols} — sort and dedupe the column list so the wire format is canonical.
Execute the Plan (Volcano-style, but materialise at each operator for simplicity):
- Scan emits all matching rows in (key, row-id) order.
- Filter retains rows where eval_pred(row, predicate) is true.
- Project rewrites each row to the requested column subset.
IndexScan for each op:
- Eq → fetch the single bucket (or empty).
- Lt → iterate range(..val).
- Le → iterate range(..=val).
- Gt → iterate range(val+ε..).
- Ge → iterate range(val..).
- Within each bucket, emit row-ids in ascending order.

Acceptance

Given an indexed column and an Op::Eq predicate, the planner emits IndexScan, and the executor returns the matching rows in row-id order.
Given a non-indexable predicate (Op::Ne, or no index), the planner emits SeqScan and a Filter.
Given two predicates, the planner picks the one with the lower estimate; the other becomes a Filter.
Projections with duplicates (e.g. [2, 0, 2]) end up as [0, 2].

Common pitfalls

Forgetting to clone the predicate value when moving it into IndexScan — both the chosen and discarded predicates need a copy.
Using <= instead of < for tie-breaking — only < keeps the choice deterministic when two predicates tie.
Returning rows from Le (<=) by stopping at the first key greater than val instead of strictly greater — off-by-one on bounds.
Mutating the input Query.projections instead of cloning before sort / dedup.

Distributed Systems Engineer — Build Databases & Consensus From Scratch

Step 02 — Rule-Based Planner + Executor

Goal

Why

What to do

Acceptance

Common pitfalls