Step 02 — Rule-Based Planner + Executor

Goal

Turn a Query { projections, predicates } into a Plan { Pipeline[scan, *Filter, Project?] }, then execute it.

Why

Once a table has indexes, every predicate has at least two physical implementations: scan the whole heap and filter, or jump into the index. Picking the right one is what a planner does. Even a 10-line rule-based planner outperforms "always SeqScan" by orders of magnitude on selective queries, and it teaches the same vocabulary (selectivity, access path, predicate pushdown) you need for cost-based work later.

What to do

  1. Estimate per predicate (return None if not indexable):

    • Op::Ne → not indexable.
    • No index on the column → not indexable.
    • Op::Eqrows / distinct_keys.
    • Op::Lt / Le / Gt / Ge(rows + 2) / 3.
  2. Pick the lowest estimate (strict <, so earlier predicate wins ties — determinism matters).

  3. Build the Plan:

    • First child: IndexScan{col, op, val} if a predicate was chosen, else SeqScan{table_id: 0}.
    • Remaining predicates become Filter nodes in input order.
    • If projections is non-empty, append Project{cols}sort and dedupe the column list so the wire format is canonical.
  4. Execute the Plan (Volcano-style, but materialise at each operator for simplicity):

    • Scan emits all matching rows in (key, row-id) order.
    • Filter retains rows where eval_pred(row, predicate) is true.
    • Project rewrites each row to the requested column subset.
  5. IndexScan for each op:

    • Eq → fetch the single bucket (or empty).
    • Lt → iterate range(..val).
    • Le → iterate range(..=val).
    • Gt → iterate range(val+ε..).
    • Ge → iterate range(val..).
    • Within each bucket, emit row-ids in ascending order.

Acceptance

  • Given an indexed column and an Op::Eq predicate, the planner emits IndexScan, and the executor returns the matching rows in row-id order.
  • Given a non-indexable predicate (Op::Ne, or no index), the planner emits SeqScan and a Filter.
  • Given two predicates, the planner picks the one with the lower estimate; the other becomes a Filter.
  • Projections with duplicates (e.g. [2, 0, 2]) end up as [0, 2].

Common pitfalls

  • Forgetting to clone the predicate value when moving it into IndexScan — both the chosen and discarded predicates need a copy.
  • Using <= instead of < for tie-breaking — only < keeps the choice deterministic when two predicates tie.
  • Returning rows from Le (<=) by stopping at the first key greater than val instead of strictly greater — off-by-one on bounds.
  • Mutating the input Query.projections instead of cloning before sort / dedup.