Indexes and Query Optimization
1. What Is It
A secondary index is an auxiliary data structure that maps each value of a
non-primary-key column to the set of row-ids that contain it. A query planner
turns a logical query (predicates + projections) into a physical plan tree
(scan → filters → project) and picks an access path per predicate. A
rule-based planner uses fixed heuristics; a cost-based planner consults
statistics. db-14 implements the rule-based half end-to-end and keeps the
cost model deliberately tiny (rows / distinct_keys for =, (rows+2)/3
for ranges) so the byte-for-byte cross-language invariant is tractable.
2. Why It Matters
A SeqScan on N rows costs Θ(N) regardless of selectivity. A point lookup through a sorted (or hashed) index is O(log N) (or O(1) amortised). When predicates have selectivity ≪ 1 — the normal case in OLTP — choosing the right access path is the single largest performance lever a database has. And once two physical operators are available, you need a planner. Even a naive planner with the wrong cost model can be catastrophically slow on real workloads (see Leis et al., "How Good Are Query Optimizers, Really?", PVLDB 2015) — but it is also the concept through which every later optimisation (joins, partitioning, parallelism) gets expressed.
3. How It Works
Query{projections, predicates}
│
▼
┌───────────────┐
│ Planner │ rule-based:
│ estimate per │ • Eq → rows/distinct
│ indexable pred│ • Lt/Le/Gt/Ge → (rows+2)/3
│ pick min │ • Ne / no-index → SeqScan
└──────┬────────┘
▼
Plan{ Pipeline[ scan, *Filter, Project? ] }
│
▼
┌───────────────┐
│ Executor │ Volcano-style: scan rows,
│ scan→filter │ retain on predicate,
│ →project │ rewrite columns at end.
└──────┬────────┘
▼
[]Row
Indexes are BTreeMap<Value, Vec<row_id>> in Rust, std::map in C++, and a
sorted slice of (Value, []row_id) in Go. Insertion order is preserved
inside each bucket, which (combined with ascending key traversal) gives a
total, deterministic output order shared across all three implementations.
4. Core Terminology
| Term | Definition |
|---|---|
| Secondary index | Sorted map from column value → list of row-ids. |
| Access path | Concrete way to read rows for a predicate (SeqScan vs IndexScan). |
| Selectivity | Fraction of rows a predicate keeps; 0 ≤ s ≤ 1. |
| Cardinality estimate | Predicted row count out of an operator. |
| Pipeline | Linear chain of operators evaluated row-at-a-time (Volcano model). |
| Rule-based optimizer | Picks plans from fixed heuristics; no statistics. |
| Cost-based optimizer | Searches plan space; uses statistics + a cost function. |
| Covering index | Index that includes every column the query needs (skip the row lookup). |
| Index-only scan | Read the index without touching the heap. |
| Tuple | A single row of a relation. |
5. Mental Models
- Index = sorted dictionary. Equality is dictionary lookup; range is
dictionary
range(). Everything else is a special case. - Planner = predicate auctioneer. Each indexable predicate "bids" its estimated row count; the lowest bid wins the scan, the rest become Filters.
- Executor = pipeline. Each operator pulls rows from its child. Operators don't materialise unless they must (project, sort, hash).
- Wire format = correctness oracle. If three languages serialise the same
plan and result bytes for the same query, they agree on planning + execution
semantics. The SHA-256 collapses N MB of bytes into a 64-char string we
can put in a
casestatement.
6. Common Misconceptions
- "Indexes always speed up reads." False for low-selectivity predicates: a SeqScan reads the heap once; an IndexScan dereferences each row-id, which may be worse if most rows match.
- "More indexes is free." Every write must update every index — and indexes cost RAM/disk.
- "Rule-based is obsolete." Modern systems (SQLite, MySQL) ship rule-based fallbacks for simple queries because cost-based planning has its own pathologies (bad stats → catastrophic plans).
- "Selectivity = 1 / distinct keys." Only under uniform-distribution assumption. Skewed data needs histograms or sampling.
- "Project is free." Wide projection through long pipelines materialises copies; columnar engines avoid this; row engines pay for it.
7. Interview Talking Points
- Explain how a B-tree index supports both point and range queries, and what changes for a hash index (point only, no ordering).
- Walk through plan selection: predicates → estimates → cheapest scan → remaining as filters → optional project.
- Why is index iteration order important? Determinism, merge-join inputs, ORDER BY elimination.
- Explain Volcano vs vectorised execution. Tradeoffs?
- What is a covering index, and when does it dominate?
- Discuss
EXPLAINoutput: how do you read a query plan? - What can go wrong when the planner picks a SeqScan instead of an IndexScan (or vice versa)? Stale statistics, correlated predicates, type coercions that disable the index.
8. Connections to Other Labs
- db-02 (data structures) introduced sorted maps; this lab puts them to work.
- db-10 (B-tree) is the persistent counterpart of the in-memory index here.
- db-12 (SQL frontend) produces the
Querystruct planners consume. - db-13 (transactions/MVCC) governs which rows the index sees per snapshot.
- db-15 (SQLite-complete) stitches all of the above into a real engine.
- db-22 (perf/benchmarking) measures the planner choices we make here.
9. Frozen Wire Format
Plan stream = 0x05 (PIPELINE) | u32 LE child_count | child*
Child node tags:
0x01 SeqScan | u32 LE table_id(=0)
0x02 IndexScan | u32 LE col_idx | u8 op | u8 val_tag | <val>
0x03 Filter | u32 LE col_idx | u8 op | u8 val_tag | <val>
0x04 Project | u32 LE col_count | (u32 LE col_idx)*
Op codes : Eq=1 Ne=2 Lt=3 Le=4 Gt=5 Ge=6
Val tags : Int=1 (i64 LE) ; Text=2 (u32 LE len | bytes)
Result stream = "DSEQR01" (7 bytes) | u32 LE row_count |
per row: u32 LE col_count | (u8 tag | <val>)*
Both streams are concatenated per op; the SHA-256 of that concatenation, across N ops, is the byte-identity oracle for the cross-language test.