db-12 — References

Primary sources

Crafting Interpreters, Robert Nystrom — chapters 4 ("Scanning") and 6 ("Parsing Expressions") map almost 1:1 onto what we built. The hand- rolled recursive-descent style and the "one function per non-terminal" discipline are taken straight from this book. https://craftinginterpreters.com/
Modern Compiler Implementation in ML (or in C / Java), Andrew Appel — chapter 3 ("Parsing"). The clearest exposition of why recursive descent works for LL(1) grammars and what changes when you need lookahead or precedence climbing.
The Dragon Book (Aho, Lam, Sethi, Ullman) — chapters 3 and 4. The textbook source for lexical analysis (DFA construction, regular expressions to scanners) and predictive parsing. Read for theory; the practice is in Crafting Interpreters.

How real databases parse SQL

SQLite — src/tokenize.c (hand-rolled DFA, conceptually the same as ours but written in C with a generated keyword-lookup function) and src/parse.y (a Lemon-parser grammar, which generates a table-driven bottom-up parser rather than our recursive-descent). https://github.com/sqlite/sqlite/blob/master/src/tokenize.c https://www.sqlite.org/lemon.html
PostgreSQL — src/backend/parser/scan.l (flex) and src/backend/parser/gram.y (bison). A generator-based front; the grammar file alone is ~17k lines. Worth opening just to see the scale of the dialect they support. https://github.com/postgres/postgres/tree/master/src/backend/parser
DuckDB — uses libpg_query, which is a stripped-down Postgres parser exposed as a library. A useful pattern when you want Postgres-compatible SQL without the rest of Postgres. https://github.com/duckdb/duckdb/tree/main/third_party/libpg_query
CockroachDB — pkg/sql/parser/sql.y (goyacc). Like DuckDB above, another example of "real database, generated parser".

Recursive descent specifically

Rob Pike, Lexical Scanning in Go, GopherCon 2011 — the talk that popularized the "scanner emits tokens to a channel; parser reads from the channel" style. We don't use channels (we just return a Vec), but the state-machine framing of the scanner loop is the same. https://www.youtube.com/watch?v=HxaD_trXwRE
Doug Crockford, Top Down Operator Precedence, 2007 — the cleanest explanation of Pratt parsing, which is what you reach for next once recursive descent runs into expression-precedence pain. We don't need it here (no expression grammar) but it's the natural follow-up. https://crockford.com/javascript/tdop/tdop.html

Determinism / wire formats

Federal Information Processing Standards Publication 180-4, NIST, 2015 — the SHA-256 specification we implemented inline in Rust and C++. https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf
Protobuf encoding docs — useful counterexample for what not to do if you want byte-identity. Protobuf's "unknown field" handling and optional canonicalization are exactly the corners that prevent stable hashes across implementations. Our format avoids those corners on purpose. https://protobuf.dev/programming-guides/encoding/

None. db-12 is intentionally self-contained: there is nothing in earlier labs (storage, WAL, LSM, B-tree) that the parser needs, and the AST serializer uses no upstream wire format. The C++ build does not add_subdirectory(../db-NN). That isolation is a feature, not an oversight — it keeps the lab small enough to be reasoned about as a self-contained compiler-front exercise.

db-12 feeds db-13 (execution + transactions), where the AST will finally be walked by a VM.

Distributed Systems Engineer — Build Databases & Consensus From Scratch

db-12 — References

Primary sources

How real databases parse SQL

Recursive descent specifically

Determinism / wire formats

Cross-lab dependencies