db-12 — Execution
What we built, in the order we built it.
1. Rust (src/rust) — the reference
- `Cargo.toml` declares crate `sqlfront12` (lib) and a binary `sqlctl`. No external dependencies, no path dependencies: the lab is self-contained.
- `src/lib.rs` (~1100 lines) defines:
  - `ParseError` (one error type for both tokenize and parse phases).
  - `TokKind` + `tokenize(src) -> Result<Vec<Token>, ParseError>`. The tokenizer is a single character-by-character loop with branches for whitespace, `--` line comment, identifier/keyword, integer literal, `'...'` string literal (with `''` escape), comparison operator (`=`, `!=`, `<`, `<=`, `>`, `>=`), and single-char punctuation (`(`, `)`, `,`, `;`).
  - `ColType { Int, Text }`, `Literal { Int(i64), Text(String) }`, `Op { Eq=1, Ne=2, Lt=3, Le=4, Gt=5, Ge=6 }` (`#[repr(u8)]`), `Where`, `SelectCols { Star, Named(Vec<String>) }`, `Statement` enum with five variants.
  - `Parser` struct (token slice + cursor) with one method per non-terminal (`parse_program`, `parse_stmt`, `parse_create`, `parse_insert`, `parse_select`, `parse_delete`, `parse_update`, `parse_where`, `parse_literal`).
  - `parse(src) -> Result<Vec<Statement>, ParseError>` glues tokenize + Parser together.
  - `serialize(stmts) -> Vec<u8>` walks the AST and emits the canonical bytes described in CONCEPTS.md. Magic header `b"DSESQL01"`, then a `u32` LE count, then per-statement records.
  - Inline `sha256` + `sha256_hex` (FIPS 180-4) so the lab has no external crate dependencies.
  - 11 inline `#[cfg(test)]` tests:
    - `tokenize_happy`: full coverage of all token kinds on a single mixed input.
    - `tokenize_strings_and_errors`: `''` escape; unterminated string reports correct `(line, col)`.
    - `parse_create_table`: `CREATE TABLE` with INT + TEXT columns.
    - `parse_insert_multirow`: multi-row VALUES, both literal types.
    - `parse_select_variants_and_all_ops`: `SELECT *`, `SELECT col, col`, each of the 6 comparison ops.
    - `parse_update_and_delete`: UPDATE multi-SET + WHERE; DELETE + WHERE.
    - `parse_multi_with_comments_and_case`: `--` line comments, case-insensitive keywords, identifier case preserved.
    - `parse_errors_report_column`: missing identifier after `SELECT` reports `line 1 col 8`.
    - `serialize_header_and_count`: magic bytes + count field correct.
    - `serialize_is_deterministic`: two `serialize` calls on the same AST return equal bytes.
    - `sha256_known_vectors`: the FIPS 180-4 SHA-256 vectors for `""` and `"abc"`.
- `bin/sqlctl.rs` is the CLI used by the cross-language script.
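To make the "single character-by-character loop" concrete, here is a minimal sketch covering only four of the branches: whitespace, `--` comments, comparison operators, and `'...'` strings with the `''` escape. It is not the lab's `tokenize`. The names `Tok`, `LexError`, and `lex` are hypothetical, and reporting an unterminated string at the position of its opening quote is an assumption about what "correct `(line, col)`" means.

```rust
/// Token kinds for this sketch only; the lab's TokKind has more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Op(&'static str), // one of = != < <= > >=
    Str(String),      // contents of a '...' literal, with '' unescaped to '
}

/// Error carrying a 1-based (line, col). Hypothetical type, not ParseError.
#[derive(Debug, PartialEq)]
pub struct LexError {
    pub msg: String,
    pub line: usize,
    pub col: usize,
}

pub fn lex(src: &str) -> Result<Vec<Tok>, LexError> {
    let chars: Vec<char> = src.chars().collect();
    let (mut i, mut line, mut col) = (0usize, 1usize, 1usize);
    let mut out = Vec::new();
    while i < chars.len() {
        let c = chars[i];
        match c {
            '\n' => { i += 1; line += 1; col = 1; }
            _ if c.is_whitespace() => { i += 1; col += 1; }
            '-' if chars.get(i + 1) == Some(&'-') => {
                // Line comment: skip to end of line; the '\n' arm handles the newline.
                while i < chars.len() && chars[i] != '\n' { i += 1; col += 1; }
            }
            '\'' => {
                // Assumption: errors point at the opening quote.
                let (start_line, start_col) = (line, col);
                i += 1; col += 1;
                let mut s = String::new();
                loop {
                    match chars.get(i).copied() {
                        None => return Err(LexError {
                            msg: "unterminated string".to_string(),
                            line: start_line,
                            col: start_col,
                        }),
                        // '' inside a literal is an escaped single quote.
                        Some('\'') if chars.get(i + 1) == Some(&'\'') => {
                            s.push('\''); i += 2; col += 2;
                        }
                        Some('\'') => { i += 1; col += 1; break; }
                        Some('\n') => { s.push('\n'); i += 1; line += 1; col = 1; }
                        Some(ch) => { s.push(ch); i += 1; col += 1; }
                    }
                }
                out.push(Tok::Str(s));
            }
            '=' => { out.push(Tok::Op("=")); i += 1; col += 1; }
            '!' if chars.get(i + 1) == Some(&'=') => {
                out.push(Tok::Op("!=")); i += 2; col += 2;
            }
            '<' | '>' => {
                // One character of lookahead decides between < and <=, > and >=.
                let two = chars.get(i + 1) == Some(&'=');
                let sym = match (c, two) {
                    ('<', true) => "<=",
                    ('<', false) => "<",
                    ('>', true) => ">=",
                    _ => ">",
                };
                out.push(Tok::Op(sym));
                i += sym.len(); col += sym.len();
            }
            other => return Err(LexError {
                msg: format!("unexpected character {:?}", other),
                line,
                col,
            }),
        }
    }
    Ok(out)
}
```

Under these assumptions, `lex("<= 'it''s'")` yields `Op("<=")` followed by `Str("it's")`, and `lex("= 'abc")` fails at line 1, col 3, the column of the opening quote.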
2. Fixtures (scripts/fixtures)
Two SQL files, frozen forever (because the frozen hashes depend on every
byte, including the trailing newline and the en-dash `—` in the comment
lines):
- `a_basic.sql`: minimal smoke test. `CREATE TABLE users`, three-row `INSERT`, `SELECT *`, `SELECT id, name WHERE id = 2`. 181 bytes serialized.
- `b_full.sql`: full coverage. Every statement kind, both literal types, the `''` escape, every comparison operator. 486 bytes serialized.
The hashes were computed once from the Rust reference and then frozen
into the Go test, the C++ test, and `scripts/cross_test.sh`. If you
edit either fixture, all three of those locations must be updated in the
same commit.
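Since every byte feeds the frozen hashes, it may help to see the fixed part of the stream spelled out. The sketch below covers only the header described in the Rust section: the magic `DSESQL01`, then a `u32` little-endian statement count. The per-statement record layout lives in CONCEPTS.md and is not reproduced here. `write_header` and `read_header` are hypothetical helper names, not functions from the lab.

```rust
/// Magic bytes that open every serialized stream.
pub const MAGIC: &[u8; 8] = b"DSESQL01";

/// Append the fixed 12-byte prefix: magic, then statement count as u32 LE.
pub fn write_header(out: &mut Vec<u8>, stmt_count: u32) {
    out.extend_from_slice(&MAGIC[..]);
    out.extend_from_slice(&stmt_count.to_le_bytes());
}

/// Read the prefix back. Returns the count and the remaining record bytes,
/// or None if the buffer is too short or the magic does not match.
pub fn read_header(buf: &[u8]) -> Option<(u32, &[u8])> {
    if buf.len() < 12 || buf[..8] != MAGIC[..] {
        return None;
    }
    let count = u32::from_le_bytes([buf[8], buf[9], buf[10], buf[11]]);
    Some((count, &buf[12..]))
}
```

Calling `write_header(&mut buf, 4)` on an empty buffer leaves 12 bytes: the eight magic bytes followed by `04 00 00 00`. Using `to_le_bytes` rather than the host's native byte order is what keeps the prefix identical across machines.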
3. Go (src/go)
- `go.mod`: module `github.com/10xdev/dse/db12`. No external deps, no `replace` directives: the module stands alone.
- `sql.go` ports the Rust types one-for-one:
  - `TokKind` int constants.
  - `Token`, `ColType` (`ColInt=1`, `ColText=2`), `LitKind`, `Literal`, `Op` (`OpEq=1`..`OpGe=6`), `Where`, `SelectColsKind`, `SelectCols`, `Column`, `Assign`, `StmtKind` (`KindCreate=1`..`KindUpdate=5`).
  - One `Statement` struct holds the union (kind tag + every variant's fields). Go has no sum types (enums with payloads), so this is the idiomatic shape.
  - `Tokenize`, `Parse`, `Serialize` exported; an internal `parser` struct mirrors Rust's `Parser`.
- `sql_test.go` mirrors all 11 Rust tests. Two of its tests, `TestFixtureAHash` and `TestFixtureBHash`, inline the exact fixture text and assert both the byte length and the frozen sha256. These two tests are what lock the wire format permanently.
- `cmd/sqlctl/main.go` is the matching CLI.
Go matched Rust byte-for-byte on first run; no debugging needed.
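The numeric tags are the part of the port that has to agree exactly, because Go's integer constants are written by hand rather than derived from a closed enum. For reference, here is a Rust-side sketch of `Op` using the discriminants listed in section 1. The helpers `tag` and `from_symbol` are hypothetical names, and the idea that the serializer writes the discriminant as a single byte is an assumption suggested by `#[repr(u8)]`, not something this document states.

```rust
/// Comparison operators with the explicit discriminants from section 1.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    Eq = 1,
    Ne = 2,
    Lt = 3,
    Le = 4,
    Gt = 5,
    Ge = 6,
}

impl Op {
    /// Hypothetical helper: the numeric tag a port must reproduce.
    pub fn tag(self) -> u8 {
        self as u8
    }

    /// Hypothetical helper: map operator text to its variant.
    pub fn from_symbol(s: &str) -> Option<Op> {
        match s {
            "=" => Some(Op::Eq),
            "!=" => Some(Op::Ne),
            "<" => Some(Op::Lt),
            "<=" => Some(Op::Le),
            ">" => Some(Op::Gt),
            ">=" => Some(Op::Ge),
            _ => None,
        }
    }
}
```

Pinning the values explicitly, instead of relying on declaration order, means reordering the variants cannot silently change the bytes on the wire.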
4. C++ (src/cpp)
- `CMakeLists.txt`: self-contained. Targets `sqlfront12_lib`, `sqlctl`, `test_sqlfront12`. No `add_subdirectory` because db-12 has no upstream dependencies; a comment in the file explains why not, so future-me doesn't try to "wire it up like db-09".
- `src/sqlfront12.h` declares `namespace sqlfront12`: `ParseError : std::runtime_error`, `TokKind`, `Token`, the AST types, and entry points `tokenize`, `parse`, `serialize`, `sha256_hex`.
- `src/sqlfront12.cc` (~500 lines) implements them. Anonymous-namespace `Parser` class; `std::vector<std::uint8_t>` buffers for the serializer; inline SHA-256 with a hex lookup table.
- `src/sqlctl.cc`: the C++ CLI mirror. Writes bytes to stdout via `std::cout.write(...)`, sha256 hex to stderr, catches `ParseError` and anything else, prints the message, returns 1.
- `tests/test_sqlfront12.cc`: 11 tests, mirroring Rust + Go. The file opens with `#undef NDEBUG` followed by `#include <cassert>`, because Release builds otherwise compile `assert` to a no-op. Two of the tests inline the fixture content (including the `—` en-dashes, as UTF-8 in a C++ raw string literal) and assert the frozen hashes.
C++ matched Rust and Go on first build; ctest passed in ~0.2s.
5. Scripts (scripts/)
- `verify.sh`: `cargo test` + `go test` + `cmake`/`ctest`. Prints `=== OK ===` and exits 0.
- `cross_test.sh`: builds the three `sqlctl` binaries, runs each against both fixtures, and asserts:
  - all three stderr-emitted sha256s match each other and the frozen value, for each fixture;
  - the CLI-emitted sha256 equals `shasum -a 256` of the stdout bytes (catches "CLI lies about its own hash" bugs);
  - the byte streams are bit-identical (`cmp -s`);
  - an inline-arg smoke test (`sqlctl parse --inline 'SELECT * FROM t;'`) matches across the three languages;
  - an error-path smoke test (`SELECT FROM t;`) returns non-zero in all three and the error string mentions the column.

  Prints `=== ALL OK ===` on success.
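The error-path check only works if every token carries its source position, so the parser can name the column when it hits something unexpected. The sketch below shows that idea for `SELECT FROM t;` and nothing more. `words` is a stand-in tokenizer that splits on spaces, and `Token`, `Parser::parse_select_head`, and the error message wording are all hypothetical. The only detail taken from this document is that the reported position is `line 1 col 8`.

```rust
/// Token with its 1-based source position (sketch only).
#[derive(Debug, Clone)]
pub struct Token {
    pub text: String,
    pub line: usize,
    pub col: usize,
}

/// Stand-in tokenizer: splits a single line on spaces, recording each word's column.
pub fn words(line_text: &str) -> Vec<Token> {
    let mut out = Vec::new();
    let mut col = 1;
    for word in line_text.split(' ') {
        if !word.is_empty() {
            out.push(Token { text: word.to_string(), line: 1, col });
        }
        col += word.chars().count() + 1; // the word plus the space after it
    }
    out
}

fn is_keyword(s: &str) -> bool {
    ["SELECT", "FROM", "WHERE"].iter().any(|k| s.eq_ignore_ascii_case(k))
}

/// Token slice plus cursor, as in the Rust section.
pub struct Parser<'a> {
    toks: &'a [Token],
    pos: usize,
}

impl<'a> Parser<'a> {
    pub fn new(toks: &'a [Token]) -> Self {
        Parser { toks, pos: 0 }
    }

    /// Consume one token if it is the given keyword (case-insensitive).
    fn expect_keyword(&mut self, kw: &str) -> Result<(), String> {
        let toks = self.toks;
        match toks.get(self.pos) {
            Some(t) if t.text.eq_ignore_ascii_case(kw) => { self.pos += 1; Ok(()) }
            Some(t) => Err(format!(
                "line {} col {}: expected {}, found {}", t.line, t.col, kw, t.text)),
            None => Err(format!("unexpected end of input, expected {}", kw)),
        }
    }

    /// Consume a column name; a keyword here is an error at that keyword's position.
    fn expect_column(&mut self) -> Result<String, String> {
        let toks = self.toks;
        match toks.get(self.pos) {
            Some(t) if !is_keyword(&t.text) => { self.pos += 1; Ok(t.text.clone()) }
            Some(t) => Err(format!(
                "line {} col {}: expected column name, found {}", t.line, t.col, t.text)),
            None => Err("unexpected end of input, expected column name".to_string()),
        }
    }

    /// Parses `SELECT <column> FROM`, just far enough to show the error path.
    pub fn parse_select_head(&mut self) -> Result<String, String> {
        self.expect_keyword("SELECT")?;
        let col = self.expect_column()?;
        self.expect_keyword("FROM")?;
        Ok(col)
    }
}
```

With this sketch, `Parser::new(&words("SELECT FROM t;")).parse_select_head()` returns an error whose text contains `line 1 col 8`, the column where `FROM` starts.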
6. Bash 3.2 portability
macOS ships bash 3.2, which lacks `declare -A` (associative arrays).
The first cut of `cross_test.sh` used `declare -A WANT; WANT[a.sql]=...; want="${WANT[$fix]}"`, which ran fine under brew's bash 5.x and broke
under `/bin/bash`. The fix is a plain function:
```bash
want_hash() {
  case "$1" in
    a_basic.sql) echo "071b40fd..." ;;
    b_full.sql)  echo "e219f1ee..." ;;
    *)           echo ""; return 1 ;;
  esac
}
# ...
want="$(want_hash "$fix")"
```
Both scripts now run cleanly under /bin/bash (verified end-to-end).
What we deliberately didn't build
- A bytecode VM. db-13.
- A query planner. db-13/14.
- Expressions richer than `col OP literal`. Future labs once we have a use for them.
- Schema validation, name resolution, type checking. All planner jobs.
- A pretty-printer / `unparse` function. Useful for round-trip fuzzing, irrelevant to the byte-identity proof.