db-12 — Execution

What we built, in the order we built it.

1. Rust (src/rust) — the reference

  • Cargo.toml declares crate sqlfront12 (lib) and a binary sqlctl. No external dependencies, no path dependencies — the lab is self-contained.
  • src/lib.rs (~1100 lines) defines:
    • ParseError (one error type for both tokenize and parse phases).
    • TokKind + tokenize(src) -> Result<Vec<Token>, ParseError>. The tokenizer is a single character-by-character loop with branches for whitespace, -- line comment, identifier/keyword, integer literal, '...' string literal (with '' escape), comparison operator (=, !=, <, <=, >, >=), and single-char punctuation ((, ), ,, ;).
    • ColType { Int, Text }, Literal { Int(i64), Text(String) }, Op { Eq=1, Ne=2, Lt=3, Le=4, Gt=5, Ge=6 } (#[repr(u8)]), Where, SelectCols { Star, Named(Vec<String>) }, Statement enum with five variants.
    • Parser struct (token slice + cursor) with one method per non-terminal (parse_program, parse_stmt, parse_create, parse_insert, parse_select, parse_delete, parse_update, parse_where, parse_literal).
    • parse(src) -> Result<Vec<Statement>, ParseError> glues tokenize + Parser together.
    • serialize(stmts) -> Vec<u8> walks the AST and emits the canonical bytes described in CONCEPTS.md. Magic header b"DSESQL01" then u32 LE count, then per-statement records.
    • Inline sha256 + sha256_hex (FIPS 180-4) so the lab has no external crate dependencies.
  • 11 inline #[cfg(test)] tests:
    1. tokenize_happy — full coverage of all token kinds on a single mixed input.
    2. tokenize_strings_and_errors — '' escape; unterminated string reports correct (line, col).
    3. parse_create_table — CREATE TABLE with INT + TEXT columns.
    4. parse_insert_multirow — multi-row VALUES, both literal types.
    5. parse_select_variants_and_all_ops — SELECT *, SELECT col, col, each of the 6 comparison ops.
    6. parse_update_and_delete — UPDATE multi-SET + WHERE; DELETE + WHERE.
    7. parse_multi_with_comments_and_case — -- line comments, case-insensitive keywords, identifier case preserved.
    8. parse_errors_report_column — missing identifier after SELECT reports line 1 col 8.
    9. serialize_header_and_count — magic bytes + count field correct.
    10. serialize_is_deterministic — two serialize calls on the same AST return equal bytes.
    11. sha256_known_vectors — the FIPS-180-4 SHA-256("") and ("abc") vectors.
  • bin/sqlctl.rs is the CLI used by the cross-language script.
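
The tokenizer branches for string literals and comparison operators can be sketched roughly as below. This is a minimal illustration, not the code in src/lib.rs: Tok, LexError, and lex are hypothetical names, only two token kinds are covered, and reporting an unterminated string at the position of its opening quote is an assumption about what "correct (line, col)" means.

```rust
// Hypothetical, simplified token type; the real TokKind has more variants.
#[derive(Debug, PartialEq)]
enum Tok {
    Str(String),
    Op(&'static str),
}

// Hypothetical error shape: a message plus a 1-based (line, col).
#[derive(Debug, PartialEq)]
struct LexError {
    msg: &'static str,
    line: usize,
    col: usize,
}

fn lex(src: &str) -> Result<Vec<Tok>, LexError> {
    let chars: Vec<char> = src.chars().collect();
    let (mut i, mut line, mut col) = (0usize, 1usize, 1usize);
    let mut out = Vec::new();
    while i < chars.len() {
        let c = chars[i];
        match c {
            '\n' => { i += 1; line += 1; col = 1; }
            c if c.is_whitespace() => { i += 1; col += 1; }
            '\'' => {
                // Assumption: errors point at the opening quote.
                let (start_line, start_col) = (line, col);
                i += 1; col += 1;
                let mut s = String::new();
                loop {
                    match chars.get(i).copied() {
                        None => return Err(LexError {
                            msg: "unterminated string",
                            line: start_line,
                            col: start_col,
                        }),
                        // '' inside a literal is one escaped quote.
                        Some('\'') if chars.get(i + 1) == Some(&'\'') => {
                            s.push('\''); i += 2; col += 2;
                        }
                        // A lone quote closes the literal.
                        Some('\'') => { i += 1; col += 1; break; }
                        Some('\n') => { s.push('\n'); i += 1; line += 1; col = 1; }
                        Some(ch) => { s.push(ch); i += 1; col += 1; }
                    }
                }
                out.push(Tok::Str(s));
            }
            '=' => { out.push(Tok::Op("=")); i += 1; col += 1; }
            '!' | '<' | '>' => {
                // One character of lookahead decides between < and <=, etc.
                let two = chars.get(i + 1) == Some(&'=');
                let op = match (c, two) {
                    ('!', true) => "!=",
                    ('<', true) => "<=",
                    ('>', true) => ">=",
                    ('<', false) => "<",
                    ('>', false) => ">",
                    // A bare '!' is not an operator.
                    _ => return Err(LexError { msg: "unexpected character", line, col }),
                };
                out.push(Tok::Op(op));
                let n = if two { 2 } else { 1 };
                i += n; col += n;
            }
            _ => return Err(LexError { msg: "unexpected character", line, col }),
        }
    }
    Ok(out)
}
```

In this sketch, lex("'it''s' <= !=") yields a Str token holding it's followed by the <= and != operators, and an input that opens a quote without closing it returns an error carrying the opening quote's position.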

2. Fixtures (scripts/fixtures)

Two SQL files, frozen forever (because the frozen hashes depend on every byte, including the trailing newline and the en-dash in the comment lines):

  • a_basic.sql — minimal smoke test. CREATE TABLE users, three-row INSERT, SELECT *, SELECT id, name WHERE id = 2. 181 bytes serialized.
  • b_full.sql — full-coverage. Every statement kind, both literal types, the '' escape, every comparison operator. 486 bytes serialized.

The hashes were computed once from the Rust reference and then frozen into the Go test, the C++ test, and scripts/cross_test.sh. If you edit either fixture, all three of those locations must update in the same commit.
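
Because the hashes cover every serialized byte, the fixed header is part of what is frozen. Here is a minimal sketch of the header layout described in section 1 (magic bytes, then a u32 little-endian statement count). The helper names write_header and read_header are hypothetical, and the per-statement records defined in CONCEPTS.md are left out.

```rust
// Magic bytes as quoted in section 1.
const MAGIC: &[u8; 8] = b"DSESQL01";

// Hypothetical helper: appends only the 12-byte header,
// not the per-statement records that follow it.
fn write_header(out: &mut Vec<u8>, stmt_count: u32) {
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&stmt_count.to_le_bytes());
}

// Hypothetical helper: validates the magic and returns the statement count.
fn read_header(bytes: &[u8]) -> Option<u32> {
    if bytes.len() < 12 || &bytes[..8] != &MAGIC[..] {
        return None;
    }
    let mut n = [0u8; 4];
    n.copy_from_slice(&bytes[8..12]);
    Some(u32::from_le_bytes(n))
}
```

Using to_le_bytes rather than a native-endian write keeps the output identical on any host, which is the property the frozen hashes rely on.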

3. Go (src/go)

  • go.mod module github.com/10xdev/dse/db12. No external deps, no replace directives — the module stands alone.
  • sql.go ports the Rust types one-for-one:
    • TokKind int constants.
    • Token, ColType (ColInt=1, ColText=2), LitKind, Literal, Op (OpEq=1..OpGe=6), Where, SelectColsKind, SelectCols, Column, Assign, StmtKind (KindCreate=1..KindUpdate=5).
    • One Statement struct holds the union (kind tag + every variant's fields). Go has no sum types (its "enums" are plain integer constants), so a tagged struct is a simple, common shape.
    • Tokenize, Parse, Serialize exported; an internal parser struct mirrors Rust's Parser.
  • sql_test.go mirrors all 11 Rust tests. Two of them — TestFixtureAHash and TestFixtureBHash — inline the exact fixture text and assert both the byte length and the frozen sha256. These two tests lock the wire format permanently.
  • cmd/sqlctl/main.go is the matching CLI.

Go matched Rust byte-for-byte on first run; no debugging needed.
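
One thing that helps the ports agree is that the numeric tags are pinned in both languages (Op Eq=1..Ge=6 in Rust, OpEq=1..OpGe=6 in Go). The sketch below shows how an explicit-discriminant enum converts to and from a byte in Rust. It assumes the tag is written as that single byte, which this section does not state; the authoritative record layout is in CONCEPTS.md, and op_tag and op_from_tag are hypothetical helpers.

```rust
// Discriminants as listed in section 1.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Eq = 1,
    Ne = 2,
    Lt = 3,
    Le = 4,
    Gt = 5,
    Ge = 6,
}

// Hypothetical helper: the cast is exact because of #[repr(u8)].
fn op_tag(op: Op) -> u8 {
    op as u8
}

// Hypothetical helper: the reverse mapping must reject unknown tags.
fn op_from_tag(tag: u8) -> Option<Op> {
    match tag {
        1 => Some(Op::Eq),
        2 => Some(Op::Ne),
        3 => Some(Op::Lt),
        4 => Some(Op::Le),
        5 => Some(Op::Gt),
        6 => Some(Op::Ge),
        _ => None,
    }
}
```

Starting the tags at 1 rather than 0 means a zeroed byte never decodes as a valid operator, which is one plausible reason for the numbering.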

4. C++ (src/cpp)

  • CMakeLists.txt — self-contained. Targets sqlfront12_lib, sqlctl, test_sqlfront12. No add_subdirectory because db-12 has no upstream dependencies; a comment in the file explains why not, so future-me doesn't try to "wire it up like db-09".

  • src/sqlfront12.h declares namespace sqlfront12: ParseError : std::runtime_error, TokKind, Token, the AST types, and entry points tokenize, parse, serialize, sha256_hex.

  • src/sqlfront12.cc (~500 lines) implements them. Anonymous-namespace Parser class; std::vector<std::uint8_t> buffers for the serializer; inline SHA-256 with a hex lookup table.

  • src/sqlctl.cc — the C++ CLI mirror. Writes bytes to stdout via std::cout.write(...), sha256 hex to stderr, catches ParseError and anything else, prints message, returns 1.

  • tests/test_sqlfront12.cc — 11 tests, mirroring Rust + Go. The file opens with

    #undef NDEBUG
    #include <cassert>
    

    because Release builds define NDEBUG, which would otherwise turn assert into a no-op. Two of the tests inline the fixture content (including the en-dashes — UTF-8 in a C++ raw string literal) and assert the frozen hashes.

C++ matched Rust and Go on first build; ctest passed in ~0.2s.

5. Scripts (scripts/)

  • verify.sh — cargo test + go test + cmake/ctest. Prints === OK === and exits 0.

  • cross_test.sh — builds the three sqlctl binaries, runs each against both fixtures, asserts:

    • all three stderr-emitted sha256s match each other and the frozen value, for each fixture;
    • the CLI-emitted sha256 equals shasum -a 256 of the stdout bytes (catches "CLI lies about its own hash" bugs);
    • the byte streams are bit-identical (cmp -s);
    • an inline-arg smoke test (sqlctl parse --inline 'SELECT * FROM t;') matches across the three languages;
    • an error-path smoke test (SELECT FROM t;) returns non-zero in all three and the error string mentions the column.

    Prints === ALL OK === on success.

6. Bash 3.2 portability

macOS ships bash 3.2, which lacks declare -A (associative arrays). The first cut of cross_test.sh used declare -A WANT; WANT[a.sql]=...; want="${WANT[$fix]}", which ran fine under brew's bash 5.x and broke under /bin/bash. The fix is a plain function:

want_hash() {
    case "$1" in
        a_basic.sql) echo "071b40fd..." ;;
        b_full.sql)  echo "e219f1ee..." ;;
        *) echo ""; return 1 ;;
    esac
}
...
want="$(want_hash "$fix")"

Both scripts now run cleanly under /bin/bash (verified end-to-end).

What we deliberately didn't build

  • A bytecode VM. db-13.
  • A query planner. db-13/14.
  • Expressions richer than col OP literal. Future labs once we have a use for them.
  • Schema validation, name resolution, type checking. All planner jobs.
  • A pretty-printer / unparse function. Useful for round-trip fuzzing, irrelevant to the byte-identity proof.