Header = len(u32 LE) || crc32(u32 LE). Small (8 B), aligned, endian-fixed. We deliberately keep the CRC out of the length so we can stream-checksum the body.
CRC is over the payload only. The header itself is implicitly validated by use — a corrupt len either points beyond EOF (short read) or to data whose CRC won't match.
len == 0 is disallowed for records and reserved as the EOF sentinel. Empty payloads are rare in practice, and banning them removes an ambiguity for the reader: a header of (len=0, crc=0) occurs naturally in a run of zeros from a sparse-file hole or a pre-allocated extent, so it cannot also mean "valid empty record".
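The framing rules above can be sketched in a few lines of Python. This is a minimal illustration, not the lab's actual code: the names HEADER, encode_record and read_records are hypothetical, and zlib.crc32 (the IEEE 802.3 polynomial) is an assumption, since these notes do not say which CRC-32 variant the lab uses.

```python
import io
import struct
import zlib

# len (u32 LE) followed by crc32 (u32 LE): 8 bytes total.
HEADER = struct.Struct("<II")


def encode_record(payload: bytes) -> bytes:
    """Frame one payload as header || payload. The CRC covers the payload only."""
    if not payload:
        raise ValueError("empty payloads are disallowed: len == 0 marks EOF")
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_records(stream):
    """Yield payloads until EOF, the len == 0 sentinel, or a damaged record."""
    while True:
        header = stream.read(HEADER.size)
        if len(header) < HEADER.size:
            return  # clean EOF or a torn header
        length, crc = HEADER.unpack(header)
        if length == 0:
            return  # sentinel: zero-filled hole or pre-allocated extent
        payload = stream.read(length)
        if len(payload) < length or zlib.crc32(payload) != crc:
            return  # short read or CRC mismatch: stop at the last good record
        yield payload


# Two records followed by 16 zero bytes, as a pre-allocated tail would look.
log = io.BytesIO(encode_record(b"alpha") + encode_record(b"beta") + bytes(16))
print(list(read_records(log)))  # [b'alpha', b'beta']
```

Note that a corrupt len is caught here only indirectly, by the short read or the CRC mismatch on whatever bytes it happens to cover, which matches the "validated by use" argument.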
Little-endian on disk. Everyone runs LE now (x86, ARM, RISC-V; even POWER prefers it). Skipping the htole32 dance saves ~5 LOC per language.
CRC table generated at startup, not hardcoded. 1 KB, computed in microseconds. Easier to audit, and lets us swap polynomials in tests.
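A startup-generated table might look like the sketch below. It assumes the reflected IEEE polynomial 0xEDB88320 as the default; the notes do not name the polynomial, and make_crc32_table, CRC_TABLE and crc32 are hypothetical names. The table holds 256 entries of 32 bits each, which is the 1 KB mentioned above.

```python
def make_crc32_table(poly: int = 0xEDB88320) -> list:
    """Build the 256-entry lookup table for a reflected CRC-32 polynomial."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift right one bit; XOR in the polynomial if the dropped bit was 1.
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = make_crc32_table()  # computed once at startup


def crc32(data: bytes, table: list = CRC_TABLE) -> int:
    """Table-driven CRC-32 with the usual 0xFFFFFFFF initial value and final XOR."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = table[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


# Standard check value for the IEEE polynomial.
assert crc32(b"123456789") == 0xCBF43926
```

Swapping polynomials in a test is then a matter of passing a different table, for example crc32(data, make_crc32_table(0x82F63B78)) for the Castagnoli polynomial.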
One file, one writer, one fd. No segment rotation in this lab — that lives in db-07 (compaction) and db-09 (LevelDB). Single-file WAL is enough to teach framing.
sync() is a separate method. The caller decides commit boundaries. Production systems may add append_sync(payload) that batches a group commit; we leave that for bench mode.
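The split between append() and sync() can be illustrated as follows. This is a sketch under assumptions: the WalWriter class and its method bodies are hypothetical, zlib.crc32 stands in for the lab's CRC, and the file is opened in buffered append mode, which the notes do not specify.

```python
import os
import struct
import tempfile
import zlib


class WalWriter:
    """Hypothetical single-writer WAL: one file, one fd, explicit sync()."""

    def __init__(self, path: str) -> None:
        self._f = open(path, "ab")  # append-only, buffered in user space

    def append(self, payload: bytes) -> None:
        if not payload:
            raise ValueError("len == 0 is reserved as the EOF sentinel")
        header = struct.pack("<II", len(payload), zlib.crc32(payload))
        self._f.write(header + payload)  # may sit in buffers until sync()

    def sync(self) -> None:
        self._f.flush()             # user-space buffer -> kernel
        os.fsync(self._f.fileno())  # kernel page cache -> storage device

    def close(self) -> None:
        self._f.close()


path = os.path.join(tempfile.mkdtemp(), "demo.wal")
wal = WalWriter(path)

# Durability per record: sync after every append.
wal.append(b"txn-1")
wal.sync()

# Group commit: several appends share one sync.
for payload in (b"txn-2", b"txn-3", b"txn-4"):
    wal.append(payload)
wal.sync()
wal.close()
```

Because the caller owns the commit boundary, the same writer serves both policies without a mode flag.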
vs LevelDB's block-grouped framing: LevelDB divides the log into fixed 32 KiB blocks and splits records into fragments that never cross a block boundary, which aligns reads and makes corruption easier to isolate. Beautiful, but it doubles the code volume. We follow this lab's bias of "minimum to teach the concept, plus one cross-language cross-check".
vs JSON / protobuf framing: would require schema management. CRC + raw bytes is the smallest possible recoverable framing.
vs a per-record fsync: we expose a separate sync() so the user can choose between durability per-record (call after every append) and group commit (call periodically).
Bit-flip in the header itself that produces a plausible (len, crc) pair (probability ≈ 2⁻³²). Production systems mitigate with a record-type byte (LevelDB) or magic bytes (Kafka).
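Multi-process writers. Use O_APPEND with one small write() per record for that (kept at or below PIPE_BUF as a rule of thumb; POSIX formally guarantees PIPE_BUF atomicity only for pipes and FIFOs, so on regular files this relies on filesystem behavior); see db-09 / db-21.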
Disk full mid-write. Treat as torn write at EOF (the trailing record fails CRC and is dropped on recovery); the caller's append() returns an I/O error that they must handle.
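The torn-write case can be reproduced in memory. The sketch below computes how many leading bytes of a log form complete, CRC-valid records, so everything past that offset is the torn tail. The helper names frame and valid_prefix_len are hypothetical, zlib.crc32 is an assumed stand-in for the lab's CRC, and whether recovery truncates the file at that offset or merely ignores the tail is not stated in these notes.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # len (u32 LE), crc32 (u32 LE)


def frame(payload: bytes) -> bytes:
    """Frame one payload as header || payload."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def valid_prefix_len(data: bytes) -> int:
    """Return the byte offset just past the last complete, CRC-valid record."""
    offset = 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        end = offset + HEADER.size + length
        if length == 0 or end > len(data):
            break  # EOF sentinel, or the record runs past the end (torn write)
        if zlib.crc32(data[offset + HEADER.size:end]) != crc:
            break  # payload bytes present but damaged
        offset = end
    return offset


good = frame(b"committed-1") + frame(b"committed-2")
torn = frame(b"half-written record")[:-5]  # the write stopped 5 bytes short

print(valid_prefix_len(good + torn) == len(good))  # True
```

A recovery routine built on this would keep the first valid_prefix_len(data) bytes and discard the rest, which is the "dropped on recovery" behavior described above.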