Analysis — Write-Ahead Log

Problem statement

Make a stream of writes durable and crash-recoverable without paying for a random in-place fsync per write.

Constraint	Why it matters
Append-only file	Sequential I/O, ~10–100× faster than random on HDD; better SLC/GC behavior on SSD.
Self-describing records	Recovery must work without a side index. Header = length + checksum.
Truncation-tolerant tail	Crash mid-write leaves a partial record. We must detect & ignore it on next open.
Single writer	We do not address multi-writer log multiplexing here (Kafka does).
No structural guarantees from the FS	Don't assume ordering of metadata vs data, or that 4KB writes are atomic.

Header = len(u32 LE) || crc32(u32 LE). Small (8 B), aligned, endian-fixed. We deliberately keep the CRC out of the length so we can stream-checksum the body.
CRC is over the payload only. The header itself is implicitly validated by use — a corrupt len either points beyond EOF (short read) or to data whose CRC won't match.
len == 0 is disallowed, used as the EOF sentinel. Empty payloads are rare in practice and avoiding the ambiguity simplifies the reader (len=0,crc=0 happens naturally in a hole of zeros from a sparse file or pre-allocated extent).
Little-endian on disk. Everyone runs LE now (x86, ARM, RISC-V — even POWER prefers it). No htole32 dance saves ~5 LOC per language.
CRC table generated at startup, not hardcoded. 1 KB, computed in microseconds. Easier to audit, and lets us swap polynomials in tests.
One file, one writer, one fd. No segment rotation in this lab — that lives in db-07 (compaction) and db-09 (LevelDB). Single-file WAL is enough to teach framing.
sync() is a separate method. The caller decides commit boundaries. Production systems may add append_sync(payload) that batches a group commit; we leave that for bench mode.

vs LevelDB's block-grouped framing: LevelDB pads records to 32KB blocks for alignment and easier corruption isolation. Beautiful, but doubles the code volume. We follow this lab's bias of "minimum to teach the concept, plus one cross-language cross-check".
vs JSON / protobuf framing: would require schema management. CRC + raw bytes is the smallest possible recoverable framing.
vs a per-record fsync: we expose a separate sync() so the user can choose between durability per-record (call after every append) and group commit (call periodically).

Failure	Detection
Partial header at EOF	Header read < 8 bytes ⇒ stop iteration.
Header OK, partial payload	Payload read < len ⇒ stop iteration.
Full record, CRC mismatch (bit-flip)	CRC32 over payload ≠ stored CRC ⇒ stop iteration.
Hole of zeros (sparse FS, preallocated)	`len == 0` is the EOF sentinel ⇒ stop iteration cleanly.
Disk fully lying about `fsync`	Out of scope; mention `fio --fdatasync=1` to detect.

Bit-flip in the header itself that produces a plausible (len, crc) pair (probability ≈ 2⁻³²). Production systems mitigate with a record-type byte (LevelDB) or magic bytes (Kafka).
Multi-process writers. Use O_APPEND + ≤PIPE_BUF append for that; see db-09 / db-21.
Disk full mid-write. Treat as torn write at EOF (the trailing record fails CRC and is dropped on recovery); the caller's append() returns an I/O error that they must handle.