Step 2 — Append, sync, iterate
Goal
Implement Wal::open / append / sync / iter consistently in all three languages.
API recap
open(path) -> Wal // O_RDWR | O_CREAT, scan-and-truncate the tail
append(payload) -> u64 // returns the record's start offset
sync() -> () // fdatasync (or fsync on platforms without it)
len() -> u64 // bytes in the live valid region
iter(path) -> Iterator // yields each payload until first short/bad record
open — scan & truncate
This is the crucial subroutine. After a crash, the file may end in a partial
header, a partial payload, or a record whose CRC does not match. open finds the
end of the last valid record and truncates the file to that length, so
subsequent appends start on a clean record boundary.
pos = 0
loop:
    if file_size - pos < 8: break            // not enough for header
    read 8 bytes at pos: (len, crc)
    if len == 0: break                       // EOF sentinel / sparse hole
    if pos + 8 + len > file_size: break      // payload short
    read len bytes at pos+8
    if crc32(payload) != crc: break
    pos += 8 + len
if pos != file_size:
    ftruncate(file, pos)
return Wal { fd, write_offset = pos }
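The scan-and-truncate loop above can be sketched in Go roughly as follows. This is a minimal sketch under stated assumptions, not the lab's reference implementation: it assumes crc32 means the IEEE polynomial (the text does not say which), and the names Wal, validEnd, and demo are illustrative. The demo writes one valid record followed by three stray bytes to simulate a torn header.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
	"os"
	"path/filepath"
)

const headerSize = 8 // 4-byte LE length + 4-byte LE CRC

// Wal is a hypothetical handle type; the field names are illustrative.
type Wal struct {
	f           *os.File
	writeOffset int64
}

// validEnd returns the offset just past the last valid record.
func validEnd(f *os.File, size int64) (int64, error) {
	var pos int64
	var hdr [headerSize]byte
	for size-pos >= headerSize {
		if _, err := f.ReadAt(hdr[:], pos); err != nil {
			return 0, err
		}
		n := int64(binary.LittleEndian.Uint32(hdr[0:4]))
		sum := binary.LittleEndian.Uint32(hdr[4:8])
		if n == 0 || pos+headerSize+n > size {
			break // zero-length sentinel, or payload runs past end of file
		}
		payload := make([]byte, n)
		if _, err := f.ReadAt(payload, pos+headerSize); err != nil {
			return 0, err
		}
		if crc32.ChecksumIEEE(payload) != sum { // assumption: IEEE CRC-32
			break // corrupt record: everything from pos onward is discarded
		}
		pos += headerSize + n
	}
	return pos, nil
}

// Open scans the file, truncates any invalid tail, and sets the write cursor.
func Open(path string) (*Wal, error) {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0o644)
	if err != nil {
		return nil, err
	}
	st, err := f.Stat()
	if err != nil {
		f.Close()
		return nil, err
	}
	end, err := validEnd(f, st.Size())
	if err == nil && end != st.Size() {
		err = f.Truncate(end)
	}
	if err != nil {
		f.Close()
		return nil, err
	}
	return &Wal{f: f, writeOffset: end}, nil
}

// demo writes one valid record plus a 3-byte torn header, then reopens.
func demo() (before, after int64, err error) {
	dir, err := os.MkdirTemp("", "wal")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "log.wal")
	rec := make([]byte, headerSize)
	binary.LittleEndian.PutUint32(rec[0:4], 5)
	binary.LittleEndian.PutUint32(rec[4:8], crc32.ChecksumIEEE([]byte("hello")))
	rec = append(append(rec, "hello"...), 0xde, 0xad, 0xbe) // simulated torn write
	if err = os.WriteFile(path, rec, 0o644); err != nil {
		return 0, 0, err
	}
	w, err := Open(path)
	if err != nil {
		return 0, 0, err
	}
	defer w.f.Close()
	st, err := w.f.Stat()
	if err != nil {
		return 0, 0, err
	}
	return int64(len(rec)), st.Size(), nil
}

func main() {
	before, after, err := demo()
	fmt.Println(before, after, err)
}
```

In this sketch the file is 16 bytes before Open (8-byte header, 5-byte payload, 3 stray bytes) and 13 bytes afterwards. Read errors abort Open rather than being treated as a short record, which is a design choice of the sketch and not something the text prescribes.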
append
len = payload.len()                 // must fit in u32; must be non-zero, since len == 0 ends the scan
hdr[0..4] = len.to_le_bytes()
hdr[4..8] = crc32(payload).to_le_bytes()
pwrite(fd, hdr, write_offset)
pwrite(fd, payload, write_offset + 8)
offset_returned = write_offset
write_offset += 8 + len
return offset_returned
We do not fsync inside append. Callers invoke sync() explicitly, which lets several appends share a single sync (group commit).
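As a rough illustration of append plus group commit, here is a Go sketch that appends three records and then syncs once. It rests on several assumptions: IEEE CRC-32, a hypothetical Wal struct, and the choice to reject empty payloads (a zero length would be read back as the end-of-log sentinel). Sync here uses Go's os.File.Sync, which is an fsync-style call and not fdatasync.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
	"math"
	"os"
	"path/filepath"
)

const headerSize = 8 // 4-byte LE length + 4-byte LE CRC

// Wal is a hypothetical handle type; the field names are illustrative.
type Wal struct {
	f           *os.File
	writeOffset int64
}

// Append writes one record at the write cursor and returns its start offset.
// It never syncs; durability is the caller's job via Sync.
func (w *Wal) Append(payload []byte) (uint64, error) {
	// Assumption: empty payloads are rejected because len == 0 ends the scan.
	if len(payload) == 0 || uint64(len(payload)) > math.MaxUint32 {
		return 0, errors.New("wal: payload length must be between 1 and 2^32-1")
	}
	var hdr [headerSize]byte
	binary.LittleEndian.PutUint32(hdr[0:4], uint32(len(payload)))
	binary.LittleEndian.PutUint32(hdr[4:8], crc32.ChecksumIEEE(payload))
	if _, err := w.f.WriteAt(hdr[:], w.writeOffset); err != nil {
		return 0, err
	}
	if _, err := w.f.WriteAt(payload, w.writeOffset+headerSize); err != nil {
		return 0, err
	}
	start := w.writeOffset
	w.writeOffset += headerSize + int64(len(payload))
	return uint64(start), nil
}

// Sync makes all previous appends durable with a single call.
func (w *Wal) Sync() error { return w.f.Sync() }

// demo appends a batch, syncs once, and reports the offsets and file size.
func demo() ([]uint64, int64, error) {
	dir, err := os.MkdirTemp("", "wal")
	if err != nil {
		return nil, 0, err
	}
	defer os.RemoveAll(dir)
	f, err := os.OpenFile(filepath.Join(dir, "log.wal"), os.O_RDWR|os.O_CREATE, 0o644)
	if err != nil {
		return nil, 0, err
	}
	defer f.Close()
	w := &Wal{f: f}
	var offsets []uint64
	for _, p := range []string{"a", "bb", "ccc"} {
		off, err := w.Append([]byte(p))
		if err != nil {
			return nil, 0, err
		}
		offsets = append(offsets, off)
	}
	if err := w.Sync(); err != nil { // one sync covers the whole batch
		return nil, 0, err
	}
	st, err := f.Stat()
	if err != nil {
		return nil, 0, err
	}
	return offsets, st.Size(), nil
}

func main() {
	offsets, size, err := demo()
	fmt.Println(offsets, size, err)
}
```

With payloads of 1, 2, and 3 bytes, the returned offsets are 0, 9, and 19, and the file ends up 30 bytes long. Each offset equals the file size before that append.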
sync
- Linux: fdatasync(fd)
- macOS: fcntl(fd, F_FULLFSYNC, 0) for true device-level sync; falls back to fsync(fd) if F_FULLFSYNC fails (e.g., on a filesystem that does not support it).
- Windows: FlushFileBuffers(handle) (out of scope here).
In this lab we use fdatasync (Linux) and fsync (macOS) for simplicity; production should consider F_FULLFSYNC on macOS because plain fsync does not guarantee device-level durability on Apple's filesystems.
iter — read-only replay
Mirrors open's scan loop but yields each payload instead of advancing a write cursor. Stops on the same conditions (len == 0, short header, short payload, bad CRC). Never panics on garbage.
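A read-only replay along these lines might look like the Go sketch below. The Iterator type and its Next/Close methods are hypothetical names, IEEE CRC-32 is assumed, and I/O errors are simply treated as end of log so that Next never panics. The demo writes four records, corrupts the third, and collects what the iterator yields.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
	"os"
	"path/filepath"
)

const headerSize = 8 // 4-byte LE length + 4-byte LE CRC

// Iterator is a hypothetical read-only cursor over a WAL file.
type Iterator struct {
	f    *os.File
	size int64
	pos  int64
}

// Iter opens path read-only; it never truncates or writes.
func Iter(path string) (*Iterator, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	st, err := f.Stat()
	if err != nil {
		f.Close()
		return nil, err
	}
	return &Iterator{f: f, size: st.Size()}, nil
}

func (it *Iterator) Close() error { return it.f.Close() }

// Next returns the next payload, or false at the first short or bad record.
func (it *Iterator) Next() ([]byte, bool) {
	var hdr [headerSize]byte
	if it.size-it.pos < headerSize {
		return nil, false // short header
	}
	if _, err := it.f.ReadAt(hdr[:], it.pos); err != nil {
		return nil, false // sketch: read errors end the replay
	}
	n := int64(binary.LittleEndian.Uint32(hdr[0:4]))
	sum := binary.LittleEndian.Uint32(hdr[4:8])
	if n == 0 || it.pos+headerSize+n > it.size {
		return nil, false // sentinel or short payload
	}
	payload := make([]byte, n)
	if _, err := it.f.ReadAt(payload, it.pos+headerSize); err != nil {
		return nil, false
	}
	if crc32.ChecksumIEEE(payload) != sum {
		return nil, false // bad CRC: stop, do not try to resynchronize
	}
	it.pos += headerSize + n
	return payload, true
}

func encode(p string) []byte {
	rec := make([]byte, headerSize, headerSize+len(p))
	binary.LittleEndian.PutUint32(rec[0:4], uint32(len(p)))
	binary.LittleEndian.PutUint32(rec[4:8], crc32.ChecksumIEEE([]byte(p)))
	return append(rec, p...)
}

// demo writes A, B, C, D, flips C's payload byte, and replays the file.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "wal")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "log.wal")
	var img []byte
	for _, p := range []string{"A", "B", "C", "D"} {
		img = append(img, encode(p)...)
	}
	img[2*9+headerSize] ^= 0xff // each record is 9 bytes; corrupt C's payload
	if err := os.WriteFile(path, img, 0o644); err != nil {
		return nil, err
	}
	it, err := Iter(path)
	if err != nil {
		return nil, err
	}
	defer it.Close()
	var got []string
	for p, ok := it.Next(); ok; p, ok = it.Next() {
		got = append(got, string(p))
	}
	return got, nil
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```

The replay yields only A and B. The intact record D after the corrupt one is not returned, because the iterator stops at the first bad record instead of searching for the next valid header.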
Tests to pin behavior
| # | Test | Expected |
|---|---|---|
| T1 | Append "A", "B", reopen, iter → ["A", "B"] | Both records returned in order |
| T2 | Append, truncate WAL by 1 byte (cut payload), reopen, iter | Last record dropped |
| T3 | Append, flip a payload byte, iter | Reader stops at bad CRC |
| T4 | Append, write \0\0\0\0\0\0\0\0 past EOF, reopen | File length restored to pre-garbage size |
| T5 | Offsets returned by append() are strictly increasing, and each equals the file size before that append | Yes |
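
The expected cut points for T1 through T4 can be checked against an in-memory model of the scan rule before touching real files. The Go sketch below is such a model, not the lab's test suite: validEnd, encode, and runCases are hypothetical helpers, and IEEE CRC-32 is assumed. It applies the same stop conditions to a byte slice holding records "A" and "B".

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/crc32"
)

const headerSize = 8 // 4-byte LE length + 4-byte LE CRC

func encode(payload string) []byte {
	rec := make([]byte, headerSize, headerSize+len(payload))
	binary.LittleEndian.PutUint32(rec[0:4], uint32(len(payload)))
	binary.LittleEndian.PutUint32(rec[4:8], crc32.ChecksumIEEE([]byte(payload)))
	return append(rec, payload...)
}

// validEnd applies the scan rule to an in-memory file image and returns
// the length the file would have after open's truncation.
func validEnd(img []byte) int {
	pos := 0
	for len(img)-pos >= headerSize {
		n := uint64(binary.LittleEndian.Uint32(img[pos : pos+4]))
		sum := binary.LittleEndian.Uint32(img[pos+4 : pos+8])
		if n == 0 || uint64(pos)+headerSize+n > uint64(len(img)) {
			break // sentinel or short payload
		}
		body := img[pos+headerSize : pos+headerSize+int(n)]
		if crc32.ChecksumIEEE(body) != sum {
			break // bad CRC
		}
		pos += headerSize + int(n)
	}
	return pos
}

// runCases returns the valid length for T1, T2, T3, and T4 in that order.
func runCases() []int {
	base := append(encode("A"), encode("B")...) // two 9-byte records
	cut := base[:len(base)-1]                   // T2: last payload byte missing
	flipped := append([]byte(nil), base...)     // T3: last payload byte flipped
	flipped[len(flipped)-1] ^= 0xff
	padded := append(append([]byte(nil), base...), make([]byte, 8)...) // T4
	return []int{validEnd(base), validEnd(cut), validEnd(flipped), validEnd(padded)}
}

func main() {
	fmt.Println(runCases())
}
```

The model reports 18, 9, 9, and 18: both records survive in T1, the last record is dropped in T2 and T3, and the eight zero bytes are cut off in T4.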
Gotchas
- macOS: fsync does not flush the disk write cache. Use F_FULLFSYNC for tests that must outlive a power loss.
- Rust: std::fs::File is unbuffered, so File::write_all hands the bytes straight to the kernel; only BufWriter adds a userspace buffer, and flush on either one does not sync anything to disk. We use positional writes (raw pwrite via nix, or std::os::unix::fs::FileExt::write_all_at) so that every write lands at an explicit offset with no userspace buffering.
- Go: os.File.Write is unbuffered, but bufio.Writer is not. Make sure your Wal does not wrap the file in a bufio.Writer: buffered bytes stay in user space until Flush, so sync cannot make them durable.
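
The Go gotcha is easy to reproduce. The sketch below (the demo function is a hypothetical helper) writes six bytes through a bufio.Writer, calls Sync on the underlying file, and compares the file size before and after Flush.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
)

// demo returns the file size after Write+Sync and again after Flush.
func demo() (int64, int64, error) {
	dir, err := os.MkdirTemp("", "wal")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)
	f, err := os.Create(filepath.Join(dir, "log.wal"))
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()

	bw := bufio.NewWriter(f) // default buffer is far larger than 6 bytes
	if _, err := bw.Write([]byte("record")); err != nil {
		return 0, 0, err
	}
	// Sync only covers bytes the kernel has already received.
	if err := f.Sync(); err != nil {
		return 0, 0, err
	}
	st, err := f.Stat()
	if err != nil {
		return 0, 0, err
	}
	beforeFlush := st.Size()

	// Flush hands the buffered bytes to the kernel.
	if err := bw.Flush(); err != nil {
		return 0, 0, err
	}
	st, err = f.Stat()
	if err != nil {
		return 0, 0, err
	}
	return beforeFlush, st.Size(), nil
}

func main() {
	before, after, err := demo()
	fmt.Println(before, after, err)
}
```

The file is still empty after Write and Sync, and reaches 6 bytes only after Flush. A crash between the two would lose the record even though sync returned successfully.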