Step 2 — Append, sync, iterate

Goal

Implement Wal::open / append / sync / iter consistently in all three languages.

API recap

open(path)        -> Wal      // O_RDWR | O_CREAT, scan-and-truncate the tail
append(payload)  -> u64       // returns the record's start offset
sync()           -> ()        // fdatasync (or fsync on platforms without it)
len()            -> u64       // bytes in the live valid region
iter(path)       -> Iterator  // yields each payload until first short/bad record

open — scan & truncate

This is the crucial subroutine. After a crash, the file may end in a partial header or a partial payload. open scans from offset 0, finds the end of the last valid record, and truncates the file to that length, so the next append starts on a clean record boundary.

pos = 0
loop:
    if file_size - pos < 8: break              // not enough for header
    read 8 bytes at pos: (len, crc)
    if len == 0: break                          // EOF sentinel / sparse hole
    if pos + 8 + len > file_size: break         // payload short
    read len bytes at pos+8
    if crc32(payload) != crc: break
    pos += 8 + len
if pos != file_size:
    ftruncate(file, pos)
return Wal { fd, write_offset = pos }

append

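if len == 0 or len > u32::MAX: return error  // len == 0 would read back as the end-of-log sentinel
hdr[0..4] = (len as u32).to_le_bytes()
hdr[4..8] = crc32(payload).to_le_bytes()
pwrite(fd, hdr,     write_offset)            // retry on short writes
pwrite(fd, payload, write_offset + 8)
offset_returned = write_offset
write_offset += 8 + len
return offset_returned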

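append does not fsync. Callers invoke sync() explicitly, which enables group commit: several appends can share a single sync. A record is only guaranteed durable once a sync() issued after its append has returned.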

sync

  • Linux: fdatasync(fd)
  • macOS: fcntl(fd, F_FULLFSYNC, 0) for true device-level sync, falling back to fsync(fd) if F_FULLFSYNC fails (e.g., on a filesystem that does not support it).
  • Windows: FlushFileBuffers(handle) (out of scope here).

In this lab we use fdatasync (Linux) and fsync (macOS) for simplicity; production should consider F_FULLFSYNC on macOS because plain fsync does not guarantee device-level durability on Apple's filesystems.

iter — read-only replay

Mirrors open's scan loop but yields each payload instead of advancing a write cursor. Stops on the same conditions (len == 0, short header, short payload, bad CRC). Never panics on garbage.

Tests to pin behavior

#   Test                                                                                Expected
T1  Append "A", "B", reopen, iter → ["A", "B"]                                          Both records returned in order
T2  Append, truncate WAL by 1 byte (cut payload), reopen, iter                          Last record dropped
T3  Append, flip a payload byte, iter                                                   Reader stops at bad CRC
T4  Append, write \0\0\0\0\0\0\0\0 past EOF, reopen                                     File length restored to pre-garbage size
T5  append() offsets strictly increase; each equals the file size before that append    Yes

Gotchas

  • macOS fsync does not flush the disk write cache. Use F_FULLFSYNC for tests that must outlive a power loss.
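  • Rust's std::fs::File has no userspace buffer: File::write_all hands the bytes straight to the kernel, and File::flush does nothing on Unix. Only BufWriter buffers in userspace, and neither write_all nor flush makes data durable. We use positioned writes on the bare File via std::os::unix::fs::FileExt::write_all_at (or nix's pwrite), with no BufWriter in between, and rely on sync() for durability.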
  • Go's os.File.Write is unbuffered, but bufio.Writer buffers in userspace. Make sure your Wal does not wrap the file in a bufio.Writer: it holds writes in memory until Flush, so a sync() can return before the bytes have even reached the kernel.