Step 3 — Group commit benchmark

Goal

Quantify the cost of fsync and the throughput win from group commit.

Workload

bench-group PATH N BATCH:

for i in 0..N:
    append(payload)
    if (i+1) % BATCH == 0: sync()
sync()   // final

PATH is a brand-new file each run. N = 50_000 is a good starting point on a modern SSD.

Numbers to look for (M2 Pro, APFS, 64-byte payload)

BatchThroughputAvg latency / syncBytes flushed / sync
1~1,800 rec/s~560 µs~72 B
8~12,000 rec/s~670 µs~576 B
64~110,000 rec/s~580 µs~4.6 KB
512~260,000 rec/s~1.0 ms~37 KB
4096~310,000 rec/s~13 ms~295 KB

Two effects worth noting:

  • Sync time is roughly constant up to ~4KB: the bottleneck is the per-fsync overhead (syscall + journal commit), not the byte count.
  • Returns diminish past batch ~256: bandwidth becomes the next limit. Past ~4096 you start hitting tail-latency cliffs.

What "broken" looks like

  • Per-record throughput is the same as group=64: your sync() isn't doing anything (no-op, wrong fd, or bufio.Writer swallowing the write).
  • Throughput keeps climbing past group=4096: you may not be calling sync() at all between batches.
  • macOS numbers look impossibly fast: plain fsync does not flush the device cache. Re-run with F_FULLFSYNC to compare.

Comparing to a Linux box

On NVMe + ext4:

BatchThroughput
1~3,000 rec/s
64~180,000 rec/s
4096~600,000 rec/s

The shape is identical; absolute numbers depend on the device's flush latency.