Step 3 — Group commit benchmark

Goal

Quantify the cost of fsync and the throughput win from group commit.

bench-group PATH N BATCH:

for i in 0..N:
    append(payload)
    if (i+1) % BATCH == 0: sync()
sync()   // final

PATH is a brand-new file each run. N = 50_000 is a good starting point on a modern SSD.

Batch	Throughput	Avg latency / sync	Bytes flushed / sync
1	~1,800 rec/s	~560 µs	~72 B
8	~12,000 rec/s	~670 µs	~576 B
64	~110,000 rec/s	~580 µs	~4.6 KB
512	~260,000 rec/s	~1.0 ms	~37 KB
4096	~310,000 rec/s	~13 ms	~295 KB

Two effects worth noting:

Sync time is roughly constant up to ~4KB: the bottleneck is the per-fsync overhead (syscall + journal commit), not the byte count.
Returns diminish past batch ~256: bandwidth becomes the next limit. Past ~4096 you start hitting tail-latency cliffs.

Per-record throughput is the same as group=64: your sync() isn't doing anything (no-op, wrong fd, or bufio.Writer swallowing the write).
Throughput keeps climbing past group=4096: you may not be calling sync() at all between batches.
macOS numbers look impossibly fast: plain fsync does not flush the device cache. Re-run with F_FULLFSYNC to compare.

On NVMe + ext4:

The shape is identical; absolute numbers depend on the device's flush latency.