Step 2 — pread and Hexdump
Goal
Implement the read side (pread) and a hexdump utility, then prove cross-implementation byte-compatibility: a file written by Rust must read identically from Go and C++.
The Read Side
Symmetric to Step 1:
read_page(path: string, page_no: u64) -> [u8; PAGE_SIZE]
- Open the file read-only.
- pread(fd, buf, PAGE_SIZE, page_no * PAGE_SIZE).
- Return the buffer (caller will strip trailing zeros or use the magic header to validate).
Rust
// Needs `use std::os::unix::fs::FileExt;`. Fails with UnexpectedEof on a short read.
file.read_exact_at(&mut buf, offset)?;
Go
n, err := f.ReadAt(buf, offset)
if err != nil && err != io.EOF { return nil, err }
ReadAt returns a non-nil error whenever n < len(buf); at end of file that error is io.EOF. This is normal for the last page of a file that hasn't been preallocated, and the tests handle this case.
C++
ssize_t n = ::pread(fd, buf.data(), buf.size(), offset);
if (n < 0) return std::errc::io_error;
buf.resize(n); // shrink if short read
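The three snippets above treat a short read differently: read_exact_at reports UnexpectedEof, while the Go and C++ versions accept a partial last page. As a minimal sketch of how the Rust side could tolerate a short last page too, the function below loops over read_at and zero-fills the remainder. The value PAGE_SIZE = 4096 and the (page, bytes_read) return shape are assumptions made for illustration, not something this lab specifies.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // read_at is pread(2) on Unix

/// Assumed page size; substitute the value defined in Step 1.
const PAGE_SIZE: usize = 4096;

/// Reads page `page_no` from `path`.
/// Bytes past end of file stay zero; the second value is how many bytes
/// actually came from the file (hypothetical return shape).
fn read_page(path: &str, page_no: u64) -> io::Result<([u8; PAGE_SIZE], usize)> {
    let file = File::open(path)?; // File::open is read-only
    let offset = page_no
        .checked_mul(PAGE_SIZE as u64)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "page_no overflows offset"))?;

    let mut buf = [0u8; PAGE_SIZE];
    let mut filled = 0;
    while filled < PAGE_SIZE {
        match file.read_at(&mut buf[filled..], offset + filled as u64) {
            Ok(0) => break, // end of file: keep the zero fill
            Ok(n) => filled += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok((buf, filled))
}
```

Calling read_page("/tmp/xt.bin", 1) on a file that ends partway through page 1 would return the partial bytes followed by zeros, which is the same view the Go and C++ callers get after padding their short buffers.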
Page Header Format
Every non-empty page in our format begins with a 16-byte header:
offset  size  field
------  ----  -----
0       8     magic = 0x44534531_50414745 (LE: 45 47 41 50 31 45 53 44; ASCII reversed: "EGAP1ESD")
8       2     version = 1
10      2     flags = 0
12      4     payload_len (LE u32, number of bytes used after the header)
16      n     payload bytes
n+16    -     zero-pad to PAGE_SIZE
This is a deliberately simple format — we'll grow it in later labs. For now it gives us:
- A magic number to detect "is this a valid page?"
- A version field to evolve the format later.
- An explicit payload length so we don't have to scan for zeros (zeros are valid bytes in real data).
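The layout above can be sketched as a pair of encode/decode helpers. This is an illustration under stated assumptions rather than the lab's reference code: PAGE_SIZE = 4096 is assumed, the names encode_page and decode_page are hypothetical, and version and flags are written little-endian to match the hexdump shown later in this step.

```rust
const PAGE_SIZE: usize = 4096; // assumed; use the value from Step 1
const HEADER_LEN: usize = 16;
const MAGIC: u64 = 0x4453_4531_5041_4745; // little-endian on disk: "EGAP1ESD"
const VERSION: u16 = 1;

/// Builds one page: 16-byte header, payload, zero padding.
/// Returns None if the payload does not fit in a single page.
fn encode_page(payload: &[u8]) -> Option<[u8; PAGE_SIZE]> {
    if payload.len() > PAGE_SIZE - HEADER_LEN {
        return None;
    }
    let mut page = [0u8; PAGE_SIZE]; // starts zeroed, so padding is free
    page[0..8].copy_from_slice(&MAGIC.to_le_bytes());
    page[8..10].copy_from_slice(&VERSION.to_le_bytes());
    page[10..12].copy_from_slice(&0u16.to_le_bytes()); // flags = 0
    page[12..16].copy_from_slice(&(payload.len() as u32).to_le_bytes());
    page[HEADER_LEN..HEADER_LEN + payload.len()].copy_from_slice(payload);
    Some(page)
}

/// Returns the payload slice if magic, version, and payload_len are valid.
fn decode_page(page: &[u8; PAGE_SIZE]) -> Option<&[u8]> {
    let mut magic = [0u8; 8];
    magic.copy_from_slice(&page[0..8]);
    let mut version = [0u8; 2];
    version.copy_from_slice(&page[8..10]);
    let mut len = [0u8; 4];
    len.copy_from_slice(&page[12..16]);

    let len = u32::from_le_bytes(len) as usize;
    if u64::from_le_bytes(magic) != MAGIC
        || u16::from_le_bytes(version) != VERSION
        || len > PAGE_SIZE - HEADER_LEN
    {
        return None;
    }
    Some(&page[HEADER_LEN..HEADER_LEN + len])
}
```

Because decode_page trusts payload_len instead of scanning for zeros, a payload that ends in zero bytes round-trips unchanged.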
The Hexdump Utility
A canonical 16-byte-per-line hex dump:
00000000: 4547 4150 3145 5344 0100 0000 0a00 0000  EGAP1ESD........
00000010: 6669 7273 7420 7061 6765 0000 0000 0000  first page......
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
...
Format spec:
- 8-digit hex offset, then a colon (:).
- 16 bytes per line, grouped 2 bytes per word, separated by single space.
- 2-space gap before the ASCII rendering.
- ASCII rendering: printable ASCII as itself, otherwise a dot (.).
This format matches xxd output for easy diff-based cross-language verification.
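A minimal sketch of a formatter that follows the spec above is shown here. The function name hexdump and the choice to return a String are assumptions for illustration. Padding a short final line to the full hex-column width is intended to mirror xxd, although pages are always a multiple of 16 bytes, so that path should not be exercised in this lab.

```rust
/// Formats `data` as xxd-style lines:
/// 8-digit hex offset, colon, 2-byte groups, two spaces, ASCII column.
fn hexdump(data: &[u8]) -> String {
    let mut out = String::new();
    for (line_no, chunk) in data.chunks(16).enumerate() {
        let mut hex = String::new();
        let mut ascii = String::new();
        for (i, &b) in chunk.iter().enumerate() {
            if i % 2 == 0 {
                hex.push(' '); // one space before every 2-byte group
            }
            hex.push_str(&format!("{:02x}", b));
            // Printable ASCII (space through '~') is shown as-is, the rest as '.'
            ascii.push(if (0x20..=0x7e).contains(&b) { b as char } else { '.' });
        }
        // A full line has 8 groups of " xxxx" = 40 columns; pad short lines to that width.
        out.push_str(&format!("{:08x}:{:<40}  {}\n", line_no * 16, hex, ascii));
    }
    out
}
```

Printing hexdump(&page) for each implementation and diffing the results gives the same check as comparing against xxd output directly.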
Cross-Implementation Test
This is the most important check in this lab. Run scripts/cross_test.sh:
bash scripts/cross_test.sh
What it does (excerpt):
$RUST write /tmp/xt.bin 0 "from rust"
$GO write /tmp/xt.bin 1 "from go"
$CPP write /tmp/xt.bin 2 "from cpp"
$RUST hexdump /tmp/xt.bin > /tmp/h.rust
$GO hexdump /tmp/xt.bin > /tmp/h.go
$CPP hexdump /tmp/xt.bin > /tmp/h.cpp
diff /tmp/h.rust /tmp/h.go || { echo "RUST/GO mismatch"; exit 1; }
diff /tmp/h.rust /tmp/h.cpp || { echo "RUST/CPP mismatch"; exit 1; }
echo "cross-language byte-compat OK"
If this fails, the most common bugs are:
- Wrong endianness on the magic or payload_len.
- Forgetting to zero-pad the page (one impl leaves junk past the payload).
- Off-by-one on the offset calculation (page_no * PAGE_SIZE vs (page_no + 1) * PAGE_SIZE).
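When the diff fails, a per-page check can narrow down which of the first two bugs is responsible. The sketch below is a hypothetical debugging aid, not part of scripts/cross_test.sh: the names PageFault and check_page are invented for illustration, PAGE_SIZE = 4096 is assumed, and it expects a non-empty page. It does not detect the offset off-by-one, which shows up instead as a valid page at the wrong position in the hexdump.

```rust
const PAGE_SIZE: usize = 4096; // assumed; use the value from Step 1
const HEADER_LEN: usize = 16;
const MAGIC: u64 = 0x4453_4531_5041_4745;

/// Hypothetical classification of the common cross-language bugs.
#[derive(Debug, PartialEq)]
enum PageFault {
    MagicByteSwapped,    // magic was written big-endian
    BadMagic,            // not a page header at all
    PayloadTooLong(u32), // often a big-endian payload_len
    DirtyPadding(usize), // offset of the first non-zero byte past the payload
}

/// Checks one non-empty page against the header format.
fn check_page(page: &[u8; PAGE_SIZE]) -> Result<(), PageFault> {
    let mut magic = [0u8; 8];
    magic.copy_from_slice(&page[0..8]);
    if u64::from_be_bytes(magic) == MAGIC {
        return Err(PageFault::MagicByteSwapped);
    }
    if u64::from_le_bytes(magic) != MAGIC {
        return Err(PageFault::BadMagic);
    }

    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&page[12..16]);
    let len = u32::from_le_bytes(len_bytes);
    if len as usize > PAGE_SIZE - HEADER_LEN {
        return Err(PageFault::PayloadTooLong(len));
    }

    // Everything after the payload must be zero.
    let start = HEADER_LEN + len as usize;
    if let Some(i) = page[start..].iter().position(|&b| b != 0) {
        return Err(PageFault::DirtyPadding(start + i));
    }
    Ok(())
}
```

A payload_len of 10 written big-endian decodes as 0x0a000000, so it is reported as PayloadTooLong rather than silently accepted.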
What Just Happened
You now have a portable, file-format-compatible storage primitive across three languages. This is the foundation for every later lab — the WAL in db-03 is exactly this with append-only semantics, and the SSTable in db-06 is this with a richer block format.
Next
Step 3: measure latency, demonstrate the page cache, and understand why your second read of a page can be orders of magnitude (on the order of 1000×) faster than the first.