Step 2 — pread and Hexdump

Goal

Implement the read side (pread) and a hexdump utility, then prove cross-implementation byte-compatibility: a file written by Rust must read identically from Go and C++.

The Read Side

The read path mirrors the write path from Step 1:

read_page(path: string, page_no: u64) -> [u8; PAGE_SIZE]
  • Open the file read-only.
  • pread(fd, buf, PAGE_SIZE, page_no * PAGE_SIZE).
  • Return the buffer (the caller validates it with the magic header and uses payload_len to find the payload; stripping trailing zeros is unreliable because zeros are valid data).

Rust

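use std::os::unix::fs::FileExt; // Unix-only trait that provides read_exact_at

// Fails with ErrorKind::UnexpectedEof if fewer than buf.len() bytes exist at offset.
file.read_exact_at(&mut buf, offset)?;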
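The read_page pseudocode above could be fleshed out roughly as follows. This is a minimal sketch, not the lab's reference solution: the PAGE_SIZE value of 4096 is an assumption (use whatever Step 1 defined), and returning io::Result is one possible error-handling choice.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt; // Unix-only: read_exact_at wraps pread

/// Assumed page size; replace with the constant from Step 1.
pub const PAGE_SIZE: usize = 4096;

/// Reads page `page_no` of the file at `path` into a fixed-size buffer.
pub fn read_page(path: &str, page_no: u64) -> io::Result<[u8; PAGE_SIZE]> {
    // File::open opens the file read-only.
    let file = File::open(path)?;

    // Page 0 starts at byte 0; guard against overflow on huge page numbers.
    let offset = page_no
        .checked_mul(PAGE_SIZE as u64)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "page offset overflows u64"))?;

    let mut buf = [0u8; PAGE_SIZE];
    // Unlike the Go and C++ snippets, this treats a short read as an error
    // (ErrorKind::UnexpectedEof) instead of returning a partial page.
    file.read_exact_at(&mut buf, offset)?;
    Ok(buf)
}
```

Calling read_page("/tmp/xt.bin", 1) would return the second page of the file, or an UnexpectedEof error if the file holds fewer than two full pages.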

Go

n, err := f.ReadAt(buf, offset)
if err != nil && err != io.EOF { return nil, err }

ReadAt returns a non-nil error whenever n < len(buf); at end of file that error is io.EOF. This is normal for the last page of a file that hasn't been preallocated, and the tests cover this case.

C++

ssize_t n = ::pread(fd, buf.data(), buf.size(), offset);  // declared in <unistd.h>
if (n < 0) return std::errc::io_error;                     // errno holds the specific cause
buf.resize(static_cast<std::size_t>(n));                   // shrink if short read

Page Header Format

Every non-empty page in our format begins with a 16-byte header:

offset  size  field
------  ----  -----
   0     8    magic = 0x44534531_50414745  (LE u64; on disk: 45 47 41 50 31 45 53 44, which reads as ASCII "EGAP1ESD", i.e. "DSE1PAGE" reversed)
   8     2    version = 1  (LE u16)
  10     2    flags = 0  (LE u16)
  12     4    payload_len  (LE u32, number of bytes used after the header)
  16     n    payload bytes
n+16     —    zero-pad to PAGE_SIZE

This is a deliberately simple format — we'll grow it in later labs. For now it gives us:

  • A magic number to detect "is this a valid page?"
  • A version field to evolve the format later.
  • An explicit payload length so we don't have to scan for zeros (zeros are valid bytes in real data).
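To make the layout concrete, here is a hedged sketch of encoding and decoding a page with this header. The names encode_page and decode_page, the HEADER_LEN constant, and the PAGE_SIZE value of 4096 are illustrative assumptions rather than part of the lab's API; only the field offsets, widths, and values come from the table above.

```rust
use std::convert::TryInto; // in the prelude from the 2021 edition; explicit for older ones

pub const PAGE_SIZE: usize = 4096; // assumed; use the value from Step 1
pub const HEADER_LEN: usize = 16;
pub const MAGIC: u64 = 0x4453_4531_5041_4745;
pub const VERSION: u16 = 1;

/// Builds a zero-padded page; returns None if the payload does not fit.
pub fn encode_page(payload: &[u8]) -> Option<[u8; PAGE_SIZE]> {
    if payload.len() > PAGE_SIZE - HEADER_LEN {
        return None;
    }
    let mut page = [0u8; PAGE_SIZE]; // starts zeroed, so padding is free
    page[0..8].copy_from_slice(&MAGIC.to_le_bytes());
    page[8..10].copy_from_slice(&VERSION.to_le_bytes());
    page[10..12].copy_from_slice(&0u16.to_le_bytes()); // flags
    page[12..16].copy_from_slice(&(payload.len() as u32).to_le_bytes());
    page[HEADER_LEN..HEADER_LEN + payload.len()].copy_from_slice(payload);
    Some(page)
}

/// Returns the payload slice, or None if the header is not valid.
pub fn decode_page(page: &[u8; PAGE_SIZE]) -> Option<&[u8]> {
    let magic = u64::from_le_bytes(page[0..8].try_into().ok()?);
    let version = u16::from_le_bytes(page[8..10].try_into().ok()?);
    let len = u32::from_le_bytes(page[12..16].try_into().ok()?) as usize;
    if magic != MAGIC || version != VERSION || len > PAGE_SIZE - HEADER_LEN {
        return None;
    }
    // payload_len bounds the payload, so embedded zero bytes survive intact.
    Some(&page[HEADER_LEN..HEADER_LEN + len])
}
```

With this sketch, encode_page(b"first page") produces a page whose first 8 bytes are "EGAP1ESD" on disk, and an all-zero page decodes to None because its magic is wrong.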

The Hexdump Utility

A canonical 16-byte-per-line hex dump:

00000000: 4547 4150 3145 5344 0100 0000 0a00 0000  EGAP1ESD........
00000010: 6669 7273 7420 7061 6765 0000 0000 0000  first page......
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
...

Format spec:

  • 8-digit hex offset, then ": " (colon, space).
  • 16 bytes per line, grouped 2 bytes per word, separated by single space.
  • 2-space gap before the ASCII rendering.
  • ASCII rendering: printable ASCII as itself, otherwise "." (a single dot).

This format matches the default output of xxd, which makes diff-based cross-language verification easy.
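The format spec can be sketched as a single function that renders a byte slice to a String. The function name hexdump, the choice of 0x20 through 0x7e as "printable", and the space padding for a short final line are assumptions of this sketch. Files made of whole pages never produce a short line.

```rust
/// Renders `data` as 16 bytes per line: offset, hex words, ASCII column.
pub fn hexdump(data: &[u8]) -> String {
    let mut out = String::new();
    for (i, chunk) in data.chunks(16).enumerate() {
        // 8-digit lowercase hex offset followed by a colon.
        out.push_str(&format!("{:08x}:", i * 16));

        // Hex column: a space before every 2-byte word.
        for col in 0..16 {
            if col % 2 == 0 {
                out.push(' ');
            }
            match chunk.get(col) {
                Some(b) => out.push_str(&format!("{:02x}", b)),
                None => out.push_str("  "), // pad a short final line
            }
        }

        // Two-space gap, then the ASCII rendering.
        out.push_str("  ");
        for &b in chunk {
            let printable = (0x20..=0x7e).contains(&b);
            out.push(if printable { b as char } else { '.' });
        }
        out.push('\n');
    }
    out
}
```

Building the whole dump in memory keeps the sketch easy to test; a real utility would more likely stream each line to stdout so that large files do not need to fit in a String.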

Cross-Implementation Test

This is the most important check in this lab. Run scripts/cross_test.sh:

bash scripts/cross_test.sh

What it does (excerpt):

$RUST write /tmp/xt.bin 0 "from rust"
$GO   write /tmp/xt.bin 1 "from go"
$CPP  write /tmp/xt.bin 2 "from cpp"

$RUST hexdump /tmp/xt.bin > /tmp/h.rust
$GO   hexdump /tmp/xt.bin > /tmp/h.go
$CPP  hexdump /tmp/xt.bin > /tmp/h.cpp

diff /tmp/h.rust /tmp/h.go || { echo "RUST/GO mismatch"; exit 1; }
diff /tmp/h.rust /tmp/h.cpp || { echo "RUST/CPP mismatch"; exit 1; }
echo "cross-language byte-compat OK"

If this fails, the most common bugs are:

  1. Wrong endianness on the magic or payload_len.
  2. Forgetting to zero-pad the page (one impl leaves junk past the payload).
  3. Off-by-one on the offset calculation (page_no * PAGE_SIZE vs (page_no + 1) * PAGE_SIZE).
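When the diff fails, a small checker can help narrow down which of these bugs is present. The following is a hypothetical debugging aid, not part of scripts/cross_test.sh: the names page_offset and diagnose_page, the error strings, and the PAGE_SIZE value of 4096 are all assumptions of this sketch.

```rust
pub const PAGE_SIZE: usize = 4096; // assumed; use the value from Step 1
const HEADER_LEN: usize = 16;

/// Byte offset of page `page_no`. Page 0 starts at byte 0, not PAGE_SIZE.
pub fn page_offset(page_no: u64) -> u64 {
    page_no * PAGE_SIZE as u64
}

/// Checks one page for the endianness and zero-padding bugs listed above.
pub fn diagnose_page(page: &[u8; PAGE_SIZE]) -> Result<(), String> {
    // Bug 1a: magic written big-endian shows up as "DSE1PAGE" on disk.
    if page[0..8] == *b"DSE1PAGE" {
        return Err("magic is byte-swapped (written big-endian?)".to_string());
    }
    if page[0..8] != *b"EGAP1ESD" {
        return Err("bad magic".to_string());
    }

    // Bug 1b: a big-endian payload_len decodes to an absurdly large value.
    let len = u32::from_le_bytes([page[12], page[13], page[14], page[15]]) as usize;
    if len > PAGE_SIZE - HEADER_LEN {
        return Err(format!("payload_len {} out of range (written big-endian?)", len));
    }

    // Bug 2: everything after the payload must be zero.
    if page[HEADER_LEN + len..].iter().any(|&b| b != 0) {
        return Err("non-zero bytes in padding".to_string());
    }
    Ok(())
}
```

Running diagnose_page on each page that differs between two dumps points at the implementation that wrote it, while page_offset documents the offset convention that bug 3 violates.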

What Just Happened

You now have a portable, file-format-compatible storage primitive across three languages. This is the foundation for every later lab — the WAL in db-03 is exactly this with append-only semantics, and the SSTable in db-06 is this with a richer block format.

Next

Step 3: measure latency, demonstrate the page cache, and understand why your second read of a page is 1000× faster than the first.