Step 1 — Record framing & CRC

Goal

Define the on-disk format and a streaming CRC32 implementation that produces identical checksums in Rust, Go, and C++.

Format recap

 ┌─────────┬─────────┬──────────────────────┐
 │ len(u32)│ crc(u32)│ payload (len bytes)  │
 └─────────┴─────────┴──────────────────────┘
       4         4              N
  • Both u32 fields are little-endian.
  • CRC is over the payload only.
  • len == 0 is the EOF sentinel (an empty payload cannot be appended).
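The layout above can be sketched in Rust roughly as follows. This is a minimal sketch, not the project's actual code: `encode_record`, `decode_record`, and `Decoded` are hypothetical names, the CRC is a slow bitwise version included only so the block stands alone, and the decoder assumes the EOF sentinel is recognised from the len field alone (the recap does not say whether a crc field follows a zero len).

```rust
/// Bitwise CRC-32 (IEEE, reflected). Slow, but needs no table.
fn crc32_bitwise(bytes: &[u8]) -> u32 {
    let mut c: u32 = 0xFFFF_FFFF;
    for &b in bytes {
        c ^= b as u32;
        for _ in 0..8 {
            c = if c & 1 != 0 { (c >> 1) ^ 0xEDB8_8320 } else { c >> 1 };
        }
    }
    c ^ 0xFFFF_FFFF
}

/// Frame one record as len (u32 LE), crc (u32 LE), payload.
/// Returns None for an empty payload (len == 0 is reserved for EOF)
/// or for a payload too long to describe with a u32.
pub fn encode_record(payload: &[u8]) -> Option<Vec<u8>> {
    if payload.is_empty() || payload.len() > u32::MAX as usize {
        return None;
    }
    let len = payload.len() as u32;
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&len.to_le_bytes());
    out.extend_from_slice(&crc32_bitwise(payload).to_le_bytes());
    out.extend_from_slice(payload);
    Some(out)
}

/// Result of trying to read one record from the front of a buffer.
#[derive(Debug, PartialEq)]
pub enum Decoded<'a> {
    /// A valid record; `consumed` is the total frame size (8 + len).
    Record { payload: &'a [u8], consumed: usize },
    /// len == 0: the EOF sentinel.
    Eof,
    /// Not enough bytes for the header or the payload.
    Truncated,
    /// The stored CRC does not match the payload.
    BadCrc,
}

pub fn decode_record<'a>(buf: &'a [u8]) -> Decoded<'a> {
    if buf.len() < 4 {
        return Decoded::Truncated;
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if len == 0 {
        return Decoded::Eof; // assumption: sentinel is detected from len alone
    }
    if buf.len() < 8 {
        return Decoded::Truncated;
    }
    let crc = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);
    let end = match 8usize.checked_add(len) {
        Some(e) if e <= buf.len() => e,
        _ => return Decoded::Truncated,
    };
    let payload = &buf[8..end];
    if crc32_bitwise(payload) != crc {
        return Decoded::BadCrc;
    }
    Decoded::Record { payload, consumed: end }
}
```

Under these assumptions, `encode_record(b"a")` produces the nine bytes `01 00 00 00 43 BE B7 E8 61`: the length 1, then the CRC 0xE8B7BE43 in little-endian order, then the payload.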

CRC32 — table-driven, reflected

poly = 0xEDB88320          // reflected IEEE 802.3 polynomial (0x04C11DB7 bit-reversed)
table[256]: built once at startup
crc = 0xFFFFFFFF           // initial value
for each input byte b:
    crc = (crc >> 8) ^ table[(crc & 0xff) ^ b]
return crc ^ 0xFFFFFFFF    // final XOR
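The pseudocode leaves out how the table is filled and how a caller would feed data in pieces. One possible Rust sketch is below. It builds the table in a `const fn` at compile time rather than at startup (a substitution for brevity, with identical table contents), and `make_table`, `Crc32`, `update`, and `finalize` are hypothetical names chosen for illustration.

```rust
/// Build the 256-entry lookup table for the reflected polynomial.
/// Entry i is the result of pushing the byte i through 8 bit-steps.
const fn make_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    let mut i: usize = 0;
    while i < 256 {
        let mut c = i as u32;
        let mut k = 0;
        while k < 8 {
            c = if c & 1 != 0 { (c >> 1) ^ 0xEDB8_8320 } else { c >> 1 };
            k += 1;
        }
        table[i] = c;
        i += 1;
    }
    table
}

pub const TABLE: [u32; 256] = make_table();

/// Streaming CRC-32 state. Holds the register before the final XOR,
/// so data can be fed in any number of chunks.
pub struct Crc32 {
    state: u32,
}

impl Crc32 {
    /// Start with the initial value 0xFFFFFFFF.
    pub fn new() -> Self {
        Crc32 { state: 0xFFFF_FFFF }
    }

    /// Fold more bytes into the running checksum.
    pub fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            let idx = ((self.state & 0xff) ^ b as u32) as usize;
            self.state = (self.state >> 8) ^ TABLE[idx];
        }
    }

    /// Apply the final XOR and return the checksum.
    /// The state is left untouched, so more data may still be added.
    pub fn finalize(&self) -> u32 {
        self.state ^ 0xFFFF_FFFF
    }
}
```

Because the final XOR is applied only in `finalize`, feeding "1234" and then "56789" gives the same result as feeding "123456789" in one call.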

Known-answer vectors

input          CRC32 hex
""             0x00000000
"a"            0xE8B7BE43
"123456789"    0xCBF43926

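Pin these in every language's unit tests. They are the canonical CRC-32 (IEEE) vectors, matching zlib, gzip, and the Ethernet frame check sequence. The LevelDB log is not a valid cross-check: it uses a masked CRC-32C (Castagnoli), which gives different values.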

Rust outline

pub fn crc32_ieee(bytes: &[u8]) -> u32 {
    let mut c: u32 = 0xFFFF_FFFF;
    for &b in bytes {
        c = (c >> 8) ^ TABLE[((c & 0xff) ^ b as u32) as usize];
    }
    c ^ 0xFFFF_FFFF
}

Go outline

func Crc32IEEE(b []byte) uint32 {
    c := uint32(0xFFFFFFFF)
    for _, x := range b {
        c = (c >> 8) ^ table[byte(c)^x]
    }
    return c ^ 0xFFFFFFFF
}

C++ outline

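#include <cstdint>
#include <span>  // std::span requires C++20

// kTable is the 256-entry lookup table built from poly 0xEDB88320.
inline std::uint32_t Crc32Ieee(std::span<const std::uint8_t> b) noexcept {
    std::uint32_t c = 0xFFFFFFFFu;
    for (auto x : b) c = (c >> 8) ^ kTable[(c & 0xff) ^ x];
    return c ^ 0xFFFFFFFFu;
}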

Trap: which CRC?

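Several CRC-32 variants are in common use. The one we want is IEEE: reflected input and output, init 0xFFFFFFFF, final XOR 0xFFFFFFFF. The constant 0xEDB88320 is 0x04C11DB7 with its bits reversed, so both name the same polynomial. Running the unreflected (MSB-first) algorithm with 0x04C11DB7 still produces different checksums: for "123456789" it yields 0xFC891918 (the CRC-32/BZIP2 value) instead of 0xCBF43926.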

If your test gives 0xE3069283 for "123456789" (or 0xC1D04330 for "a"), you're using CRC-32C (Castagnoli). Different polynomial, different table.
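To see which variant a mismatching implementation has drifted to, it can help to compute the "123456789" check value under each candidate. The sketch below uses slow bitwise loops with no table, so it is independent of any table-building bug. The function and constant names are hypothetical, and all three variants here share init and final XOR of 0xFFFFFFFF.

```rust
/// LSB-first (reflected) bitwise CRC-32.
/// `poly` must be the bit-reversed form of the polynomial.
pub fn crc32_reflected(poly: u32, bytes: &[u8]) -> u32 {
    let mut c: u32 = 0xFFFF_FFFF;
    for &b in bytes {
        c ^= b as u32;
        for _ in 0..8 {
            c = if c & 1 != 0 { (c >> 1) ^ poly } else { c >> 1 };
        }
    }
    c ^ 0xFFFF_FFFF
}

/// MSB-first (unreflected) bitwise CRC-32.
/// `poly` is the polynomial in normal bit order.
pub fn crc32_msb_first(poly: u32, bytes: &[u8]) -> u32 {
    let mut c: u32 = 0xFFFF_FFFF;
    for &b in bytes {
        c ^= (b as u32) << 24;
        for _ in 0..8 {
            c = if c & 0x8000_0000 != 0 { (c << 1) ^ poly } else { c << 1 };
        }
    }
    c ^ 0xFFFF_FFFF
}

pub const IEEE_REFLECTED: u32 = 0xEDB8_8320; // the variant this step wants
pub const IEEE_NORMAL: u32 = 0x04C1_1DB7; // same polynomial, normal bit order
pub const CASTAGNOLI_REFLECTED: u32 = 0x82F6_3B78; // CRC-32C
```

For the input "123456789", `crc32_reflected(IEEE_REFLECTED, ..)` returns 0xCBF43926, `crc32_reflected(CASTAGNOLI_REFLECTED, ..)` returns 0xE3069283, and `crc32_msb_first(IEEE_NORMAL, ..)` returns 0xFC891918. Comparing a failing test's output against these three narrows down the likely cause.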