Step 1 — Record framing & CRC
Goal
Define the on-disk format and a streaming CRC32 implementation that matches between Rust, Go, and C++.
Format recap
┌──────────┬──────────┬──────────────────────┐
│ len(u32) │ crc(u32) │ payload (len bytes)  │
└──────────┴──────────┴──────────────────────┘
     4          4               N
- Both u32 fields are little-endian.
- CRC is over the payload only.
- len == 0 is the EOF sentinel (an empty payload cannot be appended).
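The three rules above can be sketched as a pair of encode/decode helpers. This is a minimal illustration, not the project's actual API: `encode_record`, `decode_record`, and `FrameError` are hypothetical names, the CRC is a slow bitwise version so the block stands alone, and it assumes the EOF sentinel is recognized from the 4-byte length field alone (the format recap does not say whether a CRC field follows it).

```rust
/// Errors a reader can hit while decoding one frame (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum FrameError {
    /// The buffer ends before the header or payload is complete.
    Truncated,
    /// The stored CRC does not match the CRC of the payload.
    BadCrc { expected: u32, actual: u32 },
}

/// Bitwise CRC-32 (IEEE, reflected). Slow, but needs no table.
fn crc32_ieee(bytes: &[u8]) -> u32 {
    let mut c: u32 = 0xFFFF_FFFF;
    for &b in bytes {
        c ^= b as u32;
        for _ in 0..8 {
            c = if c & 1 != 0 { (c >> 1) ^ 0xEDB8_8320 } else { c >> 1 };
        }
    }
    c ^ 0xFFFF_FFFF
}

/// Appends one framed record to `out`.
/// Returns None for payloads the format cannot represent:
/// empty (len == 0 is the EOF sentinel) or longer than u32::MAX bytes.
pub fn encode_record(payload: &[u8], out: &mut Vec<u8>) -> Option<()> {
    if payload.is_empty() || payload.len() > u32::MAX as usize {
        return None;
    }
    out.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    out.extend_from_slice(&crc32_ieee(payload).to_le_bytes());
    out.extend_from_slice(payload);
    Some(())
}

/// Decodes the frame at the start of `buf`.
/// Ok(None) means the EOF sentinel was read (assumption: len field only);
/// Ok(Some((payload, consumed))) is a valid record and its total size.
pub fn decode_record(buf: &[u8]) -> Result<Option<(&[u8], usize)>, FrameError> {
    if buf.len() < 4 {
        return Err(FrameError::Truncated);
    }
    let len = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if len == 0 {
        return Ok(None);
    }
    if buf.len() < 8 || buf.len() - 8 < len {
        return Err(FrameError::Truncated);
    }
    let expected = u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]);
    let payload = &buf[8..8 + len];
    let actual = crc32_ieee(payload); // CRC covers the payload only
    if actual != expected {
        return Err(FrameError::BadCrc { expected, actual });
    }
    Ok(Some((payload, 8 + len)))
}
```

Encoding the one-byte payload "a" with this sketch yields nine bytes: 01 00 00 00, then 43 BE B7 E8 (0xE8B7BE43 in little-endian order), then 61.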
CRC32 — table-driven, reflected
poly = 0xEDB88320            // reflected IEEE 802.3 polynomial
table[256]: built once at startup
crc = 0xFFFFFFFF             // initial value before processing
for each input byte b:
    crc = (crc >> 8) ^ table[(crc & 0xff) ^ b]
return crc ^ 0xFFFFFFFF      // final XOR
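The pseudocode leaves the table construction implicit. One possible way to fill it in, together with a streaming wrapper, is sketched below. The names `make_table` and `Crc32` are illustrative, and the table is built at compile time with a `const fn` rather than at startup, which is a substitution made here for convenience and not something the text prescribes.

```rust
/// Reflected IEEE 802.3 polynomial, as in the pseudocode.
const POLY: u32 = 0xEDB8_8320;

/// Builds the 256-entry lookup table: entry i is the CRC register
/// after shifting the single byte i through eight reflected steps.
const fn make_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    let mut i = 0;
    while i < 256 {
        let mut c = i as u32;
        let mut k = 0;
        while k < 8 {
            c = if c & 1 != 0 { (c >> 1) ^ POLY } else { c >> 1 };
            k += 1;
        }
        table[i] = c;
        i += 1;
    }
    table
}

/// Evaluated at compile time (assumption: swapped in for "built once at startup").
static TABLE: [u32; 256] = make_table();

/// Streaming CRC-32 state (hypothetical type): feed chunks with
/// `update`, read the checksum with `finalize`.
pub struct Crc32 {
    state: u32,
}

impl Crc32 {
    /// Starts with the initial value 0xFFFFFFFF.
    pub fn new() -> Self {
        Crc32 { state: 0xFFFF_FFFF }
    }

    /// Folds `bytes` into the running state, one table lookup per byte.
    pub fn update(&mut self, bytes: &[u8]) {
        for &b in bytes {
            let idx = ((self.state & 0xff) ^ b as u32) as usize;
            self.state = (self.state >> 8) ^ TABLE[idx];
        }
    }

    /// Applies the final XOR without consuming the state,
    /// so more data can still be appended afterwards.
    pub fn finalize(&self) -> u32 {
        self.state ^ 0xFFFF_FFFF
    }
}
```

Because the final XOR is applied only in `finalize`, splitting the input across several `update` calls gives the same result as a single call over the concatenated bytes.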
Known-answer vectors
| input | CRC32 hex |
|---|---|
| "" | 0x00000000 |
| "a" | 0xE8B7BE43 |
| "123456789" | 0xCBF43926 |
Pin these in every language's unit tests. They are the standard CRC-32/IEEE vectors and match zlib, gzip, and Ethernet. Do not cross-check against the LevelDB log: it uses CRC-32C (Castagnoli), which produces different values.
Rust outline
pub fn crc32_ieee(bytes: &[u8]) -> u32 {
    let mut c: u32 = 0xFFFF_FFFF;
    for &b in bytes {
        c = (c >> 8) ^ TABLE[((c & 0xff) ^ b as u32) as usize];
    }
    c ^ 0xFFFF_FFFF
}
Go outline
func Crc32IEEE(b []byte) uint32 {
	c := uint32(0xFFFFFFFF)
	for _, x := range b {
		c = (c >> 8) ^ table[byte(c)^x]
	}
	return c ^ 0xFFFFFFFF
}
C++ outline
inline std::uint32_t Crc32Ieee(std::span<const std::uint8_t> b) noexcept {
    std::uint32_t c = 0xFFFFFFFFu;
    for (auto x : b) c = (c >> 8) ^ kTable[(c & 0xff) ^ x];
    return c ^ 0xFFFFFFFFu;
}
Trap: which CRC?
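There are at least eight CRC-32 variants in common use. We want IEEE: reflected, init 0xFFFFFFFF, final XOR 0xFFFFFFFF. The table constant 0xEDB88320 is the bit-reversed form of 0x04C11DB7. Un-reflected (MSB-first) variants built on 0x04C11DB7, such as CRC-32/BZIP2, produce different values despite sharing the polynomial.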
If your test gives 0xC1D04330 for "a" (or 0xE3069283 for "123456789"), you're using CRC-32C (Castagnoli). Different polynomial, different table.
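To see which variant a mismatching implementation is computing, it can help to run the same input through the likely candidates. The sketch below uses slow bitwise loops so no tables are involved. The function and constant names are illustrative, and 0x82F63B78 is the reflected Castagnoli polynomial, which the text above does not spell out.

```rust
/// Bit-reversed IEEE polynomial (the one this format uses).
pub const IEEE_REFLECTED: u32 = 0xEDB8_8320;
/// The same IEEE polynomial in normal (MSB-first) notation.
pub const IEEE_NORMAL: u32 = 0x04C1_1DB7;
/// Bit-reversed Castagnoli polynomial, used by CRC-32C.
pub const CASTAGNOLI_REFLECTED: u32 = 0x82F6_3B78;

/// Bitwise reflected (LSB-first) CRC-32, init and final XOR 0xFFFFFFFF.
/// `poly` must be given in bit-reversed form.
pub fn crc32_reflected(poly: u32, bytes: &[u8]) -> u32 {
    let mut c: u32 = 0xFFFF_FFFF;
    for &b in bytes {
        c ^= b as u32;
        for _ in 0..8 {
            c = if c & 1 != 0 { (c >> 1) ^ poly } else { c >> 1 };
        }
    }
    c ^ 0xFFFF_FFFF
}

/// Bitwise un-reflected (MSB-first) CRC-32, same init and final XOR.
/// With IEEE_NORMAL this is the CRC-32/BZIP2 variant.
pub fn crc32_unreflected(poly: u32, bytes: &[u8]) -> u32 {
    let mut c: u32 = 0xFFFF_FFFF;
    for &b in bytes {
        c ^= (b as u32) << 24;
        for _ in 0..8 {
            c = if c & 0x8000_0000 != 0 { (c << 1) ^ poly } else { c << 1 };
        }
    }
    c ^ 0xFFFF_FFFF
}
```

For the input "123456789", `crc32_reflected(IEEE_REFLECTED, ..)` returns 0xCBF43926, `crc32_reflected(CASTAGNOLI_REFLECTED, ..)` returns 0xE3069283, and `crc32_unreflected(IEEE_NORMAL, ..)` returns 0xFC891918. Only the first matches the known-answer table.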