Observation — SSTable Format

Smallest possible SSTable

Building from an empty MemTable yields zero entries and zero data blocks: the file holds only an empty index block followed by the footer.

file size: 0 + 4 (index count=0) + 32 (footer) = 36 bytes

Hex (annotated):

offset
0000:  00 00 00 00               # index block: count=0
0004:  00 00 00 00 00 00 00 00   # footer.index_offset = 0
000c:  04 00 00 00 00 00 00 00   # footer.index_size   = 4
0014:  00 00 00 00 00 00 00 00   # footer.num_blocks   = 0
001c:  53 53 54 31 00 00 00 00   # footer.magic        = "SST1\0\0\0\0"
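
As a cross-check, the dump can be decoded mechanically. The sketch below assumes what the bytes suggest: little-endian integers, a u32 index count, and a footer of three u64 fields followed by the 8-byte magic. The names `dump` and `FOOTER` are illustrative and not taken from any implementation.

```python
import struct

# Bytes transcribed from the annotated dump of the empty SSTable.
dump = bytes.fromhex(
    "00 00 00 00 "               # index block: count
    "00 00 00 00 00 00 00 00 "   # footer.index_offset
    "04 00 00 00 00 00 00 00 "   # footer.index_size
    "00 00 00 00 00 00 00 00 "   # footer.num_blocks
    "53 53 54 31 00 00 00 00"    # footer.magic
)

# Assumed footer layout: three little-endian u64 fields plus an 8-byte magic.
FOOTER = struct.Struct("<QQQ8s")

(index_count,) = struct.unpack_from("<I", dump, 0)
index_offset, index_size, num_blocks, magic = FOOTER.unpack_from(
    dump, len(dump) - FOOTER.size
)

# 0 data bytes + 4-byte index block + 32-byte footer.
assert len(dump) == 0 + 4 + FOOTER.size == 36
print(index_count, index_offset, index_size, num_blocks, magic)
```

The footer is located by counting back 32 bytes from the end of the file, so a reader never needs to know the data size in advance.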

File-size formula

For a build with N entries spread across K data blocks where the sum of key sizes is Σk and the sum of value sizes (only for non-tombstone entries) is Σv:

data_bytes  = Σ_blocks ( 4 + Σ_entries_in_block (9 + k + v) )
            = 4·K + N·9 + Σk + Σv
index_bytes = 4 + Σ_blocks ( 4 + 8 + 8 + first_key_len )
            = 4 + K·20 + Σ_block_first_key_lens
file_bytes  = data_bytes + index_bytes + 32

(The 4-byte per-block header is the entry count. The 20-byte per-index-entry overhead is klen u32 + offset u64 + size u64.)
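
The formula translates directly into a small helper, which is handy for predicting file sizes in tests. This is a sketch: the function name `sstable_file_size` and the input shape (a list of blocks, each a list of `(key, value)` pairs with `None` marking a tombstone) are assumptions made for illustration.

```python
def sstable_file_size(blocks):
    """Return (data_bytes, index_bytes, file_bytes) for the given layout.

    blocks: list of data blocks; each block is a non-empty list of
    (key, value) byte-string pairs. value=None marks a tombstone, which
    contributes no value bytes.
    """
    k_blocks = len(blocks)
    n_entries = sum(len(block) for block in blocks)
    sum_k = sum(len(key) for block in blocks for key, _ in block)
    sum_v = sum(len(val) for block in blocks for _, val in block if val is not None)
    first_key_lens = sum(len(block[0][0]) for block in blocks)

    data_bytes = 4 * k_blocks + 9 * n_entries + sum_k + sum_v
    index_bytes = 4 + 20 * k_blocks + first_key_lens
    return data_bytes, index_bytes, data_bytes + index_bytes + 32


# Empty build: no data, a 4-byte index, and the footer.
print(sstable_file_size([]))  # (0, 4, 36)

# One block holding put a 1, put bb 22, del ccc.
print(sstable_file_size([[(b"a", b"1"), (b"bb", b"22"), (b"ccc", None)]]))  # (40, 25, 97)
```

For the three-entry case the terms work out to 4·1 + 9·3 + 6 + 3 = 40 data bytes and 4 + 20·1 + 1 = 25 index bytes, giving 40 + 25 + 32 = 97 bytes in total.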

Hex walkthrough of a 3-entry SSTable

Three entries small enough to fit in a single block under the block target, e.g. put a 1, put bb 22, del ccc:

00000000  03 00 00 00                                       # count=3
00000004  01 00 00 00 01 00 00 00 00 'a' '1'                # entry 1: klen=1 vlen=1 type=0 "a" "1"
0000000f  02 00 00 00 02 00 00 00 00 'b' 'b' '2' '2'        # entry 2: klen=2 vlen=2 type=0 "bb" "22"
0000001c  03 00 00 00 00 00 00 00 01 'c' 'c' 'c'            # entry 3: klen=3 vlen=0 type=1 "ccc"

00000028  01 00 00 00                                       # index count=1
0000002c  01 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 'a'   # klen=1 offset=0 size=0x28 "a"

00000041  28 00 00 00 00 00 00 00                           # footer.index_offset = 0x28
00000049  19 00 00 00 00 00 00 00                           # footer.index_size   = 0x19
00000051  01 00 00 00 00 00 00 00                           # footer.num_blocks   = 1
00000059  53 53 54 31 00 00 00 00                           # magic "SST1\0\0\0\0"
                                                            #   (file size = 0x61 = 97 bytes)

Note that the first key of the single block is "a", so the index entry copies that key.
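
A minimal encoder following the layout in this walkthrough makes it easy to regenerate such a dump and compare it byte for byte. It is a sketch under assumptions: `encode_entry` and `build_sstable` are hypothetical names, the type byte values 0 (put) and 1 (delete) are read off the dump, and the caller is assumed to pass entries already sorted and already grouped into blocks.

```python
import struct

MAGIC = b"SST1\x00\x00\x00\x00"
TYPE_PUT, TYPE_DELETE = 0, 1  # type byte values as they appear in the dump


def encode_entry(key, value):
    """klen u32, vlen u32, type u8, key, value. value=None marks a tombstone."""
    if value is None:
        return struct.pack("<IIB", len(key), 0, TYPE_DELETE) + key
    return struct.pack("<IIB", len(key), len(value), TYPE_PUT) + key + value


def build_sstable(blocks):
    """blocks: list of non-empty lists of (key, value) pairs, already sorted."""
    data = bytearray()
    index = bytearray(struct.pack("<I", len(blocks)))  # index count
    for entries in blocks:
        offset = len(data)
        data += struct.pack("<I", len(entries))        # per-block entry count
        for key, value in entries:
            data += encode_entry(key, value)
        first_key = entries[0][0]
        # Index entry: klen u32, offset u64, size u64, then the block's first key.
        index += struct.pack("<IQQ", len(first_key), offset, len(data) - offset)
        index += first_key
    footer = struct.pack("<QQQ8s", len(data), len(index), len(blocks), MAGIC)
    return bytes(data + index + footer)


sst = build_sstable([[(b"a", b"1"), (b"bb", b"22"), (b"ccc", None)]])
print(len(sst))            # 97
print(sst[:40].hex(" "))   # the single data block
```

The data block comes out at 40 bytes (0x28) and the index block at 25 bytes (0x19), which are the values the footer records as `index_offset` and `index_size`.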

What broken looks like

Symptom | Likely cause
BadMagic at open | file truncated, or footer overwritten by an interrupted writer.
BadBlock reading a block | block size in the index disagrees with the in-file count header, e.g. wrong endianness.
Two languages produce different file sizes for identical input | block-flush rule mismatch (> vs >=).
Unsorted from the writer | caller didn't iterate the MemTable in sorted order before add.
IndexOutOfRange at read | corrupted offset/size in the index; the reader checks them against file_len - 32 so that it fails loudly.
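
The checks behind several of these symptoms can be sketched as a defensive reader. Everything here is illustrative: the exception classes simply reuse the symptom names, and `read_index` and `read_block` are hypothetical helpers, not the API of any real implementation. The layout (little-endian fields, 32-byte footer) is assumed from the dumps earlier in this section.

```python
import struct

MAGIC = b"SST1\x00\x00\x00\x00"
FOOTER_LEN = 32


# Hypothetical exception classes named after the symptoms listed above.
class BadMagic(Exception): pass
class IndexOutOfRange(Exception): pass
class BadBlock(Exception): pass


def read_index(buf):
    """Validate the footer and index; return [(first_key, offset, size), ...]."""
    if len(buf) < FOOTER_LEN:
        raise BadMagic("file is shorter than the 32-byte footer")
    body_len = len(buf) - FOOTER_LEN  # the file_len - 32 bound
    index_offset, index_size, _num_blocks, magic = struct.unpack_from(
        "<QQQ8s", buf, body_len
    )
    if magic != MAGIC:
        raise BadMagic(f"unexpected magic {magic!r}")
    index_end = index_offset + index_size
    if index_size < 4 or index_end > body_len:
        raise IndexOutOfRange("index block does not fit before the footer")
    (count,) = struct.unpack_from("<I", buf, index_offset)
    pos = index_offset + 4
    entries = []
    for _ in range(count):
        if pos + 20 > index_end:
            raise IndexOutOfRange("index entry runs past the index block")
        klen, offset, size = struct.unpack_from("<IQQ", buf, pos)
        pos += 20
        if pos + klen > index_end or offset + size > body_len:
            raise IndexOutOfRange("index entry points outside the file body")
        entries.append((bytes(buf[pos:pos + klen]), offset, size))
        pos += klen
    return entries


def read_block(buf, offset, size):
    """Decode one data block; the count header must agree with the indexed size."""
    end = offset + size
    if size < 4:
        raise BadBlock("block is too small to hold a count header")
    (count,) = struct.unpack_from("<I", buf, offset)
    pos = offset + 4
    out = []
    for _ in range(count):
        if pos + 9 > end:
            raise BadBlock("entry header runs past the block")
        klen, vlen, typ = struct.unpack_from("<IIB", buf, pos)
        pos += 9
        if pos + klen + vlen > end:
            raise BadBlock("entry payload runs past the block")
        key = bytes(buf[pos:pos + klen])
        value = None if typ == 1 else bytes(buf[pos + klen:pos + klen + vlen])
        out.append((key, value))
        pos += klen + vlen
    if pos != end:
        raise BadBlock("count header disagrees with the indexed block size")
    return out
```

Checking every offset and length before slicing means a truncated or byte-swapped file raises a named error at open or first read instead of returning silently wrong entries.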