Analysis — Design Decisions

Every choice here is reversible in code but irreversible in performance: change one, all the others bend with it.

D1. Skip list over balanced BST (red-black, AVL)

Skip listRed-black tree
Insert / lookupO(log n) expectedO(log n) worst case
Implementation LOC~120 (Rust)~400+
Concurrent readsLock-free with seqlock or single-CAS publishRequires hand-over-hand locking
Cache localityPoor (pointer chasing)Poor (pointer chasing)
Worst-case boundw.h.p., not absoluteAbsolute

Choice: skip list. The simplicity (no rotations, no rebalancing, no parent pointers) is the value proposition. Worst-case absolute bound is irrelevant when we control the hash that feeds keys in.

When you'd flip: real-time systems where a probabilistic O(n) blowup is unacceptable. Database MemTables don't qualify.

D2. Hash table: open addressing + Robin Hood, not chaining

Open addressing (linear)Chaining
Memory per entry(1/loadfactor) × sizeof(slot)sizeof(entry) + sizeof(ptr) + malloc overhead
Cache misses per lookup0–1 typically1–3 typically
Pointer stability across resizeNoYes
DeletionBackward-shift or tombstoneFree a list node
Tail latency at high loadSpikes near 1.0Degrades gracefully

Choice: open addressing, linear probing, Robin Hood. The cache-miss saving is decisive at small/medium values, which is the database use case.

When you'd flip: large values (≥1 KiB) where you want pointer stability so external references survive resize.

D3. p = 0.5 for skip list, not p = 0.25

The expected number of comparisons per search is (1/p) · log_{1/p}(n). Minimizing this gives p = 1/e ≈ 0.37. In practice:

pExpected comparisons (n=1M)LevelsMemory per node
0.25~16~101.33 forward ptrs avg
0.5~20~202 forward ptrs avg
1/e~14~131.58 forward ptrs avg

We pick p = 0.5 because (a) bit-shift sampling (count trailing zeros of a random u64) is one instruction, and (b) the code stays trivial. The 30% theoretical improvement from p=1/e is not worth the table-lookup or log math.

D4. Max level = 32

A skip list with p=0.5 and n entries has expected max level log₂ n. At max level 32 we support n = 2^32 ≈ 4 G entries before quality degrades. Going higher costs a forward-pointer slot per node forever (8 bytes per extra level). Going lower caps the structure size.

D5. Hash function: FNV-1a 64-bit

We need a hash that is:

  • High quality enough for non-adversarial keys (passes basic distribution tests)
  • Fast for small keys (~10 cycles per 8 bytes)
  • Identical in all three languages so cross-language tests can compare counts

FNV-1a is 6 lines of code, deterministic, and produces nearly the same probe-distance distribution as xxHash3 for keys ≤ 32 bytes. We use it because we control the input keys in this lab; in production you'd switch to xxHash3 or SipHash.

When you'd flip: keys controlled by adversaries → SipHash with a per-instance random key.

D6. Load factor 0.85, grow ×2

Robin Hood handles load factor up to ~0.9 well; beyond that the variance of probe distance blows up. We resize at 0.85 to leave headroom and double the capacity (and rehash). Cost of resize is amortized O(1) per insert.

D7. Backward-shift deletion, no tombstones

Tombstones simplify code but bloat the table over time — a "delete-then-insert" workload fills the table with markers and forces a resize. Backward-shift deletion costs O(PSL) per delete (typically <5 slots) and keeps the table compact. The implementation walks forward from the deleted slot, moves each entry one slot left until reaching an empty slot or an entry whose PSL is 0.

D8. Seed: deterministic per-instance, random across instances

The skip list RNG must not be deterministically the same across runs. But within one process run we want reproducible behavior for debugging. The CLI accepts an optional seed; tests pass a fixed value.

Cost-model cheat sheet

OperationLatency
L1 cache hit~1 ns
L2 cache hit~4 ns
L3 cache hit~15 ns
Main memory~100 ns
4 KiB page from SSD (cold)~100 µs
4 KiB page from HDD~10 ms

Every random pointer dereference is a potential L3 miss → 100 ns. A skip list with 20 levels traverses 20 such pointers in the worst case = 2 µs. A linear-probing hash table touches 1–2 cache lines = ~10 ns. Hash table is ~100× faster for point lookups in RAM, and the gap grows as the working set leaves L3.

What breaks at scale

SymptomCauseMitigation
Skip-list lookup tail latency 10× worse than medianBad RNG sequence; node height varianceUse a higher-quality PRNG; bound max height
Hash table tail latency spikes near 90% loadRobin Hood variance explodesResize earlier (load factor 0.75)
Skip-list memory 2× of equivalent BSTForward-pointer array overheadUse per-level arena allocators; pack pointer arrays
Hash table grows but never shrinksResize is unidirectional in our implShrink when load < 0.25 (we don't — extension point)
Iterator skips entriesMutation during iterationSnapshot at iterator-creation time (db-05 covers this)