Analysis — Design Decisions
Every choice here is reversible in code but irreversible in performance: change one, all the others bend with it.
D1. Skip list over balanced BST (red-black, AVL)
| Skip list | Red-black tree | |
|---|---|---|
| Insert / lookup | O(log n) expected | O(log n) worst case |
| Implementation LOC | ~120 (Rust) | ~400+ |
| Concurrent reads | Lock-free with seqlock or single-CAS publish | Requires hand-over-hand locking |
| Cache locality | Poor (pointer chasing) | Poor (pointer chasing) |
| Worst-case bound | w.h.p., not absolute | Absolute |
Choice: skip list. The simplicity (no rotations, no rebalancing, no parent pointers) is the value proposition. Worst-case absolute bound is irrelevant when we control the hash that feeds keys in.
When you'd flip: real-time systems where a probabilistic O(n) blowup is unacceptable. Database MemTables don't qualify.
D2. Hash table: open addressing + Robin Hood, not chaining
| Open addressing (linear) | Chaining | |
|---|---|---|
| Memory per entry | (1/loadfactor) × sizeof(slot) | sizeof(entry) + sizeof(ptr) + malloc overhead |
| Cache misses per lookup | 0–1 typically | 1–3 typically |
| Pointer stability across resize | No | Yes |
| Deletion | Backward-shift or tombstone | Free a list node |
| Tail latency at high load | Spikes near 1.0 | Degrades gracefully |
Choice: open addressing, linear probing, Robin Hood. The cache-miss saving is decisive at small/medium values, which is the database use case.
When you'd flip: large values (≥1 KiB) where you want pointer stability so external references survive resize.
D3. p = 0.5 for skip list, not p = 0.25
The expected number of comparisons per search is (1/p) · log_{1/p}(n). Minimizing this gives p = 1/e ≈ 0.37. In practice:
p | Expected comparisons (n=1M) | Levels | Memory per node |
|---|---|---|---|
| 0.25 | ~16 | ~10 | 1.33 forward ptrs avg |
| 0.5 | ~20 | ~20 | 2 forward ptrs avg |
| 1/e | ~14 | ~13 | 1.58 forward ptrs avg |
We pick p = 0.5 because (a) bit-shift sampling (count trailing zeros of a random u64) is one instruction, and (b) the code stays trivial. The 30% theoretical improvement from p=1/e is not worth the table-lookup or log math.
D4. Max level = 32
A skip list with p=0.5 and n entries has expected max level log₂ n. At max level 32 we support n = 2^32 ≈ 4 G entries before quality degrades. Going higher costs a forward-pointer slot per node forever (8 bytes per extra level). Going lower caps the structure size.
D5. Hash function: FNV-1a 64-bit
We need a hash that is:
- High quality enough for non-adversarial keys (passes basic distribution tests)
- Fast for small keys (~10 cycles per 8 bytes)
- Identical in all three languages so cross-language tests can compare counts
FNV-1a is 6 lines of code, deterministic, and produces nearly the same probe-distance distribution as xxHash3 for keys ≤ 32 bytes. We use it because we control the input keys in this lab; in production you'd switch to xxHash3 or SipHash.
When you'd flip: keys controlled by adversaries → SipHash with a per-instance random key.
D6. Load factor 0.85, grow ×2
Robin Hood handles load factor up to ~0.9 well; beyond that the variance of probe distance blows up. We resize at 0.85 to leave headroom and double the capacity (and rehash). Cost of resize is amortized O(1) per insert.
D7. Backward-shift deletion, no tombstones
Tombstones simplify code but bloat the table over time — a "delete-then-insert" workload fills the table with markers and forces a resize. Backward-shift deletion costs O(PSL) per delete (typically <5 slots) and keeps the table compact. The implementation walks forward from the deleted slot, moves each entry one slot left until reaching an empty slot or an entry whose PSL is 0.
D8. Seed: deterministic per-instance, random across instances
The skip list RNG must not be deterministically the same across runs. But within one process run we want reproducible behavior for debugging. The CLI accepts an optional seed; tests pass a fixed value.
Cost-model cheat sheet
| Operation | Latency |
|---|---|
| L1 cache hit | ~1 ns |
| L2 cache hit | ~4 ns |
| L3 cache hit | ~15 ns |
| Main memory | ~100 ns |
| 4 KiB page from SSD (cold) | ~100 µs |
| 4 KiB page from HDD | ~10 ms |
Every random pointer dereference is a potential L3 miss → 100 ns. A skip list with 20 levels traverses 20 such pointers in the worst case = 2 µs. A linear-probing hash table touches 1–2 cache lines = ~10 ns. Hash table is ~100× faster for point lookups in RAM, and the gap grows as the working set leaves L3.
What breaks at scale
| Symptom | Cause | Mitigation |
|---|---|---|
| Skip-list lookup tail latency 10× worse than median | Bad RNG sequence; node height variance | Use a higher-quality PRNG; bound max height |
| Hash table tail latency spikes near 90% load | Robin Hood variance explodes | Resize earlier (load factor 0.75) |
| Skip-list memory 2× of equivalent BST | Forward-pointer array overhead | Use per-level arena allocators; pack pointer arrays |
| Hash table grows but never shrinks | Resize is unidirectional in our impl | Shrink when load < 0.25 (we don't — extension point) |
| Iterator skips entries | Mutation during iteration | Snapshot at iterator-creation time (db-05 covers this) |