Step 2 — Implement the Hash Table

Goal

Open-addressing hash table with linear probing + Robin Hood + backward-shift deletion.

API

HashTable::new(initial_capacity_pow2: usize) -> HashTable
HashTable::insert(key, value) -> bool          // false if replaced
HashTable::get(key) -> Option<value>
HashTable::remove(key) -> bool
HashTable::len() -> usize
HashTable::capacity() -> usize
HashTable::load_factor() -> f64
HashTable::max_probe() -> usize
HashTable::probe_histogram() -> Vec<usize>

Hash function

FNV-1a 64-bit followed by a SplitMix64 finalizer (identical in all three languages):

offset = 0xcbf29ce484222325
prime  = 0x00000100000001b3        # = 1_099_511_628_211
h = offset
for byte in key:
    h ^= byte
    h = h * prime  (wrapping)
return splitmix64_finalize(h)

fn splitmix64_finalize(mut h: u64) -> u64 {
    h ^= h >> 30; h = h.wrapping_mul(0xbf58476d1ce4e5b9);
    h ^= h >> 27; h = h.wrapping_mul(0x94d049bb133111eb);
    h ^= h >> 31;
    h
}

Plain FNV-1a has notoriously poor avalanche on short, sequential keys — running it raw against Robin Hood probing blows up max_probe (we measured 206 vs the expected ≲66 bound at N=100k). The SplitMix64 finalizer is bijective (adds no collisions) and re-mixes the high bits down, which restores the geometric PSL distribution.

Known-answer vectors (pin these in tests across all three languages):

key	hash
`""`	`0xf52a15e9a9b5e89b`
`"a"`	`0x02c0bdbf481420f8`
`"foobar"`	`0x404da9e3b74078c2`

Slot layout

Slot {
    occupied: bool        // or use psl = MAX as sentinel
    psl:      u32         // probe sequence length
    hash:     u64
    key:      Vec<u8>
    value:    Vec<u8>
}

We store the full 64-bit hash inside each slot so resizes don't re-hash keys, and so that get can compare hashes (cheap) before keys (expensive).

Insert (Robin Hood)

if (len + 1) / capacity > 0.85: resize(capacity * 2)

h = hash(key)
i = h & (capacity - 1)
psl = 0
loop:
    if !slots[i].occupied:
        slots[i] = (key, value, h, psl); occupied; len += 1
        return true
    if slots[i].hash == h && slots[i].key == key:
        slots[i].value = value
        return false                 // replaced
    if slots[i].psl < psl:
        swap(slots[i], (key, value, h, psl))   // steal
    i = (i + 1) & (capacity - 1)
    psl += 1

Get

h = hash(key)
i = h & (capacity - 1)
psl = 0
loop:
    if !slots[i].occupied: return None
    if slots[i].psl < psl: return None      // would have stolen
    if slots[i].hash == h && slots[i].key == key:
        return Some(slots[i].value)
    i = (i + 1) & (capacity - 1)
    psl += 1

Remove (backward-shift)

i = find_slot(key)
if not found: return false
loop:
    j = (i + 1) & (capacity - 1)
    if !slots[j].occupied || slots[j].psl == 0:
        slots[i].occupied = false
        len -= 1
        return true
    slots[i] = slots[j]
    slots[i].psl -= 1
    i = j

Resize

new_slots = [empty; capacity * 2]
old = swap(slots, new_slots); capacity *= 2; len = 0
for slot in old where occupied:
    insert(slot.key, slot.value)   // re-uses hash if you cache it

Tests

#	Test	Pass if
T1	insert + get of 10k random keys	all hits
T2	insert 10k, remove 5k random, get remaining 5k	all still found, removed not found
T3	insert past 85% load triggers resize	capacity doubled
T4	duplicate insert replaces value, len unchanged	replacement worked
T5	max PSL ≤ 4·log₂(cap·load) at 1M inserts	bounded variance

Distributed Systems Engineer — Build Databases & Consensus From Scratch