db-16 — Distributed-Fundamentals

This lab builds the vocabulary the rest of the distributed track (db-17 Raft, db-18 Paxos, db-19 ZAB, db-20 distributed-kv) will speak in: logical clocks, vector clocks, the happens-before relation, and a deterministic discrete-event simulator that produces a byte-identical event log across three independent implementations (Rust, Go, C++).

If you cannot write a simulator whose output is bit-stable across runs and across languages, you cannot run reproducible distributed-systems experiments. Every other lab in the track will reuse the discipline established here.


What is it?

A distributed system is a collection of nodes that exchange messages over an asynchronous, lossy network. Three primitives let us reason about such systems without having a wall-clock everyone agrees on:

  1. Lamport clock — a single integer per node that is incremented on every local event, stamped onto each outgoing message, and bumped to max(self, incoming) + 1 on receive. Lamport (1978) proved that this discipline produces a total order consistent with causality: if event a happens-before event b, then ts(a) < ts(b). The reverse is not true.

  2. Vector clock — one counter per node, packaged into a map. Local event increments the owner's counter; receive does pointwise max(self, incoming) then increments the owner's counter. The resulting partial order is the happens-before relation: two events are concurrent iff neither clock dominates the other.

  3. Deterministic discrete-event simulator — a single-threaded loop that drives sim time forward in integer ticks, delivering messages whose delivery_time == t before letting nodes act. With a seeded PRNG and canonical message ordering, the same (seed, nodes, rounds) triple must always produce the same event log — in any language.


Why does it matter?

  • Raft (db-17), Paxos (db-18), ZAB (db-19) all rely on causality: a leader can only commit an entry after it has been replicated to a quorum of followers. Vector clocks give us the language to prove that a particular log entry could not have been committed before a prerequisite was acknowledged.

  • Reproducibility is the difference between "I think my consensus algorithm is correct" and "I have an event log I can re-run on someone else's machine and get the same answer." When db-17 develops a leader-election bug under network partition, the first thing you reach for is a deterministic replay of the failure.

  • Three independent implementations forces clarity. Any ambiguity in the spec ("when do you read the clock vs. increment it?") will show up as a byte diff in scripts/cross_test.sh. Pinning the wire format and the scheduling rule is the lab.


How does it work?

Lamport rule

local event :  self += 1
send        :  self += 1 ; stamp message with self
recv(m)     :  self = max(self, m.ts) + 1

Vector-clock rule

local event(i)    :  vc[i] += 1
send(i)           :  vc[i] += 1 ; stamp message with snapshot of vc
recv(i, m)        :  for k in m.vc : vc[k] = max(vc[k], m.vc[k])
                     vc[i] += 1
partial order     :  vc_a < vc_b   iff (∀k) vc_a[k] ≤ vc_b[k]  AND  vc_a ≠ vc_b
                     vc_a || vc_b  iff neither <  nor  > nor =

Simulator loop

for t in 0 .. rounds + MAX_DELAY:
    # 1. deliver — strict (delivery_time, sender_id, seq) order
    while heap.top().delivery_time == t:
        msg = heap.pop()
        node[msg.dest].recv(msg)
        emit Recv

    # 2. send — only during the active window
    if t < rounds:
        for s in 0 .. nodes:
            r       = splitmix64(seed ^ (t<<32) ^ (s+1))
            dest    = ((r          & 0xFFFF) % (nodes - 1)) ; skip self
            delay   = 1 + ((r>>16) & 0xFFFF) % 3
            payload = (r>>32) & 0xFF
            node[s].send_to(dest, delay, payload)
            emit Send

The two phases (deliver-then-send) per tick, the strict heap ordering, and the splitmix64 PRNG together guarantee determinism.

Canonical wire format

file := magic[4="DSE6"] u32_le(event_count) event*

event :=
    u8  kind                  # 1 = Send, 2 = Recv
    u64_le sim_time
    u32_le node               # sender for Send, receiver for Recv
    u32_le peer               # dest for Send, source for Recv
    u64_le lamport            # value AFTER the local step
    u32_le vc_len
    [u32_le node, u64_le counter] * vc_len   # sorted ASC by node
    u32_le payload_len
    u8 payload[payload_len]

All multi-byte numbers are little-endian. Vector-clock entries must be serialized in ascending order by node-id; this is the single most common source of byte-diff bugs.

Cross-language invariants

InvariantWhy it matters
splitmix64 mix seed ^ (t<<32) ^ (s+1)identical PRNG stream
dest skip-self: if pre >= s then pre+1identical destination choice
heap order (delivery_time, sender, seq)identical delivery order
seq is global monotonicdeterministic tie-break across nodes
VC entries sorted by node-id on the wirebyte-identical serialization
all integers little-endianbyte-identical on every host

If any one of these drifts, scripts/cross_test.sh will fail at the sha256 compare and cmp -l will print the byte offset of the first divergence.


Files

  • src/rust/distfund16 crate + simctl binary.
  • src/go/ — module github.com/10xdev/dse/db16 + cmd/simctl.
  • src/cpp/db16_lib static library + simctl binary + test_db16.
  • scripts/verify.sh — runs the unit tests for all three.
  • scripts/cross_test.sh — proves the three binaries produce byte-identical event logs for two seeded scenarios.

See docs/ for the longer write-up, and steps/ for the staged implementation path.