db-16 — Distributed-Fundamentals
This lab builds the vocabulary the rest of the distributed track (db-17 Raft, db-18 Paxos, db-19 ZAB, db-20 distributed-kv) will speak in: logical clocks, vector clocks, the happens-before relation, and a deterministic discrete-event simulator that produces a byte-identical event log across three independent implementations (Rust, Go, C++).
If you cannot write a simulator whose output is bit-stable across runs and across languages, you cannot run reproducible distributed-systems experiments. Every other lab in the track will reuse the discipline established here.
What is it?
A distributed system is a collection of nodes that exchange messages over an asynchronous, lossy network. Three primitives let us reason about such systems without having a wall-clock everyone agrees on:
-
Lamport clock — a single integer per node that is incremented on every local event, stamped onto each outgoing message, and bumped to
max(self, incoming) + 1on receive. Lamport (1978) proved that this discipline produces a total order consistent with causality: if eventahappens-before eventb, thents(a) < ts(b). The reverse is not true. -
Vector clock — one counter per node, packaged into a map. Local event increments the owner's counter; receive does pointwise
max(self, incoming)then increments the owner's counter. The resulting partial order is the happens-before relation: two events are concurrent iff neither clock dominates the other. -
Deterministic discrete-event simulator — a single-threaded loop that drives sim time forward in integer ticks, delivering messages whose
delivery_time == tbefore letting nodes act. With a seeded PRNG and canonical message ordering, the same(seed, nodes, rounds)triple must always produce the same event log — in any language.
Why does it matter?
-
Raft (db-17), Paxos (db-18), ZAB (db-19) all rely on causality: a leader can only commit an entry after it has been replicated to a quorum of followers. Vector clocks give us the language to prove that a particular log entry could not have been committed before a prerequisite was acknowledged.
-
Reproducibility is the difference between "I think my consensus algorithm is correct" and "I have an event log I can re-run on someone else's machine and get the same answer." When db-17 develops a leader-election bug under network partition, the first thing you reach for is a deterministic replay of the failure.
-
Three independent implementations forces clarity. Any ambiguity in the spec ("when do you read the clock vs. increment it?") will show up as a byte diff in
scripts/cross_test.sh. Pinning the wire format and the scheduling rule is the lab.
How does it work?
Lamport rule
local event : self += 1
send : self += 1 ; stamp message with self
recv(m) : self = max(self, m.ts) + 1
Vector-clock rule
local event(i) : vc[i] += 1
send(i) : vc[i] += 1 ; stamp message with snapshot of vc
recv(i, m) : for k in m.vc : vc[k] = max(vc[k], m.vc[k])
vc[i] += 1
partial order : vc_a < vc_b iff (∀k) vc_a[k] ≤ vc_b[k] AND vc_a ≠ vc_b
vc_a || vc_b iff neither < nor > nor =
Simulator loop
for t in 0 .. rounds + MAX_DELAY:
# 1. deliver — strict (delivery_time, sender_id, seq) order
while heap.top().delivery_time == t:
msg = heap.pop()
node[msg.dest].recv(msg)
emit Recv
# 2. send — only during the active window
if t < rounds:
for s in 0 .. nodes:
r = splitmix64(seed ^ (t<<32) ^ (s+1))
dest = ((r & 0xFFFF) % (nodes - 1)) ; skip self
delay = 1 + ((r>>16) & 0xFFFF) % 3
payload = (r>>32) & 0xFF
node[s].send_to(dest, delay, payload)
emit Send
The two phases (deliver-then-send) per tick, the strict heap ordering, and the splitmix64 PRNG together guarantee determinism.
Canonical wire format
file := magic[4="DSE6"] u32_le(event_count) event*
event :=
u8 kind # 1 = Send, 2 = Recv
u64_le sim_time
u32_le node # sender for Send, receiver for Recv
u32_le peer # dest for Send, source for Recv
u64_le lamport # value AFTER the local step
u32_le vc_len
[u32_le node, u64_le counter] * vc_len # sorted ASC by node
u32_le payload_len
u8 payload[payload_len]
All multi-byte numbers are little-endian. Vector-clock entries must be serialized in ascending order by node-id; this is the single most common source of byte-diff bugs.
Cross-language invariants
| Invariant | Why it matters |
|---|---|
splitmix64 mix seed ^ (t<<32) ^ (s+1) | identical PRNG stream |
dest skip-self: if pre >= s then pre+1 | identical destination choice |
heap order (delivery_time, sender, seq) | identical delivery order |
seq is global monotonic | deterministic tie-break across nodes |
| VC entries sorted by node-id on the wire | byte-identical serialization |
| all integers little-endian | byte-identical on every host |
If any one of these drifts, scripts/cross_test.sh will fail at the
sha256 compare and cmp -l will print the byte offset of the first
divergence.
Files
src/rust/—distfund16crate +simctlbinary.src/go/— modulegithub.com/10xdev/dse/db16+cmd/simctl.src/cpp/—db16_libstatic library +simctlbinary +test_db16.scripts/verify.sh— runs the unit tests for all three.scripts/cross_test.sh— proves the three binaries produce byte-identical event logs for two seeded scenarios.
See docs/ for the longer write-up, and steps/ for the staged
implementation path.