db-20 — Analysis
What is the question?
Given Raft-shaped consensus semantics, can we build a replicated state machine that produces a byte-identical snapshot across three language ecosystems (Rust, C++, and Go)? "Byte-identical" is a deliberately strict test of cross-language conformance: string encoding, integer encoding, map iteration order, and op semantics all have to line up.
Why is this an interesting study?
Raft on its own (db-17) tells you nothing about how a real key/value store is layered on top of it. Production systems (etcd, TiKV, CockroachDB) all answer the same questions:
- What does the leader send to followers? (log entries)
- When does a follower apply an entry? (when its commit_index advances)
- How does a partitioned follower catch up? (next-index probe / install snapshot)
- What invariants does the state machine maintain across replicas?
db-20 strips out the network and timer noise so we can focus on questions 2–4 alone. The simplification turns out to be the whole point: once you stop worrying about elections, the integration story fits in ~400 lines of Rust.
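Question 2 (apply when commit_index advances) can be sketched in a few lines. This is a minimal illustration, not the lab's actual code: the Op variants, the Replica struct, and the append signature are all assumptions made for the example.

```rust
use std::collections::BTreeMap;

// Hypothetical op type; the lab's real LogEntry layout is not shown here.
#[derive(Clone, Debug)]
enum Op {
    Put { key: String, value: Vec<u8> },
    Delete { key: String },
}

// Hypothetical follower: counters hold "number of entries", not Raft's 1-based indices.
#[derive(Default)]
struct Replica {
    log: Vec<Op>,
    commit_index: usize, // entries known to be committed
    last_applied: usize, // entries already applied to `state`
    state: BTreeMap<String, Vec<u8>>,
}

impl Replica {
    /// Leader -> follower: append entries, then advance the commit point.
    fn append(&mut self, entries: &[Op], leader_commit: usize) {
        self.log.extend_from_slice(entries);
        // Never commit past what we hold, and never move the commit point backwards.
        let capped = leader_commit.min(self.log.len());
        self.commit_index = self.commit_index.max(capped);
        self.apply_committed();
    }

    /// Apply every committed-but-unapplied entry, in log order.
    fn apply_committed(&mut self) {
        while self.last_applied < self.commit_index {
            match &self.log[self.last_applied] {
                Op::Put { key, value } => {
                    self.state.insert(key.clone(), value.clone());
                }
                Op::Delete { key } => {
                    self.state.remove(key);
                }
            }
            self.last_applied += 1;
        }
    }
}
```

In this sketch an entry can sit in the log without being visible in the state: appending two entries with leader_commit = 1 applies only the first, and the second is applied once a later call raises the commit point.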
Design choices and trade-offs
Snapshot push instead of next-index walk
Raft's real conflict resolution is "decrement next_index, retry". For our purposes that produces the same final state as a one-shot snapshot push, but it forces us to model RPC round-trips. We pick the snapshot push because:
- it converges in a single step (deterministic), and
- it makes heal() trivial to write: just truncate and replay.
The cost: we cannot study log-divergence scenarios where two leaders both append. That's fine: this lab is single-leader by construction.
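The "truncate and replay" shape of heal() might look roughly like the following. The Entry and Node types and the free-function signature are hypothetical, chosen only to keep the sketch self-contained; the real heal() may differ.

```rust
use std::collections::BTreeMap;

type State = BTreeMap<String, Vec<u8>>;

// Hypothetical entry shape: `None` stands for a delete.
#[derive(Clone)]
struct Entry {
    key: String,
    value: Option<Vec<u8>>,
}

// Hypothetical node: `commit_index` counts committed entries.
struct Node {
    log: Vec<Entry>,
    commit_index: usize,
    state: State,
}

/// Rebuild a state machine from scratch by replaying a log prefix.
fn replay(log: &[Entry]) -> State {
    let mut state = State::new();
    for e in log {
        match &e.value {
            Some(v) => {
                state.insert(e.key.clone(), v.clone());
            }
            None => {
                state.remove(&e.key);
            }
        }
    }
    state
}

/// One-shot catch-up: drop the follower's log, copy the leader's
/// committed prefix, and rebuild the follower's state from it.
fn heal(leader: &Node, follower: &mut Node) {
    follower.log.clear(); // truncate
    follower.log.extend_from_slice(&leader.log[..leader.commit_index]);
    follower.commit_index = leader.commit_index;
    follower.state = replay(&follower.log); // replay
}
```

Because the follower's old log is discarded wholesale, there is no next_index bookkeeping and no retry loop, which is what makes the single-step convergence deterministic.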
State machine is BTreeMap<String, Vec<u8>>
A sorted map gives deterministic iteration for free in Rust and C++.
Go's map has randomised iteration order, so the Go implementation
explicitly sorts keys before serialising. Unordered map iteration is
the single most common source of non-determinism in cross-language
ports, so every wire-format-aware function in the Go code calls
sort.Slice or sort.Strings.
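To make the ordering point concrete, here is a sketch of a serialiser whose output depends only on map contents, never on insertion order. The byte layout (little-endian u32 length prefixes) is an assumption for illustration and is not the lab's actual snapshot format.

```rust
use std::collections::BTreeMap;

/// Hypothetical encoding: u32 LE entry count, then for each entry
/// u32 LE key length, key bytes, u32 LE value length, value bytes.
fn encode_state(state: &BTreeMap<String, Vec<u8>>) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(&(state.len() as u32).to_le_bytes());
    // BTreeMap iterates in ascending key order, so no explicit sort is needed.
    for (k, v) in state {
        out.extend_from_slice(&(k.len() as u32).to_le_bytes());
        out.extend_from_slice(k.as_bytes());
        out.extend_from_slice(&(v.len() as u32).to_le_bytes());
        out.extend_from_slice(v);
    }
    out
}
```

Swapping BTreeMap for HashMap in this function would compile unchanged but could emit entries in a different order from run to run, which is the same hazard the Go port avoids by sorting keys first.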
Op encoding inside LogEntry is not wire-stable
The log is in-memory only; we never serialise LogEntry itself.
Cross-language byte identity is only required at the snapshot
boundary. This separation of "internal" and "wire" structures is
cheap discipline that scales to real systems.
current_term is in the snapshot but is always 1
We expose current_term in the wire format anyway and plumb it through
all three implementations. This makes it cheap to add elections later
(e.g. as an extension exercise) without having to bump the format's
magic value.
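As an illustration of reserving the field up front, a snapshot header could carry the term right after the magic. Everything below is hypothetical: the magic bytes, the field width, and the function names are invented for this sketch and do not describe the lab's real format.

```rust
// Hypothetical magic bytes and field layout, for illustration only.
const MAGIC: [u8; 4] = *b"DB20";
const CURRENT_TERM: u64 = 1; // fixed leader, so the term never changes

/// Header = 4 magic bytes followed by current_term as u64 little-endian.
fn encode_header(current_term: u64) -> Vec<u8> {
    let mut out = Vec::with_capacity(12);
    out.extend_from_slice(&MAGIC);
    out.extend_from_slice(&current_term.to_le_bytes());
    out
}

/// Returns the term if the magic matches and the header is complete.
fn decode_header(bytes: &[u8]) -> Option<u64> {
    if bytes.len() < 12 || bytes[..4] != MAGIC[..] {
        return None;
    }
    let mut term = [0u8; 8];
    term.copy_from_slice(&bytes[4..12]);
    Some(u64::from_le_bytes(term))
}
```

A reader written against this layout already parses the term, so a later version that emits terms greater than 1 stays readable without a format change.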
Failure-mode catalogue (what we covered, what we did not)
| Failure | db-17 covers? | db-20 covers? |
|---|---|---|
| Single follower crash + catchup | yes | yes (heal) |
| Network partition isolating minority | yes | yes |
| Leader crash + new election | yes | no (fixed leader) |
| Split-brain after partition heal | yes | no (no elections) |
| Log compaction / snapshot install | scratched the surface | no |
| Disk-loss / log truncation | no | no |
| Byzantine behaviour | no | no |
Where to take this next
- broader-ideas.md lists the explicit extensions: linearizable reads, log compaction, multi-region replicas, learner replicas, snapshot install over the wire, gossip-style cluster membership.
- The exam in cross_test.sh doubles as a regression net for any of those extensions: break the snapshot bytes, you break the build.