db-19 — References

Primary sources

  • Benjamin Reed and Flavio P. Junqueira, A simple totally ordered broadcast protocol, LADIS 2008. The original ZAB paper — short, workshop-length, and the only place that describes the algorithm in the exact "Phase 0 / 1 / 2 / 3" shape it took inside ZooKeeper. https://dl.acm.org/doi/10.1145/1529974.1529978
  • Flavio P. Junqueira, Benjamin C. Reed, and Marco Serafini, Zab: High-performance broadcast for primary-backup systems, DSN 2011. The peer-reviewed, formal treatment. Defines the primary order property, gives the proof obligations, and folds the original Phase 0 into Phase 1. This is the paper to cite when arguing the correctness of any particular handshake decision. https://marcoserafini.github.io/papers/zab.pdf
  • Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed, ZooKeeper: Wait-free coordination for Internet-scale systems, USENIX ATC 2010. Describes the system (znodes, sessions, watches, the wait-free API) that ZAB exists to support. Useful for understanding why ZAB was designed with primary order rather than as a generic consensus library. https://www.usenix.org/legacy/event/atc10/tech/full_papers/Hunt.pdf
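
The primary order property from the DSN 2011 paper can be made concrete with a minimal Python sketch of a simplified checker over one process's delivery sequence. ZooKeeper packs a zxid into a 64-bit integer, with the epoch in the high 32 bits and the counter in the low 32 bits, so plain integer comparison matches (epoch, counter) order. The helper names (make_zxid, split_zxid, check_primary_order) are hypothetical. The check also assumes that each primary numbers its proposals 1, 2, 3, ... within its epoch, which simplifies the paper's full definition.

```python
from typing import Iterable, Tuple

COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1  # low 32 bits hold the counter


def make_zxid(epoch: int, counter: int) -> int:
    """Pack (epoch, counter) into one 64-bit integer, epoch in the high half."""
    if not (0 <= epoch <= COUNTER_MASK and 0 <= counter <= COUNTER_MASK):
        raise ValueError("epoch and counter must each fit in 32 bits")
    return (epoch << COUNTER_BITS) | counter


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Inverse of make_zxid: recover (epoch, counter)."""
    return zxid >> COUNTER_BITS, zxid & COUNTER_MASK


def check_primary_order(delivered: Iterable[int]) -> bool:
    """Simplified primary-order check over one process's delivery sequence.

    Hypothetical helper. Assumes each primary numbers its proposals 1, 2, 3, ...
    within its epoch and that `delivered` starts from an empty history.
    """
    last_epoch, last_counter = -1, 0
    for zxid in delivered:
        epoch, counter = split_zxid(zxid)
        if epoch < last_epoch:
            # Global primary order: never deliver an older epoch after a newer one.
            return False
        if epoch > last_epoch:
            # Entering a new epoch: its primary's counters restart.
            last_epoch, last_counter = epoch, 0
        if counter != last_counter + 1:
            # Local primary order: no gaps or reordering within an epoch.
            return False
        last_counter = counter
    return True
```

For example, check_primary_order([make_zxid(1, 1), make_zxid(1, 2), make_zxid(2, 1)]) returns True. Swapping the first two entries makes it return False.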

Implementations to read alongside

Determinism and simulation

  • db-16's references on FoundationDB simulation testing and TigerBeetle apply unchanged here. The event heap keyed on (delivery_time, sender, seq) and the splitmix64-seeded jitter follow the same discipline.
  • The ZooKeeper test suite (zookeeper/src/java/test/.../quorum/) uses scripted scenarios but is not deterministic in the cross-language sense this lab aims for. Worth reading as an example of how the production team tests the algorithm.
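
Below is a minimal sketch of that simulation discipline. It uses a hypothetical Network class, not the lab's actual harness. Every in-flight message sits in a min-heap keyed by (delivery_time, sender, seq), with seq counted per sender, so two messages due at the same tick always pop in the same order. Latency jitter comes from a splitmix64 generator, whose integer-only arithmetic is easy to reproduce bit for bit in another language. base_latency and max_jitter are illustrative parameters.

```python
import heapq
from typing import Any, Dict, List, Tuple

MASK64 = (1 << 64) - 1


class SplitMix64:
    """splitmix64: 64-bit state, integer-only mixing, trivial to port."""

    def __init__(self, seed: int) -> None:
        self.state = seed & MASK64

    def next_u64(self) -> int:
        self.state = (self.state + 0x9E3779B97F4A7C15) & MASK64
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return z ^ (z >> 31)


# Heap entry: (delivery_time, sender, seq, receiver, payload).
# (sender, seq) is unique, so comparison never reaches receiver or payload.
Message = Tuple[int, str, int, str, Any]


class Network:
    """Hypothetical simulated network with deterministic delivery order."""

    def __init__(self, seed: int, base_latency: int = 10, max_jitter: int = 5) -> None:
        self.rng = SplitMix64(seed)
        self.base_latency = base_latency
        self.max_jitter = max_jitter
        self.heap: List[Message] = []
        self.next_seq: Dict[str, int] = {}  # per-sender send counter

    def send(self, now: int, sender: str, receiver: str, payload: Any) -> None:
        # Jitter is drawn from the seeded PRNG, never from wall-clock time.
        jitter = self.rng.next_u64() % (self.max_jitter + 1)
        seq = self.next_seq.get(sender, 0)
        self.next_seq[sender] = seq + 1
        heapq.heappush(self.heap, (now + self.base_latency + jitter, sender, seq, receiver, payload))

    def deliver_next(self) -> Message:
        # Earliest delivery_time first; ties broken by sender, then seq.
        return heapq.heappop(self.heap)


def trace(seed: int) -> List[Message]:
    """Send a burst of messages between two nodes and record the delivery order."""
    net = Network(seed)
    for i in range(5):
        net.send(0, "n1", "n2", i)
        net.send(0, "n2", "n1", i)
    return [net.deliver_next() for _ in range(len(net.heap))]
```

Calling trace with the same seed twice yields an identical delivery trace, so a failing schedule can be replayed exactly. Drawing jitter from Python's random module instead would tie the trace to CPython's generator and make cross-language replay harder.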

Background reading worth doing

  • Heidi Howard, Distributed consensus revised, Cambridge PhD dissertation, 2019; the 2020 survey A Generalised Solution to Distributed Consensus unifies Paxos, Raft, and ZAB under a single quorum-intersection framework. Useful for seeing ZAB as one point in a design space rather than as an oddball. https://www.cl.cam.ac.uk/~hh360/
  • Leslie Lamport, Paxos Made Simple, 2001. The contrast with ZAB is illuminating: Paxos picks a value per slot; ZAB streams a totally ordered log under a primary. https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
  • Diego Ongaro and John Ousterhout, In Search of an Understandable Consensus Algorithm, USENIX ATC 2014 — the Raft paper. Read this before the ZAB papers if you have not already; the comparison in db-17's CONCEPTS.md is the recommended on-ramp. https://raft.github.io/raft.pdf
  • André Medeiros, ZooKeeper's Atomic Broadcast Protocol: Theory and Practice, Aalto University seminar notes, 2012. A 14-page treatment of ZAB-vs-implementation gotchas; useful when the papers feel terse. https://www.tcs.hut.fi/Studies/T-79.5001/reports/2012-deSouzaMedeiros.pdf

Cross-lab dependencies

  • Upstream:
    • db-16 — distributed-fundamentals: Lamport and vector clocks, plus the deterministic simulator harness whose discipline this lab inherits wholesale.
    • db-17 — Raft: same simulator skeleton; reading Raft first makes ZAB's discovery/sync handshake feel like the explicit version of Raft's implicit AppendEntries consistency check.
    • db-18 — Paxos: the other consensus reference point; ZAB's (epoch, counter) is the streaming-log analog of Paxos's (ballot, slot) numbering.
  • Downstream:
    • db-20 — Distributed KV: wraps a consensus engine (could be Raft, ZAB, or Paxos from this track) around a key-value state machine.
    • db-21 — Storage-engine-advanced: snapshots and log compaction on top of the canonical history laid down here.
    • db-23 — Capstone: composes the simulator harness across multiple replicated shards.
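
The db-17 remark about an explicit versus an implicit consistency check can be sketched in a few lines of Python. Both functions are hypothetical and deliberately simplified. zab_style_sync receives both logs and computes the divergence point in one step; real ZAB exchanges summaries such as the follower's last zxid rather than whole logs. raft_style_probes mimics the basic AppendEntries rule: the leader starts nextIndex just past its own last entry and backs off one entry per rejection until the follower holds a matching entry at prevLogIndex. Logs are lists of (epoch, counter) pairs, and list position plays the role of Raft's log index.

```python
from typing import List, Tuple

Zxid = Tuple[int, int]  # (epoch, counter); list position plays Raft's log index


def zab_style_sync(leader_log: List[Zxid], follower_log: List[Zxid]) -> Tuple[int, List[Zxid], List[Zxid]]:
    """Explicit, one-shot sync: return (entries to keep, entries to drop, entries to send)."""
    keep = 0
    limit = min(len(leader_log), len(follower_log))
    # Longest common prefix: the point where the two histories diverge.
    while keep < limit and leader_log[keep] == follower_log[keep]:
        keep += 1
    return keep, follower_log[keep:], leader_log[keep:]


def raft_style_probes(leader_log: List[Zxid], follower_log: List[Zxid]) -> Tuple[int, int]:
    """Implicit sync via the basic AppendEntries consistency check.

    Returns (next_index at which the follower finally accepts, probes needed).
    """
    next_index = len(leader_log)  # 0-based: start just past the leader's last entry
    probes = 0
    while True:
        probes += 1
        prev = next_index - 1
        # Accept if there is no previous entry, or the follower holds the
        # leader's entry at prev (the prevLogIndex/prevLogTerm match).
        if prev < 0 or (prev < len(follower_log) and follower_log[prev] == leader_log[prev]):
            return next_index, probes
        next_index -= 1  # rejected: back off one entry and retry
```

Consider a leader log [(1, 1), (1, 2), (2, 1), (2, 2)] and a follower log [(1, 1), (1, 2), (1, 3)]. Both functions settle on keeping the first two entries. zab_style_sync also reports (1, 3) as the entry to drop, while raft_style_probes needs three probes to reach the same point. The Raft paper describes an optional optimization that skips a whole conflicting term per rejection; the sketch shows only the basic rule.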