Leslie Lamport, The Part-Time Parliament, ACM TOCS 1998. The
original Paxos paper. Famously hard to read (the Parliament of
Paxos allegory hides the algorithm). The mathematics in §2 is the
spec; the rest is narrative.
https://lamport.azurewebsites.net/pubs/lamport-paxos.pdf
Leslie Lamport, Paxos Made Simple, ACM SIGACT News 2001. The
paper to read first. The whole algorithm — single-decree and the
Multi-Paxos extension — is on four pages. The P1a / P1b / P2a /
P2b / P2c invariants in this paper are the ones whose operational
forms the simulator enforces.
https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
Tushar Chandra, Robert Griesemer, Joshua Redstone, Paxos Made
Live — An Engineering Perspective, PODC 2007. Google's Chubby
team's writeup of what it took to turn the algorithm into a
production system: leader leases, snapshots, group membership,
disk corruption, the works. This lab implements roughly §2–§3 of
that paper.
https://research.google/pubs/paxos-made-live-an-engineering-perspective/
Robbert van Renesse & Deniz Altinbuken, Paxos Made Moderately
Complex, ACM CSUR 2015. The most readable end-to-end derivation
of Multi-Paxos. Pseudocode in §3 maps almost line-for-line onto
this lab's start_election, become_leader, try_decide.
https://www.cs.cornell.edu/courses/cs7412/2011sp/paxos.pdf
Heidi Howard, Distributed Consensus Revised, PhD dissertation,
Cambridge 2019 (also A Generalised Solution to Distributed
Consensus, 2020). Reframes Paxos as one point in a design space
parameterised by quorum-intersection requirements; explains why
Flexible Paxos works and how Raft, EPaxos, and Vertical Paxos all
fit into the same picture.
https://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-935.pdf
Leslie Lamport, Fast Paxos, Distributed Computing 2006. Allows
a single round-trip happy path when only one proposer is active,
at the cost of a 3f+1 quorum on the fast path. EPaxos
generalises this.
Iulian Moraru, David Andersen, Michael Kaminsky, There Is More
Consensus in Egalitarian Parliaments (EPaxos), SOSP 2013. Drops
the leader entirely; each command picks its own dependency graph.
Production-relevant in geo-distributed systems where any-leader
latency is uneven.
Lamport, Malkhi, Zhou, Vertical Paxos, PODC 2009. Decouples
reconfiguration from the consensus protocol — the answer to
"how do you change the acceptor set without stopping the world".
Lamport, Generalized Paxos, MSR-TR-2005-33. Lets commutative
commands be ordered concurrently; precursor to EPaxos.
etcd/raft (Go) — included for comparison; etcd uses Raft, but
its testing harness (raftpb deterministic replay) is the spirit
of this lab's cross-language test.
https://github.com/etcd-io/raft
Apache ZooKeeper (Java) — ZAB is a Paxos-family protocol with
primary order; useful counterpoint when reading db-19.
https://github.com/apache/zookeeper
Apache Cassandra Lightweight Transactions — production
Multi-Paxos in the read/write path. Cassandra picks a fresh
ballot per LWT, so it pays the Phase-1 cost every time and skips
the Multi-Paxos amortization. Worth reading for what not to do
if you care about per-decree latency.
https://github.com/apache/cassandra/tree/trunk/src/java/org/apache/cassandra/service/paxos
TigerBeetle (Zig) — Viewstamped Replication, a near-Paxos
cousin. Deterministic simulator that does almost exactly what
this lab's cross_test.sh does, but in one language with
thousands of seeds.
https://github.com/tigerbeetledb/tigerbeetle
Diego Ongaro & John Ousterhout, In Search of an Understandable
Consensus Algorithm, USENIX ATC 2014. Read alongside db-17.
Section 10 (related work) is the cleanest published comparison of
Raft to Paxos.
https://raft.github.io/raft.pdf