References — db-23 capstone
Replication and consensus
- Diego Ongaro & John Ousterhout. In Search of an Understandable Consensus Algorithm (Extended Version). ATC 2014. The Raft paper — the leader/log/commit-index model used by this lab is a direct simplification of it.
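- Leslie Lamport. Paxos Made Simple. ACM SIGACT News, 2001. A plain-language restatement of Paxos, the majority-quorum consensus algorithm Lamport first described in The Part-Time Parliament (ACM TOCS, 1998). Running one Paxos instance per log slot (Multi-Paxos) yields log replication.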
- Flavio Junqueira, Benjamin Reed, Marco Serafini. Zab: High-performance broadcast for primary-backup systems. DSN 2011. Used by ZooKeeper; closest in spirit to the leader-only single-quorum model here.
Theory
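- Michael Fischer, Nancy Lynch, Michael Paterson. Impossibility of Distributed Consensus with One Faulty Process. JACM 1985. Shows that no deterministic protocol can guarantee consensus in a fully asynchronous system if even one process may crash, which is why practical protocols rely on timeouts, failure detectors, or partial synchrony.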
- Eric Brewer. Towards Robust Distributed Systems. PODC 2000 keynote (CAP conjecture). Gilbert & Lynch later proved it.
- Seth Gilbert & Nancy Lynch. Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services. ACM SIGACT News, 2002.
Practitioner material
- MIT 6.824 Distributed Systems lectures (esp. lectures 5–8 on Raft).
- Martin Kleppmann. Designing Data-Intensive Applications. O'Reilly 2017. Chs. 5 (Replication), 8 (The Trouble with Distributed Systems), and 9 (Consistency and Consensus).
- Kyle Kingsbury. Jepsen reports (https://jepsen.io). Practical examples of how real systems violate the guarantees their READMEs claim.
Isolation testing
- Martin Kleppmann. Hermitage. Concrete tests that expose what isolation levels really mean (https://github.com/ept/hermitage).
What this lab does not model
- Leader election (we hardcode node 0 as leader forever).
- Log truncation / divergent suffixes (we use synchronous in-process replication, so followers never have entries the leader lacks).
- Membership changes, log compaction, snapshots, network partitions beyond a single follower being marked down.
Those are the natural follow-on projects after this capstone — see docs/broader-ideas.md.
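
For orientation, the simplified model implied by the omissions above (a fixed leader, synchronous in-process replication, followers that can only be marked down) can be sketched roughly as follows. This is an illustrative sketch, not the lab's actual code: the names `Node`, `Cluster`, `append`, and `commit_index`, and the three-node default, are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One replica: an append-only log plus an up/down flag (hypothetical)."""
    log: list = field(default_factory=list)
    up: bool = True


class Cluster:
    """Hypothetical fixed-leader cluster; node 0 is the leader forever."""

    def __init__(self, size: int = 3):
        self.nodes = [Node() for _ in range(size)]
        self.commit_index = 0  # number of entries known to be committed

    def append(self, entry) -> bool:
        """Replicate synchronously; commit only if a majority holds the entry."""
        leader, followers = self.nodes[0], self.nodes[1:]
        leader.log.append(entry)
        acks = 1  # the leader counts towards the quorum
        for follower in followers:
            if follower.up:
                # Copy whatever suffix the follower is missing. Followers only
                # ever receive entries from the leader, so a follower's log is
                # always a prefix of the leader's log.
                follower.log.extend(leader.log[len(follower.log):])
                acks += 1
        if acks > len(self.nodes) // 2:
            # A majority now holds the whole leader log, including any
            # earlier entries that previously failed to reach a quorum.
            self.commit_index = len(leader.log)
            return True
        return False
```

With three nodes, an append still commits while one follower is marked down, and fails to commit once both are down. Because followers only copy from the leader, there is never a divergent suffix to truncate, which is the simplification the second bullet above describes.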