db-20 — Observation

Frozen exam hashes

Scenario  Args                                                 sha256
A         --seed 42 --ops 500 --keys 32 --scenario default     1febc1252f87f873c315526e9d9c78a622131d700dccca84a6e089244930252b
B         --seed 7 --ops 2000 --keys 128 --scenario partition  272af5b41b729896a7195a6ea72d19111a96a50b29d5d4cdfaac03a058e1a2dc

These three statements are all asserted by scripts/cross_test.sh:

  1. Rust, Go, and C++ each produce the frozen hash listed above for each scenario.
  2. All five replicas in the cluster produce an identical snapshot after the scenario completes (TestScenarioBReplicasConverge in Go, test_scenario_b_frozen in C++, scenario_b_partitioned_replicas_converge_after_heal in Rust).
  3. The cluster has no live partitions when the driver returns.

Quantitative observations

metric                             scenario A   scenario B
ops                                500          2000
keys parameter                     32           128
committed-on-leader entries        500          2000
approximate Put / Del fraction     3/4 vs 1/4   3/4 vs 1/4
live keys in final state (approx)  < 32         < 128
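
The relationship between the seed, the Put / Del mix, and the live-key bound can be illustrated with a small seeded generator. This is a minimal sketch under stated assumptions, not the lab's workload code: SplitMix64, the Op type, workload, and live_keys are all hypothetical names chosen for illustration, and the real generator and key encoding may differ.

```rust
use std::collections::BTreeSet;

/// SplitMix64, used here only because it is tiny and easy to port.
/// Assumption: the lab's actual PRNG is not specified in this document.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Hypothetical operation type with integer keys and values.
#[derive(Clone, Debug, PartialEq)]
enum Op {
    Put(u64, u64),
    Del(u64),
}

/// Generate `ops` operations over `keys` distinct keys.
/// Roughly 3 in 4 operations are Put and 1 in 4 are Del.
fn workload(seed: u64, ops: usize, keys: u64) -> Vec<Op> {
    let mut rng = SplitMix64(seed);
    (0..ops)
        .map(|_| {
            let key = rng.next_u64() % keys;
            if rng.next_u64() % 4 < 3 {
                Op::Put(key, rng.next_u64())
            } else {
                Op::Del(key)
            }
        })
        .collect()
}

/// Count the keys still present after applying every operation in order.
fn live_keys(ops: &[Op]) -> usize {
    let mut live = BTreeSet::new();
    for op in ops {
        match op {
            Op::Put(k, _) => {
                live.insert(*k);
            }
            Op::Del(k) => {
                live.remove(k);
            }
        }
    }
    live.len()
}

fn main() {
    let a = workload(42, 500, 32);
    let b = workload(42, 500, 32);
    // Same seed and parameters give the same operation sequence.
    assert_eq!(a, b);
    // Deletes can only lower the count, so it never exceeds the keys parameter.
    assert!(live_keys(&a) <= 32);
    println!("ops = {}, live keys = {}", a.len(), live_keys(&a));
}
```

Because every key is drawn from 0..keys and deletes only remove entries, the live-key count cannot exceed the keys parameter, which is consistent with the upper bounds in the table.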

Every committed entry executes exactly once on the leader; partitioned followers see all of them after heal() because truncate_and_replay replays the entire log.
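
The convergence argument above can be sketched as follows: a follower that replays the full committed log from an empty state ends up with the same state machine as a leader that applied each entry once, in order. This is an illustrative sketch only. Op, apply, and replay_all are hypothetical names, and replay_all mirrors the idea behind truncate_and_replay rather than its actual signature.

```rust
use std::collections::BTreeMap;

/// Hypothetical log entry type; the lab's real encoding is not shown here.
#[derive(Clone, Debug, PartialEq)]
enum Op {
    Put(String, String),
    Del(String),
}

/// Apply one committed entry to a key-value state machine.
fn apply(state: &mut BTreeMap<String, String>, op: &Op) {
    match op {
        Op::Put(k, v) => {
            state.insert(k.clone(), v.clone());
        }
        Op::Del(k) => {
            state.remove(k);
        }
    }
}

/// Rebuild a state machine from scratch by replaying the whole log in order.
fn replay_all(log: &[Op]) -> BTreeMap<String, String> {
    let mut state = BTreeMap::new();
    for op in log {
        apply(&mut state, op);
    }
    state
}

fn main() {
    let log = vec![
        Op::Put("k1".into(), "a".into()),
        Op::Put("k2".into(), "b".into()),
        Op::Del("k1".into()),
        Op::Put("k1".into(), "c".into()),
    ];

    // The leader applies each entry exactly once as it commits.
    let mut leader = BTreeMap::new();
    for op in &log {
        apply(&mut leader, op);
    }

    // A follower that missed every entry replays the full log after heal.
    let follower = replay_all(&log);

    // Same log, same order, same deterministic apply: identical state.
    assert_eq!(leader, follower);
    println!("{:?}", follower);
}
```

Using an ordered map keeps iteration order stable, which matters if the state is later serialised and hashed; whether the lab uses an ordered map is an assumption here.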

Behavioural observations

  • Convergence is deterministic. No timeouts, no clocks. Running the workload twice with the same seed always yields the same bytes (workload_determinism in Rust, TestWorkloadDeterminism in Go, test_workload_deterministic in C++).
  • Sub-quorum proposals leave uncommitted tails. The leader's last_idx advances on every propose, but commit_index advances only on quorum acks. The test sub_quorum_does_not_commit observes this: the leader sees last_idx == 1 and commit_index == 0.
  • Heal is a one-shot. In scenario B, after the heal call at ops*3/4, all five replicas have byte-identical state machines. There is no period of eventual consistency: convergence is instantaneous and deterministic by construction.
  • Delete is real. Del removes the key from the state machine. A later Put reusing the same key is a fresh entry, not a "revive". This is asserted by test_del_removes_key (C++) and friends.
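
The sub-quorum behaviour in the list above can be sketched with a few lines of leader bookkeeping. This is a minimal sketch, assuming a majority quorum and a leader that counts its own ack; the Leader struct and its propose signature are hypothetical, and only the field names last_idx and commit_index come from the text.

```rust
/// Hypothetical leader bookkeeping. Field names follow the text;
/// the layout and the majority-quorum rule are assumptions.
struct Leader {
    cluster_size: usize,
    last_idx: u64,
    commit_index: u64,
}

impl Leader {
    fn new(cluster_size: usize) -> Self {
        Leader {
            cluster_size,
            last_idx: 0,
            commit_index: 0,
        }
    }

    /// Majority quorum: strictly more than half of the cluster.
    fn quorum(&self) -> usize {
        self.cluster_size / 2 + 1
    }

    /// Append one entry, then commit only if enough replicas acknowledged.
    /// `follower_acks` is the number of followers that acked; the leader
    /// counts itself as one more ack. Returns whether the entry committed.
    fn propose(&mut self, follower_acks: usize) -> bool {
        // The log always grows, whether or not the entry commits.
        self.last_idx += 1;
        if 1 + follower_acks >= self.quorum() {
            // Simplification: a quorum ack commits the whole log prefix,
            // including any earlier uncommitted tail.
            self.commit_index = self.last_idx;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut leader = Leader::new(5);

    // Only one follower is reachable: 2 acks out of 5 is below the quorum of 3.
    assert!(!leader.propose(1));
    assert_eq!(leader.last_idx, 1);
    assert_eq!(leader.commit_index, 0);

    // Two followers reachable: 3 acks reaches quorum and commits.
    assert!(leader.propose(2));
    assert_eq!(leader.last_idx, 2);
    assert_eq!(leader.commit_index, 2);
}
```

The first propose reproduces the state described for sub_quorum_does_not_commit: the entry is in the leader's log, but nothing has committed.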

Performance notes (this lab is not a perf study)

The reference implementations are single-threaded and in-memory, with no I/O. Scenario B runs in ~5 ms in a Rust release build on an M-series Mac. The snapshot push during heal() is O(log_size) per partitioned follower, which is the dominant cost.

The lab deliberately optimises for clarity and byte-identity, not throughput. Real systems (db-09 leveldb-complete is a good adjacent reference) batch and pipeline replication; here every propose is synchronous.