Glossary

A unified glossary of terms used across all labs. Terms are grouped by domain.

Storage & I/O

TermDefinition
PageThe unit of I/O between disk and memory. Usually 4 KiB (matches OS page size) but databases often use 4–32 KiB.
BlockAn SSTable's I/O unit (LevelDB default 4 KiB). Distinct from a B-tree "page" — both are I/O units but for different engines.
mmapMap a file into process address space. Reads happen via page faults; writes via dirty pages flushed by the kernel.
pread/pwritePositional read/write syscalls. Explicit offset, no shared file pointer. Predictable cost, no page-fault stalls.
O_DIRECTOpen flag (Linux) that bypasses the page cache. Requires aligned buffers, aligned offsets, aligned sizes.
fsyncForce file data + metadata to stable storage. Blocks until disk acknowledges. Often the slowest syscall in a database.
fdatasyncLike fsync but skips non-essential metadata. Faster on most filesystems.
Write amplification (WA)Bytes physically written / bytes logically written. SSDs have hardware WA; LSM-trees have algorithmic WA from compaction.
Read amplification (RA)Bytes physically read / bytes logically read. LSM-trees suffer from RA due to checking multiple levels.
Space amplificationBytes on disk / bytes of live data. LSMs have space amp from stale data awaiting compaction.
EndiannessByte order. Little-endian (x86, ARM default): least-significant byte first. Big-endian: network byte order.
AlignmentMemory address being a multiple of N. Required for O_DIRECT (usually 512 B or 4 KiB) and SIMD ops.
io_uringLinux async I/O API (≥ 5.1). Two ring buffers (SQ/CQ) shared between kernel and user space.
DMADirect Memory Access — disk controller writes directly to RAM without CPU involvement.

Hardware

TermDefinition
HDD seek time~5–10 ms for random reads (head movement + rotational latency). ~150 MB/s sequential.
SATA SSD~100 μs random read latency, ~500 MB/s sequential, ~80K IOPS.
NVMe SSD~50–100 μs random read latency, ~3–7 GB/s sequential, ~500K–1M IOPS. Multiple hardware queues.
Cache lineCPU cache unit, almost always 64 bytes. Data-structure layout for cache locality matters.
NUMANon-Uniform Memory Access — CPU sockets have local RAM; cross-socket access is slower.
Wear levelingSSD firmware spreads writes across blocks to even out flash wear. Causes hardware write amplification.

Data Structures

TermDefinition
Skip listProbabilistic balanced structure with O(log n) ops and lock-free-friendly properties. Used in LevelDB MemTable.
B-TreeSelf-balancing m-ary tree. Internal nodes store keys + values + child pointers. Used for indexes.
B+-TreeB-Tree variant where all values live in leaf nodes; internal nodes are pure routing. Used for tables in SQLite.
LSM-TreeLog-Structured Merge-Tree. In-memory MemTable + on-disk sorted runs (SSTables), merged via compaction.
Bloom filterProbabilistic set membership; no false negatives, tunable false positive rate. Used to skip SSTable lookups.
ARTAdaptive Radix Tree — modern in-memory index alternative to B-Trees, used by HyPer, DuckDB.

Consensus

TermDefinition
QuorumSubset of nodes whose agreement is required. Typically ⌊N/2⌋ + 1 for majority quorum.
Term / EpochMonotonically increasing identifier for a leadership period (Raft term, ZAB epoch, Paxos ballot).
Log indexPosition of an entry in the replicated log. Indices are monotonic and dense.
Commit indexThe largest log index known to be safely replicated to a quorum.
LinearizabilityStrongest consistency: operations appear to take effect atomically at some point between their invocation and response.
Sequential consistencyAll processes agree on a single global order, but the order need not match real-time.
Eventual consistencyIf updates stop, all replicas eventually agree. No real-time guarantees.
CAP theoremUnder a network partition, you must choose Consistency or Availability. Partition tolerance is non-negotiable.
FLP impossibilityNo deterministic asynchronous consensus protocol can guarantee progress with even one crash failure.
Lamport timestampScalar logical clock: L(a) < L(b) if a happened-before b. Cannot detect concurrency.
Vector clockPer-node vector. VC(a) < VC(b) iff every component is ≤. Detects concurrent events.
HLCHybrid Logical Clock: combines physical time with a logical counter; bounded skew from real time.

Transactions

TermDefinition
ACIDAtomicity, Consistency, Isolation, Durability — properties a transaction must satisfy.
Isolation levelREAD UNCOMMITTED → READ COMMITTED → REPEATABLE READ → SERIALIZABLE. Each rules out more anomalies.
Dirty readReading data written by an uncommitted transaction.
Non-repeatable readReading the same row twice in one tx and getting different values.
Phantom readA range query returns different rows when re-run within one tx.
MVCCMulti-Version Concurrency Control — writes create new versions; readers see a snapshot.
2PLTwo-Phase Locking — acquire locks in a growing phase, release in a shrinking phase. Guarantees serializability.
2PCTwo-Phase Commit — distributed transaction protocol: prepare phase, then commit/abort. Blocking on coordinator failure.

SQL Engine

TermDefinition
VDBEVirtual Database Engine — SQLite's bytecode VM that executes compiled SQL.
Prepared statementA parsed and compiled SQL statement, reusable with different parameters.
Cardinality estimationPredicting how many rows a query operator will produce. Core to the query planner.
SelectivityFraction of rows that satisfy a predicate. Low selectivity ⇒ index scan preferred.
Covering indexAn index that contains all columns needed by a query, so the table doesn't need to be touched.

Operational

TermDefinition
SnapshotA consistent point-in-time view of data. Used for backups, MVCC reads, Raft log compaction.
CheckpointOperation that flushes in-memory state to disk so recovery has less log to replay.
CompactionBackground process that merges sorted files (LSM) or reclaims fragmented space (B-tree).
YCSBYahoo Cloud Serving Benchmark — standard KV workload suite (A–F). Used in db-22.
JepsenTest framework for distributed systems correctness; injects partitions/clock skew. Inspires our consensus tests.