PACELC Theorem
The PACELC theorem is an extension of the CAP theorem that addresses a critical gap: CAP only describes trade-offs during network partitions, but partitions are rare. What trade-offs do distributed systems make during normal, partition-free operation? PACELC, formalized by Daniel Abadi in 2012, adds the latency vs. consistency dimension that governs everyday distributed database behavior.
The Gap in CAP
CAP tells us: when a partition occurs, choose consistency or availability. This is useful, but it leaves out most of a database’s operating life. Partitions are exceptional events — real systems spend 99.9%+ of their time operating without a partition.
During normal operation, a distributed database still faces a fundamental trade-off: to ensure all replicas see the same data before responding (consistency), the system must wait for network round-trips to coordinate. This coordination adds latency. To respond quickly (low latency), the system can acknowledge writes before all replicas confirm, at the cost of consistency windows.
This latency vs. consistency trade-off is always present, not just during partitions. CAP says nothing about it. PACELC does.
PACELC Explained
PACELC is an acronym that encodes two branches of a conditional:
If Partition (P):
choose Availability (A) or Consistency (C)
Else (E, no partition):
choose Latency (L) or Consistency (C)
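Spelled out as code, the acronym is just a two-branch conditional. A plain-Python sketch (the dictionary keys and single-letter labels are illustrative, not from any library):

```python
def pacelc_tradeoff(partition_active: bool, preference: dict) -> str:
    """Return which property the system favors, per PACELC."""
    if partition_active:
        # P branch: availability vs. consistency (same as CAP)
        return preference["during_partition"]    # "A" or "C"
    # E branch: latency vs. consistency
    return preference["normal_operation"]        # "L" or "C"

# Cassandra's defaults map to PA/EL:
cassandra = {"during_partition": "A", "normal_operation": "L"}
assert pacelc_tradeoff(True, cassandra) == "A"
assert pacelc_tradeoff(False, cassandra) == "L"
```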
Every distributed system falls into one of four quadrants:
| During Partition | During Normal Operation | Classification | Examples |
|---|---|---|---|
| Prefer Availability | Prefer Latency | PA/EL | Cassandra, DynamoDB (default), Riak, CouchDB |
| Prefer Consistency | Prefer Consistency | PC/EC | HBase, VoltDB, etcd, ZooKeeper |
| Prefer Consistency | Prefer Latency | PC/EL | MongoDB (default), PNUTS, Megastore |
| Prefer Availability | Prefer Consistency | PA/EC | Rare; some tunable systems in specific configurations |
The PAC Branch (Partition Behavior)
This branch is identical to CAP. During a network partition:
Choose Availability (PA): The system continues accepting reads and writes on all reachable nodes, even though some nodes cannot communicate with each other. Data may diverge. After the partition heals, the system reconciles the diverged state via conflict resolution strategies.
Choose Consistency (PC): The system refuses to serve requests from nodes that cannot reach a quorum of other nodes. Reads and writes on the isolated partition are rejected or blocked until connectivity is restored. No data divergence occurs, but some clients experience unavailability.
The ELC Branch (Normal Operation)
This branch is what PACELC adds beyond CAP. During normal operation (no partition), when a write arrives:
Choose Latency (EL): The system acknowledges the write quickly — often after writing to a single node or a subset of replicas — and replicates asynchronously in the background. The client receives a fast response, but for a brief window, some replicas lag behind. A subsequent read from a lagging replica returns stale data.
Choose Consistency (EC): The system requires acknowledgment from all replicas (or a quorum) before responding to the client. Every node has the new value before the write is declared successful. Subsequent reads from any node return the latest value, but the write latency includes the network round-trip to every required replica.
The latency difference between EL and EC can be significant. In a database with replicas in two data centers separated by a 50ms network round trip, an EC write takes at least 50ms longer than an EL write. Multiply by millions of writes per day and the impact on throughput and tail latency is substantial.
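A back-of-envelope model of that scenario makes the cost concrete. All numbers here are hypothetical, chosen to match the 50ms figure in the text:

```python
# Illustrative latency model: EL acknowledges locally, EC waits for the remote DC.
local_write_ms = 1.0            # time to commit on the local replica (assumed)
cross_dc_rtt_ms = 50.0          # round trip to the remote data center

el_write_ms = local_write_ms                    # EL: ack before the remote replica confirms
ec_write_ms = local_write_ms + cross_dc_rtt_ms  # EC: wait for the remote acknowledgment

writes_per_day = 1_000_000
extra_wait_s = (ec_write_ms - el_write_ms) * writes_per_day / 1000
assert ec_write_ms == 51.0
assert extra_wait_s == 50_000.0  # roughly 14 hours of cumulative synchronous waiting
```

The per-write penalty looks small; the aggregate is what shows up in tail latencies and throughput ceilings.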
System Classifications
PA/EL Systems — Availability and Low Latency First
PA/EL systems prioritize being fast and always available, accepting that data may be transiently inconsistent.
Cassandra (default configuration):
- PA: During a partition, Cassandra continues accepting reads and writes on both sides. Diverged data is reconciled via last-write-wins (LWW) using client-provided timestamps, or read-repair during reads.
- EL: By default, a write returns after the local node writes to its commit log and memtable — before any replica confirms. Replication happens asynchronously. A read from a replica that hasn’t received the write yet returns the old value.
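The post-partition reconciliation step can be sketched in a few lines. This is a minimal model of Cassandra-style last-write-wins, with illustrative values and timestamps rather than real cell metadata:

```python
# Minimal sketch of last-write-wins reconciliation after a partition heals.
# Each replica holds a (value, client_timestamp) pair; the higher timestamp wins.
def lww_merge(a, b):
    """a, b are (value, timestamp) pairs from two diverged replicas."""
    return a if a[1] >= b[1] else b

side_one = ("alice@old.example", 1000)   # written before the partition
side_two = ("alice@new.example", 1005)   # written on the other side during it
assert lww_merge(side_one, side_two) == ("alice@new.example", 1005)
```

Note the failure mode this implies: if clocks are skewed, a "later" write with an earlier timestamp silently loses.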
DynamoDB (eventual consistency mode):
- PA: Accepts writes on any side of a split, resolving conflicts afterward. (The original Dynamo design used vector clocks for multi-version reconciliation; the DynamoDB service itself, including global tables, resolves conflicts with last-writer-wins.)
- EL: Eventually consistent reads (the default) may return stale data. The response latency is low because the system returns data from the nearest replica without waiting for cross-region synchronization.
PA/EL systems are ideal for: high-write workloads (IoT telemetry, event logging), globally distributed applications where cross-region write latency would be prohibitive, and domains where brief stale reads are tolerable (social feeds, recommendation engines, analytics counters).
PC/EC Systems — Consistency Always
PC/EC systems favor consistency in all situations, accepting higher latency and reduced availability during partitions.
HBase:
- PC: Built on HDFS with a single active master (via ZooKeeper leader election). During a partition that isolates the master, writes are unavailable. Region servers that cannot communicate with the master stop accepting writes.
- EC: Writes go through the RegionServer WAL and are acknowledged synchronously. Reads are served from the RegionServer in charge of that row range — always consistent because only one RegionServer owns each row range at a time.
etcd and ZooKeeper:
- PC: Use Raft (etcd) and ZAB (ZooKeeper) consensus. A write requires a majority quorum. During a partition that isolates a minority, that minority refuses writes.
- EC: Writes are linearizable by construction — they go through the Raft log and are applied to the state machine in order. Every committed entry is visible to subsequent reads. The latency cost is the Raft round-trip to reach quorum.
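The commit rule these systems share is simple to state in code. A pure-Python illustration of majority-quorum commit (not actual Raft or ZAB logic, just the counting rule):

```python
# An entry is committed once a majority of the cluster has acknowledged it.
def is_committed(acks: int, cluster_size: int) -> bool:
    return acks >= cluster_size // 2 + 1

# 5-node etcd cluster: 3 acks commit an entry...
assert is_committed(3, 5) is True
# ...and a 2-node minority partition can never commit a write (the PC branch).
assert is_committed(2, 5) is False
```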
PC/EC systems are ideal for: coordination services (distributed locks, leader election, service discovery), configuration management, financial ledgers, and any use case where reading stale data causes incorrect behavior.
PC/EL Systems — Consistent Under Partition, Fast Otherwise
PC/EL is an interesting middle ground: during a partition, the system sacrifices availability to maintain consistency; but during normal operation, it prioritizes low latency over strict consistency for reads.
MongoDB (default configuration with replica sets):
- PC: During a partition that isolates the primary, an election occurs. Until a new primary is elected (typically 10–30 seconds), writes are unavailable. MongoDB does not allow two primaries — the isolated primary steps down.
- EL: By default, reads can be served from secondaries (readPreference: secondary), which may lag slightly behind the primary due to asynchronous replication. This provides low-latency reads at the cost of potentially stale data. With readConcern: majority, reads return only data that has been replicated to a majority of nodes — trading latency for consistency.
MongoDB’s PACELC classification shifts based on configuration. With writeConcern: majority and readConcern: majority, it moves toward PC/EC. With writeConcern: 1 and readPreference: secondary, it operates closer to PC/EL or even PA/EL. This flexibility is powerful but requires deliberate configuration — the defaults do not provide the strongest consistency guarantees.
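The stale-read window that secondary reads accept can be modeled in a few lines. This is a toy simulation of asynchronous replication lag, with made-up keys and millisecond values, not MongoDB internals:

```python
# Toy model: the primary applies a write immediately; a secondary makes it
# visible only after its replication lag has elapsed.
def read(replica_state, key, now_ms):
    value, visible_at_ms = replica_state[key]
    return value if now_ms >= visible_at_ms else None  # None = not yet visible

primary   = {"user.name": ("Ada", 0)}     # applied at t=0
secondary = {"user.name": ("Ada", 120)}   # visible only after ~120 ms of lag

assert read(primary, "user.name", 50) == "Ada"     # primary: always current
assert read(secondary, "user.name", 50) is None    # secondary: still stale
assert read(secondary, "user.name", 150) == "Ada"  # lag elapsed, now visible
```

This is exactly the window that readPreference: secondary accepts and readConcern: majority refuses to expose.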
Tunable Systems
Many modern databases do not occupy a fixed PACELC position — they allow tuning per operation. This is one of PACELC’s most practical insights: you don’t have to make a global choice. You can choose different trade-offs for different operations based on their consistency requirements.
Cassandra’s consistency levels:
- ONE: Returns after one replica responds. Fastest, least consistent.
- QUORUM: Returns after a majority of replicas respond. Middle ground.
- ALL: Returns after all replicas respond. Slowest, most consistent.
- LOCAL_QUORUM: Returns after a quorum within the local data center only — avoids cross-region latency for most operations.
You can also achieve strong consistency in Cassandra by setting both write and read consistency to QUORUM: since writes must be confirmed by a majority and reads must consult a majority, the intersection guarantees the reader sees the latest write.
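The intersection argument is the classic R + W > N rule, easy to check directly (a sketch with replication factor 3, the usual Cassandra default):

```python
# With replication factor n, a read quorum r and write quorum w are guaranteed
# to overlap in at least one replica whenever r + w > n, so every read quorum
# contains at least one replica holding the latest write.
def overlapping(r: int, w: int, n: int) -> bool:
    return r + w > n

quorum = 3 // 2 + 1                      # QUORUM with RF=3 means 2 replicas
assert overlapping(quorum, quorum, 3)    # QUORUM writes + QUORUM reads: strong
assert not overlapping(1, 1, 3)          # ONE + ONE: stale reads are possible
```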
DynamoDB’s strongly consistent reads: By default, DynamoDB reads are eventually consistent (EL). Setting ConsistentRead: true makes the read contact the leader replica and wait for the most recent data — moving to EC at the cost of slightly higher latency and double the read capacity units.
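The "double the read capacity units" cost follows from DynamoDB's pricing model: a strongly consistent read of up to 4 KB costs 1 RCU, while an eventually consistent read costs half that. A small calculator (the item sizes are illustrative):

```python
import math

# DynamoDB bills reads in 4 KB chunks: 1 RCU per chunk for strongly consistent
# reads, 0.5 RCU per chunk for eventually consistent reads.
def read_capacity_units(item_kb: float, consistent: bool) -> float:
    chunks = math.ceil(item_kb / 4)
    return chunks if consistent else chunks / 2

assert read_capacity_units(3, consistent=True) == 1     # one 4 KB chunk, EC mode
assert read_capacity_units(3, consistent=False) == 0.5  # half price for EL reads
assert read_capacity_units(9, consistent=True) == 3     # three chunks
```

So opting into EC on DynamoDB has both a latency cost and a direct dollar cost.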
During normal operation (which is almost always), the trade-off is latency vs. consistency — not availability vs. consistency. This means that even for a system you’ve decided should be “consistent,” you need to decide: how much latency is acceptable to achieve that consistency? Synchronous replication to a replica 200ms away costs 200ms per write. Is that acceptable for your use case? PACELC forces you to quantify this rather than treating consistency as free.
Design Considerations
- Partition behavior is your disaster plan; ELC is your daily reality. Spend most of your design time on the ELC branch — it governs every operation. The PAC branch matters, but partitions are rare and usually brief. The ELC trade-off is always active.
- Latency budgets drive ELC choices. If your SLA requires p99 write latency under 50ms, and your nearest replica is 30ms away, synchronous EC replication consumes 60% of your budget for replication alone. You may need to accept EL replication and engineer application-level compensation.
- Separate read and write consistency requirements. Writes often need stronger consistency (to prevent data loss) than reads (where brief staleness is tolerable). Tune them independently where your database supports it.
- Measure, don’t guess. The latency difference between EL and EC depends on network topology, replication factor, and load. Benchmark your actual database configuration before committing to an architecture.
- Replication factor affects the latency cost of EC. With 3 replicas, QUORUM requires 2 acknowledgments; with 5 replicas, it requires 3. Larger replication factors raise EC latency because each write waits on more replicas and is gated by the slowest acknowledgment it needs.
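That last point can be made precise: a quorum write completes when the quorum-th fastest replica responds, so the slowest required replica sets the floor. A sketch with illustrative per-replica latencies:

```python
# EC write latency under quorum replication is the latency of the
# (quorum)-th fastest acknowledgment, where quorum = n // 2 + 1.
def quorum_write_latency(replica_latencies_ms):
    n = len(replica_latencies_ms)
    quorum = n // 2 + 1
    return sorted(replica_latencies_ms)[quorum - 1]

assert quorum_write_latency([5, 7, 90]) == 7           # RF=3: wait for the 2nd ack
assert quorum_write_latency([5, 7, 60, 80, 90]) == 60  # RF=5: wait for the 3rd ack
```

Note the RF=3 case: the 90ms straggler is invisible to write latency, which is precisely why quorum writes tolerate one slow replica.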
Invoke PACELC when you need to justify database configuration, not just database selection. “We use Cassandra, which is PA/EL by default. For the payment confirmation record we write at QUORUM (EC), accepting the higher latency because consistency is non-negotiable. For the activity feed read, we use ONE (EL) — the 50ms latency savings outweigh the risk of a user seeing a post that’s 200ms stale.” This shows you understand that the same database can exhibit different PACELC positions depending on how it is used.