CAP Theorem
The CAP Theorem states that a distributed data store can guarantee at most two of three properties: Consistency, Availability, and Partition Tolerance. Proposed by Eric Brewer in 2000 and formally proven by Gilbert and Lynch in 2002, CAP is perhaps the most quoted theorem in distributed systems. It is also one of the most misunderstood. Knowing what CAP actually says — and what it doesn’t — is fundamental to reasoning about distributed database design.
What Is the CAP Theorem?
CAP defines three properties of distributed systems:
- C — Consistency: Every read receives the most recent write or an error.
- A — Availability: Every request receives a response (not an error), though the response may not reflect the most recent write.
- P — Partition Tolerance: The system continues operating despite an arbitrary number of messages being dropped or delayed by the network between nodes.
The theorem states: in the presence of a network partition, a distributed system must choose between consistency and availability. It cannot guarantee both simultaneously.
Consistency
In CAP, consistency means linearizability: the system behaves as if there were a single copy of the data on which operations take effect atomically. After a write completes, any subsequent read from any node returns that written value (or the result of a later write). The system appears to the client as a single, coherent data store.
This is a stronger definition than ACID consistency. ACID consistency is about database constraints; CAP consistency is about the ordering of reads and writes across distributed nodes.
Achieving linearizability requires coordination: before a write can be acknowledged, all nodes (or a quorum of nodes) must agree on the new value. This coordination takes time and fails if any participating node is unreachable.
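The coordination cost can be made concrete with a toy quorum model (a minimal sketch; `Replica` and `Cluster` are illustrative names, not a real client API):

```python
class Replica:
    def __init__(self):
        self.value = None
        self.reachable = True  # False simulates a node cut off by a partition

    def apply(self, value):
        if not self.reachable:
            raise ConnectionError("replica unreachable")
        self.value = value


class Cluster:
    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.quorum = n // 2 + 1  # majority of n nodes

    def write(self, value):
        acks = 0
        for r in self.replicas:
            try:
                r.apply(value)
                acks += 1
            except ConnectionError:
                pass  # unreachable replica cannot acknowledge
        if acks < self.quorum:
            # No quorum: reject rather than risk inconsistency.
            raise RuntimeError("write rejected: quorum not reached")
        return acks


cluster = Cluster(5)
assert cluster.write("v1") == 5  # all nodes reachable: write succeeds

# Partition 3 of the 5 nodes: only 2 can acknowledge, below the quorum of 3.
for r in cluster.replicas[:3]:
    r.reachable = False
try:
    cluster.write("v2")
except RuntimeError:
    pass  # the write fails: coordination sacrifices availability
```

The point of the sketch is the last step: once a majority is unreachable, the only consistent option is to stop acknowledging writes.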
Availability
In CAP, availability means every request to a non-failing node returns a response — never a timeout or error. The response does not have to be the most recent value; it just must not be an error.
This is a strict definition. A system that occasionally returns errors during a partition is not fully available under the CAP definition. In practice, availability is measured probabilistically (SLAs, uptime percentages) rather than as an absolute guarantee.
Partition Tolerance
A network partition occurs when nodes in a distributed system cannot communicate with each other due to network failure. Messages are lost, delayed, or reordered. Partition tolerance means the system continues operating even when some nodes are disconnected from others.
In any real distributed system, partition tolerance is mandatory. Networks fail. Switches misbehave. Data centers lose connectivity. If your system cannot tolerate a partition, it is not a distributed system — it is a single node. This makes CAP’s real choice not “pick 2 of 3” but rather “during a partition, choose C or A.”
Why Only Two of Three?
Suppose two nodes (Node A and Node B) are separated by a network partition. A client writes a new value to Node A. Now a different client reads from Node B:
- To maintain Consistency: Node B must either have the updated value (which it cannot get from Node A due to the partition) or return an error. The system chooses to return an error — sacrificing Availability.
- To maintain Availability: Node B must return a response — but it only has the old value. It serves the stale data, sacrificing Consistency.
There is no third option. During a partition, the system must choose which guarantee to break.
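The two-node dilemma can be sketched as a toy model (illustrative names only, not a real system):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.partitioned = False  # True: cut off from the node that took the write


def read(node, mode):
    if node.partitioned:
        if mode == "CP":
            # Cannot confirm this is the latest value: refuse to answer.
            raise TimeoutError("partition: cannot guarantee latest value")
        # AP: answer anyway, even though the value may be stale.
        return node.value
    return node.value


# Node B holds the old value; the write of "new" went to Node A.
node_b = Node("old")
node_b.partitioned = True

assert read(node_b, "AP") == "old"  # available, but stale
try:
    read(node_b, "CP")              # consistent, but unavailable
except TimeoutError:
    pass
```

Both branches are "correct" behavior; the theorem only says you cannot take both at once during the partition.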
CP Systems
CP systems sacrifice availability during a partition in order to preserve consistency. When a partition occurs, rather than serve potentially stale data, the system refuses requests from the isolated partition until connectivity is restored.
Examples and how they achieve CP:
- ZooKeeper, etcd: Use consensus protocols (ZAB and Raft, respectively). A write requires a quorum of nodes to acknowledge before committing. If the quorum is unreachable, the system rejects writes (and optionally reads) rather than risk serving stale data.
- HBase: Serves each region from exactly one RegionServer, with coordination via ZooKeeper, so reads and writes are consistent; while a failed region is being reassigned, that region is unavailable.
- Google Spanner / CockroachDB: Distributed SQL databases that achieve strong consistency across nodes via consensus (Paxos in Spanner, Raft in CockroachDB); Spanner additionally uses TrueTime for globally ordered timestamps. Reads block until the node is certain its data is up to date.
- MongoDB (with appropriate write concern): By default uses a replica set where writes go to the primary. During a partition that isolates the primary, an election occurs. Until a new primary is elected, writes are unavailable. Reads from the primary are consistent; reads from secondaries may be stale.
CP systems are appropriate when: correctness is mandatory (financial transactions, distributed locks, leader election), data divergence would cause real harm, and availability degradation during rare partition events is tolerable.
AP Systems
AP systems sacrifice consistency during a partition in order to remain available. Every node continues responding to requests, even if it means serving stale or conflicting data. After the partition heals, the system reconciles diverged state.
Examples and how they achieve AP:
- Cassandra: Uses a leaderless, multi-primary design. Every node can accept reads and writes. During a partition, both sides continue serving requests, potentially with different data. After the partition heals, Cassandra uses last-write-wins (by timestamp) or application-defined merge to reconcile conflicts.
- DynamoDB: Eventually consistent reads return data from any replica. Even during a partition, DynamoDB continues accepting reads and writes on both sides, reconciling afterward via last-write-wins. (The original Dynamo design described vector clocks for reconciliation; the DynamoDB service resolves conflicts with last-write-wins.)
- CouchDB: Designed for offline-first scenarios. Two devices can diverge independently (no network) and sync when reconnected, with application-defined conflict resolution.
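The last-write-wins reconciliation used above can be sketched as a timestamp-based merge (a simplification; Cassandra, for instance, resolves conflicts per column using microsecond write timestamps):

```python
def lww_merge(a, b):
    """Merge two diverged states, each {key: (timestamp, value)}.

    For each key, the write with the newer timestamp wins; this is
    last-write-wins, so concurrent updates to the same key silently
    discard one side.
    """
    merged = dict(a)
    for key, (ts, value) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged


# State on each side of the partition after it heals:
side_a = {"user:1": (100, "alice@old.example")}
side_b = {"user:1": (105, "alice@new.example"),  # later write on this side
          "user:2": (90, "bob")}

merged = lww_merge(side_a, side_b)
assert merged["user:1"] == (105, "alice@new.example")  # newer write wins
assert merged["user:2"] == (90, "bob")                 # unconflicted key kept
```

Note the trade-off the sketch makes visible: last-write-wins converges automatically, but the losing concurrent write is dropped without notice, which is exactly why some systems offer application-defined merges instead.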
AP systems are appropriate when: availability is critical (users must always get a response), the application can tolerate stale data, the domain supports eventual convergence (social feeds, shopping carts, DNS), and write conflicts can be resolved automatically or are acceptable.
Amazon’s Dynamo paper (2007) famously used the shopping cart as an AP example. If a partition occurs and a user adds items to their cart, the cart service should accept the add rather than error. After the partition heals, the two diverged cart states are merged (union of items). The worst outcome — a customer sees an item they deleted reappear — is far better than the alternative: “Sorry, your cart is unavailable during a network event.”
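The cart merge described above reduces to a set union (a sketch of the idea, not the actual Dynamo code):

```python
def merge_carts(cart_a: set, cart_b: set) -> set:
    # Union: no add is ever lost, but a delete seen on only one side
    # is effectively undone, so a deleted item can reappear.
    return cart_a | cart_b


# During the partition each side of the cart service saw different operations:
cart_side_a = {"book", "lamp"}          # user deleted "mug" on this side
cart_side_b = {"book", "mug", "cable"}  # this side never saw the delete

assert merge_carts(cart_side_a, cart_side_b) == {"book", "lamp", "mug", "cable"}
```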
CA Systems (Single-Node)
The CA classification (Consistent and Available, no Partition Tolerance) describes systems that are not distributed. A single-node PostgreSQL database is CA: it is consistent (all reads see the latest write) and available (every request gets a response). But if the server goes down, the system is completely unavailable — there is no other node to serve from. It has no partition tolerance because there is no second node to partition from.
In practice, “CA” is not a real option for distributed systems. Any multi-node system must tolerate at least some network unreliability, which means choosing between C and A when a partition occurs.
Real-World Implications
The binary framing of CAP is useful for reasoning but obscures real-world nuance:
Partitions are rare but inevitable. In a well-operated data center, hard partitions (total disconnection) are uncommon. Most “consistency vs. availability” decisions are actually about latency vs. consistency under normal operation — which CAP doesn’t model (the PACELC theorem addresses this).
Consistency is not binary. Systems offer a spectrum from linearizability (strongest) through sequential consistency, causal consistency, read-your-writes consistency, monotonic read consistency, to eventual consistency (weakest). The choice is not “fully consistent or fully inconsistent” but which consistency guarantees match the application’s requirements.
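One of the intermediate guarantees, read-your-writes, can be sketched as a client-side session check (illustrative only; real systems typically implement this with session tokens or causal timestamps rather than these names):

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.data = {}


class Session:
    """Tracks the version of the client's own last write and refuses
    to return replica data older than that."""

    def __init__(self):
        self.last_written_version = 0

    def write(self, replica, key, value):
        replica.version += 1
        replica.data[key] = (replica.version, value)
        self.last_written_version = replica.version

    def read(self, replica, key):
        version, value = replica.data.get(key, (0, None))
        if version < self.last_written_version:
            # Weaker than linearizability: only *this* session's writes
            # must be visible; other clients' writes may lag.
            raise LookupError("replica too stale for this session")
        return value


primary = Replica()
stale_replica = Replica()  # has not yet replicated the write

session = Session()
session.write(primary, "profile", "v2")

assert session.read(primary, "profile") == "v2"  # always sees its own write
try:
    session.read(stale_replica, "profile")       # stale for this session
except LookupError:
    pass
```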
Availability is not binary. A system might return errors for 0.1% of requests during a partition (highly available) versus 100% (unavailable). CAP treats availability as absolute, but practical systems exist on a spectrum.
Tunable consistency. Many systems let you tune per-operation. In Cassandra, you can write with QUORUM (consistent) and read with ONE (fast, possibly stale) — effectively choosing CP for writes and AP for reads on the same cluster.
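The QUORUM/ONE trade-off follows from a simple overlap rule: with N replicas, a write acknowledged by W nodes is guaranteed visible to a read that contacts R nodes whenever R + W > N, because every read set must then intersect every write set. A minimal check of the rule:

```python
def overlaps(n, w, r):
    """True when any R-node read set must intersect any W-node write set."""
    return r + w > n


N = 3  # replication factor
assert overlaps(N, w=2, r=2)      # QUORUM writes + QUORUM reads: consistent
assert not overlaps(N, w=2, r=1)  # QUORUM writes + ONE reads: possibly stale
assert overlaps(N, w=3, r=1)      # ALL writes + ONE reads: consistent
```

This is why "QUORUM writes, ONE reads" trades consistency for read latency: the single replica contacted may be one of the nodes that missed the write.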
Limitations of CAP
- CAP ignores latency. A system can be consistent and available during normal operation, but with high latency. CAP says nothing about performance when there is no partition. The PACELC theorem fills this gap.
- CAP only models one failure mode. Partitions are one type of failure. CAP doesn’t model: node crashes (availability vs. durability), Byzantine failures (malicious or incorrect nodes), or gray failures (partially degraded nodes).
- The “2 of 3” framing is misleading. Since partition tolerance is mandatory in any distributed system, the real theorem is: during a partition, choose consistency or availability. The triangle metaphor suggests you choose two at rest, but the choice is only forced during failures.
Don’t mechanically classify databases as “CP” or “AP” and leave it there. Most databases are tunable and exhibit different behaviors depending on configuration, write concern, and read preference. CAP gives you the vocabulary to have the right conversation — it doesn’t give you the answer.
Design Considerations
- Start with your consistency requirement. What is the cost of serving stale data? In a financial ledger, serving a stale balance can cause overdrafts — CP is required. In a social feed, serving a post that’s 2 seconds old is invisible to the user — AP is fine.
- Design for the partition failure mode. How should your system behave when nodes cannot communicate? Return errors? Serve stale data? Refuse writes but allow reads? Define this explicitly before a partition surprises you in production.
- Understand your database’s default behavior. Many databases default to availability (serving stale reads) even when configured for “consistency”. Read the documentation carefully. Test partition behavior in your staging environment.
- Use per-operation consistency tuning. For critical paths (payment, inventory decrement), use stronger consistency guarantees. For non-critical reads (catalog browsing, social counts), relax consistency to gain availability and performance.
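The advice above to define partition behavior explicitly can be captured as a named policy rather than an accident of driver defaults (a sketch; the policy names and functions here are illustrative, not a real API):

```python
from enum import Enum


class PartitionPolicy(Enum):
    REJECT = "reject"        # CP: error out until the partition heals
    SERVE_STALE = "stale"    # AP: respond with possibly stale data
    READ_ONLY = "read_only"  # hybrid: allow stale reads, refuse writes


def handle_read(partitioned, policy, local_value):
    if partitioned and policy is PartitionPolicy.REJECT:
        raise TimeoutError("partition: refusing possibly stale read")
    return local_value  # SERVE_STALE and READ_ONLY both allow reads


def handle_write(partitioned, policy, apply_write):
    if partitioned and policy is not PartitionPolicy.SERVE_STALE:
        raise TimeoutError("partition: refusing write")
    return apply_write()


# During a partition, the chosen policy makes the behavior explicit:
assert handle_read(True, PartitionPolicy.SERVE_STALE, "cached") == "cached"
try:
    handle_write(True, PartitionPolicy.READ_ONLY, lambda: "ok")
except TimeoutError:
    pass  # writes refused, as the policy states
```

Writing the policy down, even this crudely, forces the team to decide the partition behavior before production decides it for them.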
Don’t just name-drop CAP — apply it. “For the user profile service, we need consistency: if a user updates their payment method, every node must immediately serve the new value. We’ll use a primary-replica PostgreSQL cluster with synchronous replication — CP, accepting that during a primary failure reads block until a new primary is elected. For the activity feed, eventual consistency is fine: we use Cassandra, accepting that users in different regions see the feed with up to a 1-second lag.”