CAP Theorem

The CAP Theorem states that a distributed data store can guarantee at most two of three properties: Consistency, Availability, and Partition Tolerance. Proposed by Eric Brewer in 2000 and formally proven by Gilbert and Lynch in 2002, CAP is perhaps the most quoted theorem in distributed systems. It is also one of the most misunderstood. Knowing what CAP actually says — and what it doesn’t — is fundamental to reasoning about distributed database design.

What Is the CAP Theorem?

CAP defines three properties of distributed systems:

- Consistency: every read receives the most recent write — all nodes appear to hold the same data at the same time.
- Availability: every request to a non-failing node receives a response.
- Partition Tolerance: the system continues operating even when messages between nodes are lost or delayed.

The theorem states: in the presence of a network partition, a distributed system must choose between consistency and availability. It cannot guarantee both simultaneously.

[Figure: CAP triangle — in the presence of a partition, a distributed system must choose between consistency and availability]

Consistency

In CAP, consistency means linearizability: all nodes see the same data at the same time. After a write completes, any subsequent read from any node returns that written value. The system appears to the client as a single, coherent data store.

This is a stronger definition than ACID consistency. ACID consistency is about database constraints; CAP consistency is about the ordering of reads and writes across distributed nodes.

Achieving linearizability requires coordination: before a write can be acknowledged, all nodes (or a quorum of nodes) must agree on the new value. This coordination takes time and fails if any participating node is unreachable.
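The coordination above can be sketched as a quorum write — a minimal illustration, not any specific database's API. Replicas are modeled as plain dicts, with `None` standing in for a node made unreachable by a partition:

```python
def quorum_write(replicas, key, value):
    """Write to every reachable replica; acknowledge only on a majority of acks."""
    quorum = len(replicas) // 2 + 1
    acks = 0
    for replica in replicas:
        if replica is None:           # unreachable: no ack from this node
            continue
        replica[key] = value          # stand-in for a replication RPC
        acks += 1
    if acks < quorum:
        # cannot safely acknowledge: the write might not survive
        raise TimeoutError(f"got {acks} acks, quorum is {quorum}")
    return acks
```

With three replicas the quorum is two, so the write survives one unreachable node but fails (rather than silently diverging) when two are cut off.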

Availability

In CAP, availability means every request to a non-failing node returns a response — never a timeout or error. The response does not have to be the most recent value; it just cannot be an error.

This is a strict definition. A system that occasionally returns errors during a partition is not fully available under the CAP definition. In practice, availability is measured probabilistically (SLAs, uptime percentages) rather than as an absolute guarantee.

Partition Tolerance

A network partition occurs when nodes in a distributed system cannot communicate with each other due to network failure. Messages are lost, delayed, or reordered. Partition tolerance means the system continues operating even when some nodes are disconnected from others.

In any real distributed system, partition tolerance is mandatory. Networks fail. Switches misbehave. Data centers lose connectivity. If your system cannot tolerate a partition, it is not a distributed system — it is a single node. This makes CAP’s real choice not “pick 2 of 3” but rather “during a partition, choose C or A.”

Why Only Two of Three?

Suppose two nodes (Node A and Node B) are separated by a network partition. A client writes a new value to Node A. Now a different client reads from Node B. Node B has exactly two options:

- Return its local value, which may be stale — the system stays available but sacrifices consistency.
- Refuse to answer (error or block) until it can confirm the latest value with Node A — the system stays consistent but sacrifices availability.

There is no third option. During a partition, the system must choose which guarantee to break.
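The forced choice can be made concrete with a toy model (illustrative only — not a real client library). Node B holds a stale value and cannot reach Node A; the `mode` parameter picks which guarantee to keep:

```python
class Node:
    def __init__(self, data, partitioned=True):
        self.data = dict(data)
        self.partitioned = partitioned   # True: cut off from its peers

    def read(self, key, mode):
        if mode == "CP" and self.partitioned:
            # cannot confirm this is the latest value -> refuse to answer
            raise TimeoutError("partitioned: consistency cannot be guaranteed")
        return self.data.get(key)        # "AP": respond, possibly with stale data

node_b = Node({"x": "old"})              # never received the write x="new" sent to Node A
```

An "AP" read of `x` returns `"old"` (stale but available); a "CP" read raises rather than risk returning the wrong value.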

CP Systems

CP systems sacrifice availability during a partition in order to preserve consistency. When a partition occurs, rather than serve potentially stale data, the system refuses requests from the isolated partition until connectivity is restored.

Examples and how they achieve CP:

- ZooKeeper and etcd: built on consensus protocols (ZAB and Raft, respectively). A minority partition cannot elect a leader and stops serving writes until it rejoins the majority.
- HBase: each region is served by exactly one region server; if that server is unreachable, its data is unavailable until the region is reassigned.
- MongoDB (with majority write concern): writes require acknowledgment from a majority of the replica set; a primary isolated on the minority side steps down and stops accepting writes.

CP systems are appropriate when: correctness is mandatory (financial transactions, distributed locks, leader election), data divergence would cause real harm, and availability degradation during rare partition events is tolerable.

AP Systems

AP systems sacrifice consistency during a partition in order to remain available. Every node continues responding to requests, even if it means serving stale or conflicting data. After the partition heals, the system reconciles diverged state.

Examples and how they achieve AP:

- Cassandra: any replica can accept reads and writes; hinted handoff and read repair reconcile divergence after the partition heals.
- Amazon Dynamo and Riak: sloppy quorums keep accepting writes during a partition; version vectors detect conflicts, which are resolved on read.
- DNS: resolvers serve cached records under TTLs; updates propagate eventually rather than atomically.
- CouchDB: multi-master replication records conflicting revisions and leaves resolution to the application.

AP systems are appropriate when: availability is critical (users must always get a response), the application can tolerate stale data, the domain supports eventual convergence (social feeds, shopping carts, DNS), and write conflicts can be resolved automatically or are acceptable.
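One common strategy for resolving write conflicts automatically is last-write-wins (LWW), sketched here with `(value, timestamp)` pairs — an illustration of the idea, not any particular database's implementation:

```python
def lww_merge(a, b):
    """Merge two replica states of {key: (value, timestamp)}: highest timestamp wins."""
    merged = dict(a)
    for key, (value, ts) in b.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged
```

LWW is simple and convergent, but note the trade-off: concurrent writes with a lower timestamp are silently discarded, which is acceptable for some domains and dangerous for others.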

💡
The Shopping Cart Example

Amazon’s Dynamo paper (2007) famously used the shopping cart as an AP example. If a partition occurs and a user adds items to their cart, the cart service should accept the add rather than error. After the partition heals, the two diverged cart states are merged (union of items). The worst outcome — a customer sees an item they deleted reappear — is far better than the alternative: “Sorry, your cart is unavailable during a network event.”
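The Dynamo-style cart merge described above reduces to a set union. Both sides of the partition accepted adds; reconciliation keeps every item — which is exactly why a deleted item can reappear:

```python
def merge_carts(cart_a, cart_b):
    """Reconcile two diverged cart replicas by keeping the union of items."""
    return cart_a | cart_b   # set union of item identifiers

# Each side of the partition accepted a different add; neither is lost.
merged = merge_carts({"book", "mug"}, {"mug", "pen"})
assert merged == {"book", "mug", "pen"}
```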

CA Systems (Single-Node)

The CA classification (Consistent and Available, no Partition Tolerance) describes systems that are not distributed. A single-node PostgreSQL database is CA: it is consistent (all reads see the latest write) and available (every request gets a response). But if the server goes down, the system is completely unavailable — there is no other node to serve from. It has no partition tolerance because there is no second node to partition from.

In practice, “CA” is not a real option for distributed systems. Any multi-node system must tolerate at least some network unreliability, which means choosing between C and A when a partition occurs.

Real-World Implications

The binary framing of CAP is useful for reasoning but obscures real-world nuance:

Partitions are rare but inevitable. In a well-operated data center, hard partitions (total disconnection) are uncommon. Most “consistency vs. availability” decisions are actually about latency vs. consistency under normal operation — which CAP doesn’t model (the PACELC theorem addresses this).

Consistency is not binary. Systems offer a spectrum from linearizability (strongest) through sequential consistency, causal consistency, read-your-writes consistency, monotonic read consistency, to eventual consistency (weakest). The choice is not “fully consistent or fully inconsistent” but which consistency guarantees match the application’s requirements.

Availability is not binary. A system might return errors for 0.1% of requests during a partition (highly available) versus 100% (unavailable). CAP treats availability as absolute, but practical systems exist on a spectrum.

Tunable consistency. Many systems let you tune per-operation. In Cassandra, you can write with QUORUM (consistent) and read with ONE (fast, possibly stale) — effectively choosing CP for writes and AP for reads on the same cluster.
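The rule of thumb behind tunable consistency: with N replicas, a write acknowledged by W nodes and a read contacting R nodes are guaranteed to overlap — and thus the read sees the latest write — exactly when W + R > N:

```python
def is_strongly_consistent(n, w, r):
    """Read and write quorums overlap (strong consistency) when w + r > n."""
    return w + r > n

N = 3
assert is_strongly_consistent(N, w=2, r=2)       # QUORUM write + QUORUM read
assert not is_strongly_consistent(N, w=2, r=1)   # QUORUM write + ONE read: maybe stale
```

This is why the QUORUM-write/ONE-read configuration mentioned above trades consistency on reads for latency: with N=3, a read from a single replica may hit the one node the write quorum skipped.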

Limitations of CAP

⚠️
CAP Is a Framework, Not a Formula

Don’t mechanically classify databases as “CP” or “AP” and leave it there. Most databases are tunable and exhibit different behaviors depending on configuration, write concern, and read preference. CAP gives you the vocabulary to have the right conversation — it doesn’t give you the answer.

Design Considerations

In System Design Interviews

Don’t just name-drop CAP — apply it. “For the user profile service, we need consistency: if a user updates their payment method, every node must immediately serve the new value. We’ll use a primary-replica PostgreSQL cluster with synchronous replication — CP, accepting that during a primary failure reads block until a new primary is elected. For the activity feed, eventual consistency is fine: we use Cassandra, accepting that users in different regions see the feed with up to a 1-second lag.”