The CAP theorem doesn’t actually say you have to choose between Consistency, Availability, or Partition Tolerance; it says you can’t have all three simultaneously in a distributed system when a network partition occurs.
Let’s see how this plays out. Imagine a simple distributed key-value store. Two nodes, Node A and Node B, both hold the value {"key": "value1"} for a specific key.
Client -> Node A (Read {"key": "value1"})
Client -> Node B (Read {"key": "value1"})
Now, a network partition happens. Node A can’t talk to Node B, and Node B can’t talk to Node A.
If a client tries to write a new value, say {"key": "value2"}, to this system:
Scenario 1: Prioritizing Consistency and Partition Tolerance (CP)
The system needs to ensure that if a write happens, all subsequent reads see that write. If there’s a partition, the system cannot guarantee this across all nodes. So, to maintain consistency, it will sacrifice availability.
Client -> Node A (Write {"key": "value2"})
[Network Partition between A and B]
Client -> Node B (Read)
// Node B cannot reach Node A to confirm the write.
// To remain consistent, Node B must refuse the read.
// Node B returns an error.
In this CP system, Node B is unavailable. The system could have served a stale read (returning {"key": "value1"}), but it chose not to, prioritizing consistency.
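The CP read path above can be sketched in a few lines of Python. This is a hypothetical in-memory node, not a real database API; `peer_reachable` stands in for whatever failure detector the system actually uses.

```python
# Sketch of a CP node's read path (hypothetical, in-memory).
# During a partition, the node refuses reads it cannot confirm as fresh.

class CPNode:
    def __init__(self, name):
        self.name = name
        self.store = {"key": "value1"}
        self.peer_reachable = True  # flips to False during a partition

    def read(self, key):
        if not self.peer_reachable:
            # Cannot confirm this replica holds the latest write,
            # so refuse the read rather than risk serving stale data.
            raise RuntimeError(f"{self.name}: unavailable during partition")
        return self.store[key]

node_b = CPNode("Node B")
node_b.peer_reachable = False  # network partition between A and B
try:
    node_b.read("key")
except RuntimeError as err:
    print(err)  # Node B: unavailable during partition
```

The key design choice is in the `if`: the node has a value it *could* return, but correctness (by the CP definition) takes priority over answering.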
Scenario 2: Prioritizing Availability and Partition Tolerance (AP)
The system will try to serve requests even if it can’t guarantee consistency across all nodes due to a partition.
Client -> Node A (Write {"key": "value2"})
[Network Partition between A and B]
Client -> Node B (Read)
// Node B serves the last known value it had: {"key": "value1"}
// The client receives a stale read.
Later, when the partition heals:
[Network Partition Healed]
// Node A has {"key": "value2"}
// Node B has {"key": "value1"}
// A reconciliation process (e.g., last-write-wins, conflict resolution)
// will eventually make both nodes consistent, but for a period, they were not.
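A minimal sketch of the last-write-wins reconciliation mentioned above: each replica tags its value with a timestamp, and the newer write wins when the partition heals. The timestamps here are made up for illustration, and real systems often prefer vector clocks or CRDTs, since wall-clock LWW can silently drop concurrent writes.

```python
# Last-write-wins merge of two replicas' (value, timestamp) pairs.

def last_write_wins(replica_a, replica_b):
    """Keep whichever write carries the newer timestamp."""
    return replica_a if replica_a[1] >= replica_b[1] else replica_b

node_a = ("value2", 1700000010)  # accepted the write during the partition
node_b = ("value1", 1700000000)  # still holds the stale value

merged = last_write_wins(node_a, node_b)
print(merged[0])  # value2 -- both nodes converge to this after healing
```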
Scenario 3: Prioritizing Consistency and Availability (CA)
This scenario is only possible if there are no network partitions. If the system is truly partitioned, you must give up either C or A. A single-node system is CA, but that’s not a distributed system. In a distributed system, if you aim for CA, you’re essentially assuming partitions never happen, which is a dangerous assumption in real-world networks.
The System in Action: A Conceptual Example (Simplified)
Consider a distributed cache. When a client writes a value, the cache might try to write it to multiple replicas.
- Consistency (C): Every read receives the most recent write or an error.
- Availability (A): Every request receives a response, even if it’s not the most recent data.
- Partition Tolerance (P): The system continues to operate despite network partitions.
In a real system, like Amazon Dynamo or Apache Cassandra, you configure the trade-off at the client or per-operation level. For example, you might configure a "write quorum" (W) and a "read quorum" (R).
- If `W + R > N` (where N is the number of replicas), you get strong consistency (CP). A write must be acknowledged by W nodes, and a read must query R nodes. If `W + R > N`, at least one node involved in the read must have participated in the write, guaranteeing it sees the latest data. However, if fewer than `W` (or `R`) nodes are reachable due to a partition, the operation fails (unavailable).
- If `W + R <= N`, you get eventual consistency (AP). Reads might not see the latest write if `R` nodes are reachable but none of them received it. Writes might succeed even if fewer than `W` nodes acknowledge them. The system remains available.
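The quorum arithmetic above reduces to a one-line overlap check. This is just the counting argument, not any real database's API:

```python
# W + R > N guarantees the write set and read set share at least one node,
# because two subsets of N nodes with sizes W and R must intersect
# whenever W + R > N (pigeonhole principle).

def quorums_overlap(n, w, r):
    return w + r > n

N = 3
print(quorums_overlap(N, w=2, r=2))  # True: reads see the latest write (CP-leaning)
print(quorums_overlap(N, w=1, r=1))  # False: a read may miss the latest write (AP-leaning)
```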
The Levers You Control:
The primary levers are your consistency level and your read/write quorum settings.
- Consistency Level (e.g., `ONE`, `QUORUM`, `ALL`): This is a direct dial for your C vs. A trade-off during a partition. `ONE` is highly available but eventually consistent. `ALL` is strongly consistent but highly susceptible to unavailability during partitions. `QUORUM` (meaning `N/2 + 1` nodes) is a common middle ground.
- Read Repair: A background process where, after a read operation, the coordinator checks for inconsistencies among replicas and pushes newer data to older ones. This helps achieve eventual consistency faster.
- Hinted Handoff: If a node is temporarily unavailable (e.g., due to a partition), another node can store the write request and deliver it later when the unavailable node comes back online. This improves availability.
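Hinted handoff can be sketched as a coordinator that stashes writes destined for an unreachable replica and replays them later. The `Coordinator` and `Replica` classes here are hypothetical, greatly simplified stand-ins (real implementations persist hints and expire them):

```python
# Sketch of hinted handoff: writes for a down replica are stored as
# "hints" and delivered once the replica is reachable again.

class Replica:
    def __init__(self):
        self.up = True
        self.store = {}

class Coordinator:
    def __init__(self):
        self.hints = []  # pending (replica, key, value) writes

    def write(self, replica, key, value):
        if replica.up:
            replica.store[key] = value
        else:
            self.hints.append((replica, key, value))  # stash a hint instead of failing

    def replay_hints(self):
        remaining = []
        for replica, key, value in self.hints:
            if replica.up:
                replica.store[key] = value  # deliver the delayed write
            else:
                remaining.append((replica, key, value))
        self.hints = remaining

coord, replica = Coordinator(), Replica()
replica.up = False                      # partition: replica unreachable
coord.write(replica, "key", "value2")   # write is accepted as a hint
replica.up = True                       # partition heals
coord.replay_hints()
print(replica.store["key"])  # value2
```

Note how the write "succeeds" from the client's perspective even while the replica is down; that is exactly the availability-over-consistency trade this section describes.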
Most modern distributed databases (like Cassandra, ScyllaDB, DynamoDB) are designed to be AP systems. They assume partitions will happen and prioritize keeping the system operational. They achieve consistency through mechanisms like tunable consistency levels and background reconciliation processes. The idea is that in most applications, it’s better to serve some data (even if slightly stale) than to serve no data at all.
The most surprising aspect is how often systems claim to be CP when, in reality, they are AP systems that have been configured to act like CP under specific, often undesirable, conditions. The "P" (Partition Tolerance) is the most fundamental property of any distributed system operating on an unreliable network; it’s not an option to be chosen. Therefore, the real choice is always between C and A when a partition occurs.
The next thing you’ll grapple with is how to actually implement conflict resolution when you’ve chosen AP and different clients have written conflicting data to different nodes during a partition.