Databases

Consistent Hashing


Consistent hashing is a technique for distributing data or load across a set of nodes in a way that minimizes disruption when nodes are added or removed. It is used in distributed caches, databases, and load balancers to solve the rehashing problem — the expensive remapping of all keys that occurs when the number of nodes changes.

The Problem with Modulo Hashing

The simplest way to assign a key to a node is modulo hashing:

node = hash(key) % N   // N = number of nodes

This works when N is fixed. But when a node is added (N becomes N+1) or removed (N becomes N-1), the modulus changes. Almost every key maps to a different node. A system with 10 nodes and 1 million cached items must move roughly 909,000 items (90.9%) when an eleventh node is added. This is a cache avalanche: after a node change, most cache lookups miss, routing all traffic to the origin, which can overload it.

The same problem affects sharded databases: changing N requires migrating nearly all data to different shards.
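
The scale of the remapping is easy to demonstrate. Here is a minimal sketch that counts how many keys change nodes when N goes from 10 to 11, using an MD5-derived hash for stability (Python's built-in hash() is salted per process; the key names are illustrative):

```python
import hashlib

def stable_hash(key: str) -> int:
    # Deterministic 64-bit hash (Python's built-in hash() varies between runs).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

keys = [f"item-{i}" for i in range(100_000)]

# Count keys whose node assignment changes when N goes from 10 to 11.
moved = sum(1 for k in keys if stable_hash(k) % 10 != stable_hash(k) % 11)
print(f"{moved / len(keys):.1%} of keys remap")  # close to the predicted 10/11 ≈ 90.9%
```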

The Hash Ring

Consistent hashing solves this by mapping both keys and nodes onto a circular hash space — the ring. The ring spans hash values from 0 to 2^32 − 1 (for a 32-bit hash), wrapping around so the maximum value connects back to 0.

Each node is assigned a position on the ring by hashing its identifier (hostname, IP, or a generated name):

ring_position(node) = hash(node_id)

To find which node owns a key, hash the key to get its ring position, then walk clockwise around the ring until you find the first node. That node owns the key.

ring_position(key) = hash(key)
node = first node clockwise from ring_position(key)
[Figure: the hash ring. Nodes and keys are mapped to positions on a circle; each key is owned by the next clockwise node.]
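
The clockwise lookup can be sketched with a sorted list of positions and binary search. This is a minimal illustration, not a production implementation; node and key names are made up:

```python
import bisect
import hashlib

def ring_position(s: str) -> int:
    # 32-bit position on the ring, derived from a stable hash.
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

class HashRing:
    def __init__(self, nodes):
        # Each node sits at one position; keep positions sorted for binary search.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._ring]

    def owner(self, key: str) -> str:
        # First node clockwise from the key's position; the modulo wraps past the top.
        i = bisect.bisect_right(self._positions, ring_position(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.owner("user:42"))  # one of node-a / node-b / node-c
```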

Adding and Removing Nodes

Adding a node: The new node is placed at a position on the ring (hash of its ID). It takes ownership of keys in the arc between the previous node and its own position. Only keys in that arc need to move — from the node that previously owned them to the new node. All other keys are unaffected.

With N nodes uniformly distributed, adding one node moves approximately 1/(N+1) of the total keys. Adding an eleventh node to a ten-node ring moves ~9% of keys, not 90%.

Removing a node: The removed node's keys are transferred to the next clockwise node. Only keys owned by the removed node move. Keys owned by all other nodes stay where they are.

This is the core property of consistent hashing: when the set of nodes changes, only the minimum necessary keys are remapped. Specifically, when adding one node, only K/N keys move (where K is the total number of keys and N is the new total of nodes).
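
The property can be checked directly: rebuild the ring with one extra node and verify that every key that moved went to the new node. A sketch under the same assumptions as above (MD5-derived positions, illustrative names):

```python
import bisect
import hashlib

def ring_position(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

def build(nodes):
    ring = sorted((ring_position(n), n) for n in nodes)
    return ring, [p for p, _ in ring]

def owner(ring, positions, key):
    i = bisect.bisect_right(positions, ring_position(key)) % len(ring)
    return ring[i][1]

old_ring, old_pos = build([f"node-{n}" for n in range(10)])
new_ring, new_pos = build([f"node-{n}" for n in range(11)])  # adds node-10

items = [f"item-{i}" for i in range(20_000)]
moved = [k for k in items if owner(old_ring, old_pos, k) != owner(new_ring, new_pos, k)]

# Every moved key landed on the new node; no key moved between pre-existing nodes.
assert all(owner(new_ring, new_pos, k) == "node-10" for k in moved)
print(f"{len(moved) / len(items):.1%} of keys moved")
```

With only one position per node the exact fraction varies around 1/(N+1), which is precisely the imbalance that virtual nodes (next section) smooth out.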

Virtual Nodes

With a small number of physical nodes, the ring positions may not be uniformly distributed — some nodes may own large arcs (and thus more keys) while others own tiny arcs. This imbalance gets worse the fewer nodes you have.

Virtual nodes (vnodes) solve this. Instead of placing each physical node at one position on the ring, each physical node is represented by many positions — typically 100 to 200. When a key falls in a vnode's arc, it's served by the physical node that owns that vnode.

// Assign 150 vnodes per physical node
for i in range(150):
    ring_position(node_A, i) = hash("node_A:" + str(i))

With 150 vnodes per node, a 10-node cluster has 1,500 ring positions, distributing the key space into 1,500 arcs. The law of large numbers ensures much better balance than 10 positions. Heterogeneous hardware can be accommodated by giving more powerful nodes more vnodes — they receive proportionally more keys.

When a node is added, its vnodes are inserted at 150 positions around the ring, each stealing a small slice from the previously owning node. The data migration is spread across many small transfers rather than one large one, which is operationally easier to manage.
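
The balance improvement is measurable. A sketch comparing the load spread of a 10-node ring with 1 position per node versus 150, using the "node:index" vnode naming convention shown above (hash choice and counts are illustrative):

```python
import bisect
import hashlib
from collections import Counter

def ring_position(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

def build_ring(nodes, vnodes_per_node):
    # Each physical node contributes `vnodes_per_node` positions on the ring.
    ring = sorted((ring_position(f"{n}:{v}"), n)
                  for n in nodes for v in range(vnodes_per_node))
    return ring, [p for p, _ in ring]

nodes = [f"node-{i}" for i in range(10)]
items = [f"item-{i}" for i in range(100_000)]

spread = {}
for vn in (1, 150):
    ring, positions = build_ring(nodes, vn)
    counts = Counter(
        ring[bisect.bisect_right(positions, ring_position(it)) % len(ring)][1]
        for it in items
    )
    spread[vn] = max(counts.values()) / min(counts.values())
    print(f"{vn:>3} vnodes/node: busiest node holds "
          f"{spread[vn]:.2f}x the keys of the least busy")
```

Expect the max/min ratio to drop sharply between the two runs: with a single position per node, one unlucky node can own a huge arc, while 1,500 positions average the arcs out.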

Replication with Consistent Hashing

In distributed databases that use consistent hashing (Cassandra, Riak, Amazon's Dynamo), replication is implemented by storing each key on the first R distinct nodes encountered clockwise from its ring position, where R is the replication factor.

// Replication factor = 3
primary replica = first clockwise node from hash(key)
second replica  = next clockwise node after primary
third replica   = next clockwise node after second

When a node fails, the keys it owned are still available on the next two replicas. When the node recovers (or a replacement joins), it re-acquires its position on the ring and re-syncs data from its neighbors. In Cassandra this happens via hinted handoff (replaying writes buffered on the node's behalf during the outage) and anti-entropy repair (comparing replicas and streaming the differences).
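
Replica selection can be sketched as a clockwise walk that skips duplicates. The skip matters once virtual nodes are in play, since consecutive ring positions can belong to the same physical node (names and the MD5-derived hash are illustrative):

```python
import bisect
import hashlib

def ring_position(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], "big")

def replicas(ring, positions, key, rf=3):
    # Walk clockwise from the key, collecting the first `rf` distinct physical nodes.
    i = bisect.bisect_right(positions, ring_position(key))
    found = []
    while len(found) < rf:
        node = ring[i % len(ring)][1]
        if node not in found:
            found.append(node)
        i += 1
    return found

nodes = [f"node-{n}" for n in range(5)]
ring = sorted((ring_position(n), n) for n in nodes)
positions = [p for p, _ in ring]
print(replicas(ring, positions, "user:42"))  # primary first, then the next two clockwise
```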

Real-World Use

Cassandra: Uses consistent hashing with vnodes to partition data across the cluster. The token ring is the core of Cassandra's data distribution. Each node owns ranges of the 64-bit token space.

Amazon Dynamo and DynamoDB: The original Dynamo paper (2007), which describes Amazon's internal key-value store and inspired DynamoDB, presents consistent hashing as the foundation of its partitioning. The managed DynamoDB service today abstracts this behind automatic partition management, but consistent hashing concepts underpin it.

Memcached (with consistent hashing clients): Memcached itself has no built-in cluster management. Client libraries like libmemcached implement consistent hashing so that adding a cache node doesn't invalidate most cached items.

Nginx and HAProxy: Can use consistent hashing for upstream selection so that adding a backend server doesn't reroute most client sessions — important for stateful workloads or when upstream servers have their own caches.

Chord and Kademlia: Peer-to-peer protocols (Chord for structured overlays, Kademlia for DHT-based systems like BitTorrent) use consistent hashing-based routing to locate data in a decentralized network without a central directory.

💡
Jump Consistent Hash

Google's Jump Consistent Hash (2014) is an alternative to ring-based consistent hashing that requires no ring data structure. It computes bucket(key, num_buckets) in O(ln N) time using only arithmetic. It minimizes remapping and has excellent load balance, but it doesn't support arbitrary removal of nodes (only the last bucket can be removed). It's ideal for stateless systems where you control which nodes are active, like a consistent hash across a fleet of cache servers.
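
The published algorithm is only a few lines. Below is a direct Python port of the C++ in the Lamping and Veach paper; the 64-bit mask emulates unsigned integer overflow:

```python
def jump_hash(key: int, num_buckets: int) -> int:
    # Jump Consistent Hash (Lamping & Veach, 2014): maps a 64-bit key to a bucket
    # in [0, num_buckets) with minimal remapping when num_buckets grows.
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF  # 64-bit LCG step
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b

# Growing from 10 to 11 buckets: each key either stays put or moves to bucket 10.
for k in (7, 42, 1_000_003):
    print(k, jump_hash(k, 10), jump_hash(k, 11))
```

Note the asymmetry the callout describes: growing num_buckets only ever moves keys into the newest bucket, but there is no way to remove an arbitrary bucket from the middle.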

Design Considerations