Sharding
Sharding is horizontal partitioning of a database: splitting a large dataset across multiple database servers, each holding a subset of the data. Each partition is called a shard. Together, the shards hold the complete dataset. Sharding allows a system to scale storage and write throughput beyond what a single server can handle.
What Is Sharding
Vertical scaling (adding more CPU, RAM, or faster disks to one server) has a ceiling. At some point, a single machine cannot handle the data volume or write rate a growing system demands. Sharding breaks the data horizontally: instead of one table with 10 billion rows on one server, you have ten servers each holding 1 billion rows of the same table.
Unlike replication (which copies the same data to multiple nodes for read scaling and fault tolerance), sharding distributes different data to different nodes. Each row lives on exactly one shard. This distributes both storage and write load proportionally.
In many databases, "partitioning" refers to dividing a table within a single server (PostgreSQL table partitioning, MySQL partitioning). "Sharding" typically means partitioning across multiple servers. The logical concepts are the same; the operational difference is that sharding requires routing logic to direct queries to the correct server.
The Shard Key
The shard key is the column (or combination of columns) used to determine which shard a row belongs to. Choosing the right shard key is the most critical sharding decision — it affects data distribution, query efficiency, and operational complexity for the lifetime of the system.
A good shard key:
- Distributes data evenly across shards. Uneven distribution creates hot spots.
- Aligns with access patterns. Most queries should hit a single shard or a small number of shards. A shard key that requires scatter-gather queries on every lookup defeats the purpose of sharding.
- Has high cardinality. A boolean column as a shard key means you can only ever have two shards. A UUID has billions of possible values.
- Does not create temporal hot spots. A timestamp shard key routes all current writes to the latest shard — a sequential hot spot.
Common shard key choices: user ID (for user-centric data), tenant ID (for SaaS), geographic region, or a combination. The right choice is application-specific.
Range Sharding
In range sharding, the shard key space is divided into contiguous ranges. Each shard owns one range.
Shard 1: user_id 1 – 10,000,000
Shard 2: user_id 10,000,001 – 20,000,000
Shard 3: user_id 20,000,001 – 30,000,000
Advantages: Range queries (e.g., "all users with IDs between 5M and 8M") hit a single shard. Adding new shards for new ranges is straightforward. Data locality: users in the same range co-locate, which is useful if related entities reference each other.
Disadvantages: Sequential inserts (auto-increment IDs, timestamps) always go to the latest range shard, creating a write hot spot. As data grows unevenly in ranges, some shards become much larger than others, requiring periodic rebalancing.
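The range routing above can be sketched with a sorted list of range boundaries and a binary search. This is a minimal illustration using the example ranges from this section; `RANGE_BOUNDS` and `route_range` are illustrative names, not part of any particular database.

```python
# Minimal sketch of range-based shard routing, assuming the three
# user_id ranges from the example above.
import bisect

# Upper bound (inclusive) of each shard's user_id range, in ascending order.
RANGE_BOUNDS = [10_000_000, 20_000_000, 30_000_000]

def route_range(user_id: int) -> int:
    """Return the 0-based shard index that owns this user_id."""
    idx = bisect.bisect_left(RANGE_BOUNDS, user_id)
    if idx == len(RANGE_BOUNDS):
        raise ValueError(f"user_id {user_id} falls outside all ranges")
    return idx

print(route_range(5_000_000))   # shard 0 (ids 1 - 10,000,000)
print(route_range(15_000_000))  # shard 1 (ids 10,000,001 - 20,000,000)
```

Because the boundaries are sorted, routing is O(log S) in the number of shards, and adding a new range shard is just appending a boundary.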
Hash Sharding
In hash sharding, the shard key is passed through a hash function, and the result modulo the number of shards determines the target shard.
shard = hash(user_id) % num_shards
Advantages: Hash functions distribute data uniformly regardless of key distribution patterns. Sequential keys (IDs, timestamps) scatter evenly. No single shard receives all new writes.
Disadvantages: Range queries across shard keys require querying all shards (scatter-gather). Adding or removing shards changes the modulus, which reroutes most existing data to different shards — a resharding problem. Consistent hashing (see the Consistent Hashing guide) mitigates this by only remapping a fraction of keys when nodes are added or removed.
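Both the routing rule and the modulo resharding problem are easy to see in a few lines. This sketch uses SHA-256 for a hash that is stable across processes (Python's built-in `hash()` is salted per run) and measures how many keys change shards when the shard count grows from 4 to 5; the names are illustrative.

```python
# Sketch of hash-based routing, plus a measurement of how many keys
# are remapped when the shard count changes under modulo hashing.
import hashlib

def route_hash(key: str, num_shards: int) -> int:
    # Stable hash: derive an integer from the first 8 bytes of SHA-256.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

keys = [f"user:{i}" for i in range(10_000)]
moved = sum(1 for k in keys if route_hash(k, 4) != route_hash(k, 5))
# With modulo hashing, going from 4 to 5 shards remaps roughly 80% of keys.
print(f"{moved / len(keys):.0%} of keys move when going from 4 to 5 shards")
```

Sequential keys (`user:1`, `user:2`, ...) scatter uniformly, which removes the write hot spot of range sharding but forces scatter-gather for range queries.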
Directory-Based Sharding
A lookup service (the directory) maintains a mapping from shard key values to the shard that holds them. When routing a query, the application first queries the directory to find the correct shard, then queries that shard.
Advantages: Maximum flexibility — the directory can implement any mapping logic. Moving data between shards only requires updating the directory entry, not rehashing. Supports heterogeneous shard sizes.
Disadvantages: The directory is a single point of failure and a performance bottleneck (every query requires a directory lookup). Caching the directory at the application layer reduces latency but adds consistency complexity. The directory itself must be highly available and consistent.
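The routing flow can be sketched with a plain dictionary standing in for the lookup service. In production the directory would live in a replicated, consistent store (etcd or ZooKeeper, for example); this toy version only illustrates the flow and the cheap-migration property, and all names are hypothetical.

```python
# Toy directory-based router: a dict stands in for the lookup service.
directory = {
    "tenant-a": "shard-1",
    "tenant-b": "shard-2",
    "tenant-c": "shard-1",  # arbitrary placement is allowed
}

def route(tenant_id: str) -> str:
    # Every query first consults the directory, then goes to that shard.
    shard = directory.get(tenant_id)
    if shard is None:
        raise KeyError(f"no shard assigned for {tenant_id}")
    return shard

# Migrating a tenant is a single directory update — no rehashing:
directory["tenant-c"] = "shard-3"
print(route("tenant-c"))  # shard-3
```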
Hot Spots
A hot spot occurs when traffic concentrates on one shard disproportionately. Common causes:
- Monotonically increasing keys (auto-increment IDs, timestamps): all new writes go to the last shard.
- Celebrity or viral data: a high-profile user's data or a trending item receives far more reads than average. Even with a well-distributed hash, the shard holding Beyoncé's account or a viral post gets hammered.
- Seasonal patterns: holiday traffic surges on specific data (products, events).
Mitigation strategies:
- Add a random suffix to hot keys: instead of hash(user_id), use hash(user_id + random(1..10)), spreading the load across 10 virtual shards. Reads must aggregate from all suffix variants.
- Cache hot data upstream. A Redis layer absorbs read traffic for popular items before it reaches the shard.
- Sub-shard the hot shard. Split the overloaded shard into multiple smaller shards by moving a range or re-hashing its data.
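The random-suffix technique above can be sketched for a hot counter: writes scatter across 10 sub-keys, and reads fan out to all of them. The key format and helper names are illustrative; an in-memory dict stands in for the sharded store.

```python
# Sketch of random-suffix spreading for a hot key. Writes pick one of
# NUM_SUFFIXES sub-keys at random; reads must sum all sub-keys.
import random
from collections import defaultdict

NUM_SUFFIXES = 10
store = defaultdict(int)  # stand-in for the sharded counter store

def increment(hot_key: str) -> None:
    suffix = random.randrange(NUM_SUFFIXES)
    store[f"{hot_key}#{suffix}"] += 1  # load spreads over 10 virtual shards

def read_total(hot_key: str) -> int:
    # Reads aggregate every suffix variant.
    return sum(store[f"{hot_key}#{s}"] for s in range(NUM_SUFFIXES))

for _ in range(1000):
    increment("viral-post")
print(read_total("viral-post"))  # 1000
```

The trade is explicit: write throughput on the hot key improves roughly 10x, while every read now costs 10 lookups.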
Cross-Shard Queries
Once data is spread across shards, queries that span multiple shards become expensive. A query without the shard key in its WHERE clause must be sent to all shards (scatter-gather) and the results merged at the application or middleware layer.
Problems with cross-shard queries:
- Latency multiplies. The response time equals the slowest shard, not the average.
- JOINs across shards are either impossible in the database layer or require pulling large datasets to the application to join in memory.
- Aggregations (COUNT, SUM, AVG) must be computed per-shard and merged — which is straightforward for COUNT, SUM, MIN, and MAX, but requires care for AVG, DISTINCT, and percentiles. An average of per-shard averages is wrong unless weighted by each shard's row count.
- Global ordering and pagination (ORDER BY with LIMIT/OFFSET) require fetching the top N rows from every shard and merging — you cannot simply apply LIMIT per shard and take the union.
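The two merge pitfalls above — AVG and global top-N — can be shown concretely. Each shard returns a (sum, count) pair rather than its own average, and each shard returns its top N rows before the merge slices to N. All data here is made up for illustration.

```python
# Sketch of scatter-gather merging for AVG and for ORDER BY ... LIMIT N.
import heapq

# Per-shard partial aggregates: (sum, count). Never average averages.
shard_results = [(300.0, 3), (50.0, 1), (250.0, 2)]
total_sum = sum(s for s, _ in shard_results)
total_count = sum(c for _, c in shard_results)
global_avg = total_sum / total_count  # 600.0 / 6 = 100.0

# Global "ORDER BY score DESC LIMIT 2": each shard must return its own
# top 2; applying LIMIT per shard and taking any 2 rows would be wrong.
per_shard_top2 = [[9, 7], [8, 3], [6, 5]]
global_top2 = heapq.nlargest(2, (x for rows in per_shard_top2 for x in rows))
print(global_avg, global_top2)  # 100.0 [9, 8]
```

Note that the naive average of per-shard averages here would be (100 + 50 + 125) / 3 ≈ 91.7, not the true 100.0.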
The solution is to design your access patterns around the shard key. If a query is always "get all orders for user X," and user_id is the shard key, every order query hits a single shard. If the query is "get all orders in the last 24 hours across all users," it's a scatter-gather — consider keeping a secondary index service or a separate analytics replica for such queries.
Resharding
Resharding — moving data between shards, usually because existing shards are too large or too hot — is one of the most operationally painful aspects of sharding. With modulo-N hashing, adding a single shard changes the modulus from N to N+1, which remaps approximately N/(N+1) of all keys — nearly all data needs to move.
Approaches to reduce resharding pain:
- Consistent hashing limits remapping to roughly 1/N of keys when a node is added. See the Consistent Hashing guide.
- Over-provision virtual shards. Instead of mapping shard keys directly to physical servers, map them to many virtual shards (e.g., 1024), then assign virtual shards to physical servers. When adding a physical server, move some virtual shards to it — no rehashing, just a directory update.
- Double-write during migration. Write to both the old and new shard locations simultaneously. Once the new shard is caught up and verified, cut over reads and stop writing to the old location.
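The virtual-shard approach can be sketched in a few lines: keys always hash into a fixed 1024 virtual shards, and a mutable assignment table maps virtual shards to physical servers. Server names and helper names here are hypothetical.

```python
# Sketch of virtual-shard mapping. The key-to-vshard hash never changes;
# only the vshard-to-server assignment table is edited when scaling.
import hashlib

NUM_VSHARDS = 1024  # fixed for the lifetime of the system

def vshard(key: str) -> int:
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_VSHARDS

# Initially two physical servers split the virtual shards evenly.
assignment = {v: f"server-{v % 2}" for v in range(NUM_VSHARDS)}

def route(key: str) -> str:
    return assignment[vshard(key)]

# Add a third server: reassign roughly a third of the virtual shards.
# No key is rehashed — this is purely a table (directory) update.
for v in range(0, NUM_VSHARDS, 3):
    assignment[v] = "server-2"
```

Because `vshard()` is stable, a key's virtual shard never changes; scaling out only moves whole virtual shards, which is a bounded, trackable unit of migration.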
Design Considerations
- Delay sharding as long as possible. Sharding adds complexity — routing logic, cross-shard query limitations, operational burden. Exhaust vertical scaling, read replicas, caching, and table partitioning first. Start sharding only when you have a concrete bottleneck a single server cannot solve.
- Choose your shard key once. Changing the shard key after data is sharded requires migrating all data. Get this decision right early by modeling your most frequent and most critical queries.
- Application-level vs database-native sharding. Some databases (MongoDB, Cassandra, CockroachDB, Vitess for MySQL) handle sharding natively. Others require the application or a middleware layer to implement routing. Native sharding is more transparent but less flexible; application-level sharding is more complex but gives you full control.
- Cross-shard transactions are expensive. Avoid them in hot paths. Design data models so that related data that must be updated atomically lives on the same shard (same user, same tenant).
- Monitor per-shard metrics. Alert on shard imbalance (size disparity, QPS disparity) early. By the time a shard is visibly overloaded, it's already affecting user experience.