Partitioning: Hash, Range, Hot Keys, and Rebalancing
Partitioning splits data and load across machines. The shard key is a performance contract: it decides locality, balance, and which queries become scatter-gather.
Partitioning is easy if every key is equally popular and every query touches one key. Real systems are hard because data and attention are skewed.
Why partition?
A single machine has finite disk, memory, CPU, and network. Partitioning splits a dataset into subsets so different machines can store and serve different parts. Replication copies data for availability; partitioning divides data for capacity and load.
The goal is even distribution of both bytes and requests. Those are different. A shard may hold average bytes but receive celebrity-level traffic. Another may be large but cold. A good partitioning strategy considers storage volume, query volume, and operation shape.
Hash vs range partitioning
Hash partitioning applies a hash function to the key and assigns hash ranges to partitions. It spreads keys evenly when the hash is good. It is excellent for point lookups by key and poor for range queries because adjacent keys are intentionally scattered.
Range partitioning assigns contiguous key ranges to partitions. It preserves order, so range scans are efficient. But sorted order can create hot spots. If keys are timestamps, the newest partition receives most writes. If keys are sequential ids, the tail shard burns.
Compound keys can combine ideas: partition by tenant hash, sort by timestamp within tenant. That makes "latest events for tenant X" local while spreading tenants across shards.
Secondary indexes and rebalancing
Partitioning primary data is only the first layer. Secondary indexes complicate everything. A local secondary index lives on the same shard as the primary data, making writes local but queries across secondary fields scatter across shards. A global secondary index partitions by the indexed term, making reads direct but writes update a second distributed structure.
Rebalancing moves partitions when nodes are added or removed. Moving one key at a time is expensive; moving fixed partitions or virtual nodes is more manageable. But rebalancing itself consumes network and disk bandwidth, so automatic rebalancing must be rate-limited and observable.
Trade-offs
| Choice | Buys | Costs |
|---|---|---|
| Hash partitioning | Even key distribution and simple point lookups | Range queries scatter; hot keys remain hot |
| Range partitioning | Efficient range scans and sorted access | Sequential writes and skew can create hot ranges |
| Local secondary index | Write path stays local | Reads by secondary field scatter-gather |
| Global secondary index | Secondary reads are targeted | Writes fan out and index consistency lags |
You should be able to name the contract this mechanism offers, the workload or invariant that justifies it, and the bill it sends somewhere else: read latency, write latency, storage, availability, freshness, or operational complexity.
A bad shard key turns scale-out into a single overloaded machine with extra steps. The system has many nodes, but the hot key, hot tenant, or hot range decides throughput.
Design prompts
- Shard a multi-tenant feature store. What is the shard key and what query becomes hard?
- Why does hashing not solve celebrity hot keys?
- When should a secondary index be global instead of local?