[[Sharding] ] - Distributing data across multiple machines
It also uses Consistent Hashing fpr flexibility and controlling load across multiple instances, similar to Horizontal Scaling (ref: Scaling)
Sharding can be on the basis of the context, example:
- The location
- Type of data
Considerations:
- Database Indexes can be used for faster quering across multiple instances
- Consistency Models should be taken into account. Why? when data is Distributed, main problem is consistency
Sharding Strategies
1. Hash based
- Example:
Hash(user_id) % 2
determines the shard number for a user, distributing users evenly across 2 shards.
2. Ranged Sharding
Example:
- Shard 1 contains records with IDs from 1 to 10000,
- Shard 2 contains records with IDs from 10001 to 20000, and so on.
3. Directory Based
- Maintains a lookup table that directly maps specific keys to specific shards.
Example
: lookup table as cache, and shards can be storage buckets (S3)