[[Sharding]] - Distributing data across multiple machines

Sharding is often combined with Consistent Hashing, which keeps load balanced across instances and limits how much data has to move when an instance is added or removed. In that sense it is similar to Horizontal Scaling (ref: Scaling).
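A minimal sketch of the consistent hashing idea, to show why it helps when instances come and go. The class name `ConsistentHashRing`, the `vnodes` parameter and the instance names are assumptions made for illustration; they are not from any particular library.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # Python's built-in hash() is salted per process, so use a stable digest.
    # sha256 is used here only for its well-spread output, not for security.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical ring: each instance owns several points ("virtual nodes")."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> instance name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = stable_hash(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key: str) -> str:
        if not self._positions:
            raise LookupError("no instances on the ring")
        # The first ring position clockwise from the key owns it (wrap at the end).
        idx = bisect.bisect_right(self._positions, stable_hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]


ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
owner_before = ring.get_node("user-42")
ring.add_node("shard-d")
owner_after = ring.get_node("user-42")  # unchanged, or moved to "shard-d" only
```

When an instance is added, a key either stays where it was or moves to the new instance; keys never shuffle between the existing instances.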

Sharding can be based on context, for example:

  1. The location
  2. Type of data

Considerations:

  1. Database Indexes can be used for faster querying across multiple instances
  2. Consistency Models should be taken into account, because once data is distributed, keeping it consistent becomes the main problem

Sharding Strategies

1. Hash based

  • Example: Hash(user_id) % 2 determines the shard number for a user, distributing users roughly evenly across 2 shards (assuming the hash function spreads IDs uniformly).
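A rough sketch of the Hash(user_id) % 2 example. The function name `hash_shard_for` and the choice of SHA-256 as the hash are assumptions; any stable, well-distributed hash would do.

```python
import hashlib

NUM_SHARDS = 2  # matches the "% 2" in the example above


def hash_shard_for(user_id, num_shards: int = NUM_SHARDS) -> int:
    """Return the shard number (0 .. num_shards - 1) for a user."""
    # A stable digest is used instead of the built-in hash(), which is
    # salted per process for strings and would give different shards per run.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Count how many of 10,000 hypothetical user IDs land on each shard.
counts = [0] * NUM_SHARDS
for user_id in range(1, 10_001):
    counts[hash_shard_for(user_id)] += 1
print(counts)  # the two counts should be close to each other
```

Note that changing `num_shards` remaps most keys with plain modulo hashing, which is one reason Consistent Hashing comes up alongside sharding.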

2. Ranged Sharding

  • Example:
    • Shard 1 contains records with IDs from 1 to 10000,
    • Shard 2 contains records with IDs from 10001 to 20000, and so on.
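The range example above can be sketched as a lookup over sorted upper bounds. The third range (20001 to 30000) is an assumption standing in for "and so on", and `range_shard_for` is a name chosen for illustration.

```python
import bisect

# Inclusive upper bound of each shard's ID range, in ascending order.
# Shard 1: 1..10000, Shard 2: 10001..20000, Shard 3 (assumed): 20001..30000.
UPPER_BOUNDS = [10_000, 20_000, 30_000]


def range_shard_for(record_id: int) -> int:
    """Return the 1-based shard number whose ID range contains record_id."""
    if record_id < 1 or record_id > UPPER_BOUNDS[-1]:
        raise ValueError(f"no shard covers record ID {record_id}")
    # bisect_left finds the first upper bound that is >= record_id.
    return bisect.bisect_left(UPPER_BOUNDS, record_id) + 1


print(range_shard_for(1))       # 1
print(range_shard_for(10_000))  # 1
print(range_shard_for(10_001))  # 2
```

Range sharding keeps neighbouring IDs together, which suits range scans, but a shard can become a hotspot if most traffic targets one range (for example, the newest IDs).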

3. Directory Based

  • Maintains a lookup table that directly maps specific keys to specific shards.
  • Example: the lookup table can be kept in a cache, and the shards can be storage buckets (e.g. S3).
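A minimal sketch of a directory-based lookup, using an in-memory dict in place of the cache. The class `ShardDirectory`, the tenant keys and the bucket names are all hypothetical; no real S3 calls are made.

```python
class ShardDirectory:
    """Hypothetical lookup table mapping each key directly to a shard."""

    def __init__(self):
        self._table = {}  # key -> shard name (stand-in for a cache)

    def assign(self, key: str, shard: str) -> None:
        # Also used to move a key: overwriting the entry re-points it.
        self._table[key] = shard

    def locate(self, key: str) -> str:
        try:
            return self._table[key]
        except KeyError:
            raise KeyError(f"no shard assigned for key {key!r}") from None


directory = ShardDirectory()
directory.assign("tenant-1", "bucket-eu")  # hypothetical bucket names
directory.assign("tenant-2", "bucket-us")
print(directory.locate("tenant-1"))  # bucket-eu

# Moving a key only needs a directory update (plus copying the data itself).
directory.assign("tenant-1", "bucket-us")
print(directory.locate("tenant-1"))  # bucket-us
```

The trade-off is that every request depends on the directory, so it has to be fast and highly available.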

© 2025 All rights reserved. Built with DataHub Cloud
