
Types of Data Systems

Data systems are the standard components in which data is stored, either for a short duration (caches, queues) or for a long duration (databases).

The main types of data systems are:

  • Databases: store data for a long duration so that it can be retrieved later
  • Caches: remember the result of an expensive operation to speed up reads
  • Search indexes: allow users to search data by keyword or filter it in various ways
  • Stream processing: send a message to another process, to be handled asynchronously
  • Batch processing: periodically crunch a large amount of accumulated data
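
As a small illustration of the cache idea from the list above, the sketch below wraps a slow lookup with Python's standard `functools.lru_cache`. The function `expensive_lookup` and the `CALLS` counter are hypothetical stand-ins for a real database query; they are not part of any particular system.

```python
from functools import lru_cache

# Counts how often the slow path actually runs (illustrative only).
CALLS = {"count": 0}


@lru_cache(maxsize=128)
def expensive_lookup(user_id: int) -> str:
    """Hypothetical stand-in for a slow database query."""
    CALLS["count"] += 1
    return f"profile-{user_id}"


# The first call for each key misses the cache; repeats are served from memory.
results = [expensive_lookup(uid) for uid in (1, 2, 1, 1, 2)]
info = expensive_lookup.cache_info()
print(results)
print(f"slow path ran {CALLS['count']} times, hits={info.hits}, misses={info.misses}")
```

Five reads trigger only two slow lookups here; the other three are cache hits.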

Example of different Data Systems in use: Screenshot from 2024-07-29 23-25-09.png

Non-Functional requirements of Data Systems

Reliability: The system should continue to work correctly even in the face of adversity (hardware or software faults, and even human error).

Scalability: As the system grows (in data volume, traffic volume, or complexity), there should be reasonable ways of dealing with that growth.

Maintainability: New developers should be able to easily understand and make changes.

Load Parameters

Load can be described with a few numbers, called load parameters, such as:

  • Requests per second
  • Ratio of reads to writes in a database
  • Hit / miss rate on a cache
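
The three load parameters listed above can be derived from simple counters. The following is a minimal sketch, assuming the counters are collected over a fixed measurement window; the `LoadWindow` class and the sample numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoadWindow:
    """Hypothetical counters collected over one measurement window."""

    seconds: float
    reads: int
    writes: int
    cache_hits: int
    cache_misses: int

    def requests_per_second(self) -> float:
        return (self.reads + self.writes) / self.seconds

    def read_write_ratio(self) -> float:
        return self.reads / self.writes

    def cache_hit_rate(self) -> float:
        return self.cache_hits / (self.cache_hits + self.cache_misses)


# Made-up numbers for a one-minute window.
window = LoadWindow(seconds=60, reads=54_000, writes=6_000,
                    cache_hits=45_000, cache_misses=9_000)
print(f"requests/s:     {window.requests_per_second():.0f}")
print(f"reads per write: {window.read_write_ratio():.1f}")
print(f"cache hit rate:  {window.cache_hit_rate():.1%}")
```

Which of these numbers matters most depends on the system: a read-heavy service is usually tuned differently from a write-heavy one.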

A really good example of handling load: Twitter Post Broadcast - Load Handling

Describing Performance

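Percentiles are often a better way to describe performance than the mean (average), because the mean does not tell you how many users actually experienced a given delay.

The median, also known as the 50th percentile (p50), is the halfway point: 50% of users have a response time below it and the other 50% have a response time above it.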
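
To see why the mean can mislead, the sketch below compares it with percentiles on a small set of made-up response times containing one slow outlier. The `percentile` helper uses the simple nearest-rank definition, which is an assumption made here; other definitions interpolate between samples and give slightly different values.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up response times in milliseconds, with a single slow request.
response_times_ms = [20, 22, 25, 27, 30, 31, 35, 40, 45, 1200]

mean_ms = statistics.mean(response_times_ms)
p50_ms = percentile(response_times_ms, 50)
p95_ms = percentile(response_times_ms, 95)
print(f"mean={mean_ms} ms, p50={p50_ms} ms, p95={p95_ms} ms")
```

The mean is 147.5 ms even though nine out of ten requests finished in 45 ms or less; the median (30 ms) describes the typical request, and the high percentile (1200 ms) exposes the outlier.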

Pasted image 20240730004150.png

Some algorithms can calculate a good approximation of percentiles at minimal CPU and memory cost, such as forward decay, t-digest, and HdrHistogram.
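
The general idea behind such approximations is to keep a bounded summary instead of every sample. The sketch below is a deliberately simplified fixed-width histogram; it is not an implementation of forward decay, t-digest, or HdrHistogram, and the `BucketHistogram` class is a hypothetical name. It assumes non-negative response times in milliseconds.

```python
import math


class BucketHistogram:
    """Simplified fixed-width histogram (NOT t-digest or HdrHistogram)."""

    def __init__(self, bucket_width_ms: int, max_ms: int) -> None:
        self.width = bucket_width_ms
        # One extra overflow bucket for values at or above max_ms.
        self.counts = [0] * (max_ms // bucket_width_ms + 1)
        self.total = 0

    def record(self, value_ms: float) -> None:
        # Memory use stays constant no matter how many samples are recorded.
        index = min(int(value_ms // self.width), len(self.counts) - 1)
        self.counts[index] += 1
        self.total += 1

    def percentile(self, p: float) -> float:
        """Upper edge of the bucket that contains the p-th percentile (nearest rank)."""
        if self.total == 0:
            raise ValueError("no samples recorded")
        target = max(1, math.ceil(p * self.total / 100))
        seen = 0
        for index, count in enumerate(self.counts):
            seen += count
            if seen >= target:
                return (index + 1) * self.width
        return len(self.counts) * self.width


hist = BucketHistogram(bucket_width_ms=10, max_ms=1000)
for value in range(1, 1001):  # made-up samples: 1 ms, 2 ms, ..., 1000 ms
    hist.record(value)
print(hist.percentile(50), hist.percentile(99))
```

The answer is only accurate to within one bucket width, which is the trade-off: a fixed number of counters instead of a sorted list of every response time. The named algorithms use more refined schemes to get better accuracy for the same cost.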

Note

An architecture that is appropriate for one level of load is unlikely to cope with 10 times that load.

© 2025 All rights reserved. Built with DataHub Cloud
