Types of Data Systems
Data systems are the standard components in which data is stored, either for a short duration (caches, queues) or for a long duration (databases).
Types of data systems include:
• (Databases) - Long duration data store, for future retrieval
• (Cache) - Result of an expensive operation, to speed up reads
• (Search indexes) - Allow users to search data by keyword or filter it in various ways
• (Stream Processing) - Send a message to another process, to be handled asynchronously
• (Batch Processing) - Periodically crunch a large amount of accumulated data
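As a rough illustration of the cache idea above (keeping the result of an expensive operation so that repeated reads are fast), here is a minimal in-process sketch using Python's standard `functools.lru_cache`. The function `expensive_lookup` and the key `"user:1"` are hypothetical stand-ins, not part of the original notes; a real deployment would more likely use a separate cache service.

```python
import functools

call_count = 0  # counts how often the expensive function really runs


@functools.lru_cache(maxsize=128)
def expensive_lookup(key: str) -> str:
    """Hypothetical stand-in for a slow database query or computation."""
    global call_count
    call_count += 1
    return key.upper()


first = expensive_lookup("user:1")   # cache miss: the function body runs
second = expensive_lookup("user:1")  # cache hit: the stored result is returned
info = expensive_lookup.cache_info()  # hit and miss counters kept by lru_cache
```

The second call never reaches the function body, which is the speed-up a cache buys on reads.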
Example of different Data Systems in use:
Non-Functional requirements of Data Systems
Reliability: The system should continue to work correctly even in the face of adversity.
Scalability: As the system grows (in data volume, traffic volume, or complexity), there should be reasonable ways of dealing with that growth.
Maintainability: New developers should be able to easily understand and make changes.
Load Parameters
Load can be described with parameters such as:
- Requests per second
- Ratio of reads to writes in a database
- Hit / miss rate on a cache
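The load parameters listed above are simple ratios over counters collected during some time window. The sketch below shows one way they might be computed; the `LoadSample` class, its field names, and the sample numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    """Hypothetical counters collected over one measurement window."""
    requests: int
    window_seconds: float
    reads: int
    writes: int
    cache_hits: int
    cache_misses: int


def requests_per_second(s: LoadSample) -> float:
    return s.requests / s.window_seconds


def read_write_ratio(s: LoadSample) -> float:
    # Assumes at least one write in the window.
    return s.reads / s.writes


def cache_hit_rate(s: LoadSample) -> float:
    # Assumes at least one cache lookup in the window.
    return s.cache_hits / (s.cache_hits + s.cache_misses)


# Made-up numbers for a one-minute window.
sample = LoadSample(
    requests=12000, window_seconds=60,
    reads=9000, writes=3000,
    cache_hits=7200, cache_misses=1800,
)
rps = requests_per_second(sample)
ratio = read_write_ratio(sample)
hit_rate = cache_hit_rate(sample)
```

Which of these parameters matters most depends on the architecture of the system being described.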
A really good example of handling load: Twitter Post Broadcast - Load Handling
Describing Performance
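Percentiles are often a better measure of performance than the mean (average), because the mean does not tell you how many users actually experienced a given delay.
The median, also known as the 50th percentile (p50), is the point at which half of the requests have a shorter response time and the other half a longer one.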
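To see why the mean can mislead, here is a small sketch that computes exact percentiles by sorting, using the nearest-rank definition (one of several common definitions). The `percentile` helper and the response times are made up for illustration; a single slow request drags the mean far away from what a typical user sees.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds, with one slow outlier.
response_times_ms = [12, 15, 11, 14, 13, 16, 12, 18, 15, 950]

mean_ms = sum(response_times_ms) / len(response_times_ms)
median_ms = percentile(response_times_ms, 50)
p99_ms = percentile(response_times_ms, 99)
```

Here the median stays close to the typical request, while the mean is dominated by the outlier that the high percentile exposes directly. Sorting every sample like this is only practical for small data sets.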
Algorithms such as forward decay, t-digest, and HdrHistogram can calculate a good approximation of percentiles at minimal CPU and memory cost.
Note: An architecture that is appropriate for one level of load is unlikely to cope with 10 times that load.