Volume
characterised by the three Vs…
1. Volume
Size of generated and stored data (e.g. GB, TB)
2. Variety
Type and nature of the data in its original, unstructured (to be defined) raw form. Types: sequence + time data, data streams, engineering design data, multimedia data, graph and network data. New challenge: how to store + analyse the data.
Characteristic data
NoSQL database:
Stores data as objects with attributes, using an object notation language (JSON), vs. the relational table-based model. Advantage: the set of attributes of each object is encapsulated within the object, so individual objects can have different attributes.
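
A minimal sketch of the "different attributes per object" point, using plain Python dicts and the standard `json` module rather than any real NoSQL database. The two sensor records and their field names are made up for illustration.

```python
import json

# Two hypothetical records; the second has an attribute the first lacks.
docs = [
    {"id": 1, "name": "sensor-a", "unit": "C"},
    {"id": 2, "name": "sensor-b", "unit": "C", "location": {"lat": 51.5, "lon": -0.1}},
]

# Document view: each object serialises with exactly its own attributes.
as_json = [json.dumps(d) for d in docs]

# Table view: every row must fit one fixed set of columns,
# so attributes an object does not have show up as None.
columns = sorted({key for d in docs for key in d})
rows = [tuple(d.get(col) for col in columns) for d in docs]

for line in as_json:
    print(line)
print(columns)
for row in rows:
    print(row)
```

In the table view the first record gets an empty `location` cell, while in the document view it simply does not carry that attribute.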
Data lakes
- a system/repository that can store raw data, as well as processed/transformed data
- usually as files or ==object blobs== - _collection of binary data stored as a single entity_
- can include:
    - structured data from relational databases
    - semi-structured data, e.g. CSV, logs, XML, JSON
    - unstructured data, e.g. emails, documents
    - binary data, e.g. images, video, audio
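
A toy sketch of the idea that a data lake keeps raw and processed data side by side as blobs. This is an in-memory dict, not a real storage system. The `Blob` class, the `put` helper, the key names, and the `kind` labels are all assumptions made for illustration.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass
class Blob:
    key: str     # path-like name of the object
    data: bytes  # binary data stored as a single unit
    kind: str    # free-text label, e.g. "semi-structured" or "binary"


lake = {}  # hypothetical in-memory stand-in for the repository


def put(key, data, kind):
    """Store raw bytes under a key, without imposing any schema."""
    lake[key] = Blob(key, data, kind)


# Raw data goes in unchanged, whatever its format.
put("raw/readings.csv", b"id,temp\n1,20.5\n", "semi-structured")
put("raw/photo.jpg", bytes([0xFF, 0xD8, 0xFF]), "binary")

# A processed/transformed version is stored next to the raw one.
text = lake["raw/readings.csv"].data.decode()
parsed = list(csv.DictReader(io.StringIO(text)))
put("processed/readings.json", json.dumps(parsed).encode(), "semi-structured")

print(sorted(lake))
```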
3. Velocity
speed at which data is generated + processed (e.g. real-time)
Stationary | Non-stationary |
---|---|
Fixed data set | Data set not static |
All data available for analysis | Data instances arrive continuously |
New data over time: analysis redone/model redeveloped | Requires real-time analysis/model adaptation |
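
A small sketch of the two columns, using a mean as a stand-in for "the model" (an assumption for illustration; the notes do not name a specific model). The stationary version recomputes from the full data set. The non-stationary version updates its estimate one instance at a time and never needs the full set.

```python
def batch_mean(values):
    """Stationary case: all data is available, so analyse it in one pass."""
    return sum(values) / len(values)


class RunningMean:
    """Non-stationary case: adapt the estimate as each instance arrives."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        # Incremental update: shift the mean towards x by 1/n of the gap.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        return self.mean


data = [2.0, 4.0, 6.0]

# Fixed data set: if new data is added later, the whole analysis is redone.
stationary_result = batch_mean(data)

# Stream: each instance updates the model and can then be discarded.
model = RunningMean()
history = [model.update(x) for x in data]

print(stationary_result)
print(history)
```

A plain running mean weights old and new instances equally. If the data drifts over time, a variant that favours recent instances (e.g. exponential weighting) may adapt better.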