Volume

characterised by the three Vs…

1. Volume

\rightarrow Size of generated and stored data (e.g. GB, TB)

2. Variety

\rightarrow type and nature of data in its original unstructured (i.e. still to be defined) raw form

\rightarrow types: sequence + time data, data streams, engineering design data, multimedia data, graph and network data

\rightarrow new challenges: how to store + analyse the data

Characteristic data

Pasted image 20240702082248.png

NoSQL database:

\rightarrow stores data as objects with attributes

\rightarrow using an object notation language (e.g. JSON)

\rightarrow advantage vs the relational table-based model: the set of attributes of each object is encapsulated within the object

\rightarrow individual objects can have different attributes

Pasted image 20240702082430.png
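A minimal sketch of the point above, using only Python's standard `json` module rather than any particular NoSQL product. The two records and their field names are made up for illustration. It shows that each object carries its own attribute set, while forcing the same records into one table needs a shared column list with gaps.

```python
import json

# Two hypothetical records; the second has an attribute the first lacks.
raw = """
[
  {"id": 1, "name": "sensor-a", "unit": "C"},
  {"id": 2, "name": "camera-b", "resolution": [1920, 1080]}
]
"""

documents = json.loads(raw)

# Document model: each object lists only its own attributes.
for doc in documents:
    print(doc["id"], sorted(doc.keys()))

# Table-based model: one fixed set of columns for every row,
# so attributes an object does not have become None.
columns = sorted({key for doc in documents for key in doc})
rows = [tuple(doc.get(col) for col in columns) for doc in documents]
print(columns)
print(rows)
```

The `None` entries in `rows` are the cost of the fixed schema. The document form avoids them because the attribute set travels with each object.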

Data lakes

  • a system/repository that can store raw data, as well as processed/transformed data
    • usually as files or ==object blobs== - _collection of binary data stored as a single entity_
  • can include:
    • structured data from relational databases
    • semi-structured data, e.g. CSV, logs, XML, JSON
    • unstructured data, e.g. emails, documents
    • binary data, e.g. images, video, audio
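As a rough illustration of keeping raw and processed data side by side, the sketch below uses a temporary directory as a stand-in for a data lake. The `raw` and `processed` zone names, the `store_blob` helper, and the sample reading are all assumptions made for this example, not part of any specific data lake product.

```python
import json
import tempfile
from pathlib import Path


def store_blob(lake: Path, zone: str, name: str, payload: bytes) -> Path:
    """Write the payload unchanged as a single file under lake/zone/."""
    target = lake / zone / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


with tempfile.TemporaryDirectory() as tmp:
    lake = Path(tmp)

    # Raw zone: the bytes are stored exactly as they arrived.
    raw_payload = b'{"temp_c": "21.5", "site": "A"}'
    raw_path = store_blob(lake, "raw", "reading.json", raw_payload)

    # Processed zone: parse the raw blob and convert the temperature to a float.
    record = json.loads(raw_path.read_bytes())
    record["temp_c"] = float(record["temp_c"])
    store_blob(lake, "processed", "reading.json", json.dumps(record).encode("utf-8"))

    # List everything the lake now holds, relative to its root.
    stored = sorted(p.relative_to(lake).as_posix() for p in lake.rglob("*") if p.is_file())
    raw_unchanged = raw_path.read_bytes() == raw_payload
    print(stored)
```

The raw copy is never overwritten, so the transformation can be redone later with different rules.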

3. Velocity

\to speed at which data is generated + processed (e.g. real-time)

| Stationary | Non-stationary |
| --- | --- |
| Fixed data set | Data set not static |
| All data available for analysis | Data instances arrive continuously |
| New data over time | Requires real-time model analysis |

\therefore analysis redone / model redeveloped / model adaptation
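The difference between redoing an analysis on a fixed data set and adapting a model as instances arrive can be sketched with the simplest possible "model", a mean. The `batch_mean` function, the `RunningMean` class, and the sample stream are hypothetical names and values chosen for this example.

```python
from statistics import fmean


def batch_mean(values):
    """Fixed data set: all data is available, so analyse it in one pass."""
    return fmean(values)


class RunningMean:
    """Arriving data: update the estimate per instance, without storing the history."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        self.count += 1
        # Incremental update: shift the mean towards the new instance.
        self.mean += (x - self.mean) / self.count
        return self.mean


stream = [2.0, 4.0, 6.0, 8.0]

online = RunningMean()
estimates = [online.update(x) for x in stream]

print(estimates)           # estimate after each arriving instance
print(batch_mean(stream))  # same final value, but needs the full data set
```

If the underlying data drifts, a plain running mean weights old and new instances equally. A common variation replaces `1 / self.count` with a constant step size so that older instances are gradually forgotten.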

© 2024 All rights reserved
