Data Platform
Data Platform
A data platform that works on open data should reuse similar tools and techniques as the ones we use in our companies. They have to be robust, scalable, and easy to use. This document outlines the design of a minimal data platform that can be used to load and transform data.
Introduction
The data platform has multiple components, including data loading, transformation, storage and publishing. The platform should be designed to be modular, so that each component can be replaced with a different implementation if needed.
- Data loading is the process of importing structured data (e.g. tables) into a data management system, preserving the data structure.
- Using standard ETL frameworks and patterns
- Separating loading service from core application
- Leveraging metadata standards (e.g. Frictionless Data specs)
- Providing APIs for programmatic access
- Data transformation is the process of converting data from one format to another, cleaning, and enriching it.
- Data storage is the process of storing data in a structured format, making it easy to query and analyze.
- Data publishing is the process of making data available to users, either through APIs or data portals.