DMS (Data Management System)
DMS (Data Management System)
This document is an introduction to the technical design of Data Management Systems (DMS). This also covers Data Portals since Data Portals are one major solution one can build with a data management system.
Domain Model
- Project: a data project. It has has a single dataset in the same way GitHub or Gitlab "project" has a single repo. Traditionally in, say CKAN, this has been implicit and identified with the dataset. There are, however, important differences: a project can include a dataset but also other related functionality such as issues, workflows etc.
- Dataset: a set of data, usually zero or more resources.
- Resource (or File): a single data object.
Revisioning
- Revision
- Tag
- (Branch)
Presentation
- View
- Showcase
- Data API
Identity and Permissions
- Account
- Profile
- Permission
Data Factory
- Task
- DAG (Pipeline)
- Run (Job)
GraphQL version
type Project {
id: ID!
description: String
readme: String
dataset: Dataset
views: [View]
issues: [Issue]
actions: [Action]
}
type Dataset {
# data package descriptor structure
id: ID
name: String
...
resources: [Resource]
}
type Resource {
# follows Frictionless Resource
path: ...
id: ...
name: ...
schema: Schema
}
# Table Schema usually ...
type Schema {
}
# dataset view e.g. table, graph, map etc
type View {
id: ID!
}
Actions / Flows [component]
- View Dataset: [Showcase page] a page displaying the dataset (or a resource)
- View a Revision / Tag / Branch:
- Add / Upload: …
- Tag
Components
- Meta
StoreService: stores dataset metadata (and revisions) - HubStore: stores all the users, organizations and their connections to the datasets.
- SearchStore + Service: search index and API
- BlobStore: stores blobs (for files)