Data Projects Database

This is a list of data projects that might be of interest to the Datopian community.

Each project is listed under the category of the Datahub Data Management System (DMS) it is closest to, though it may also have interesting ideas for other categories.

Data Factory

  • Kamu. A command-line tool for managing, transforming, and collaborating on structured data.
  • Bacalhau. A platform for fast, cost-efficient, and secure computation that runs jobs where the data is generated and stored.

Package Management

  • Open Data Fabric. An open protocol specification for the decentralized exchange and transformation of semi-structured data that aims to holistically address many shortcomings of modern data management systems and workflows.
    • The protocol addresses several interesting aspects of data: reproducibility, a complete historical account (all history is preserved), verifiability (data is immutable), and provenance (data is linked to its source).
    • It also has some strong opinions on the nature of data and transformations. The entire specification is worth reading.
    • Datasets and transformations are defined in YAML files.
  • Qri. A project to help with dataset syncing, versioning, storage, and collaboration. Sadly, it came to an end in early 2022.
  • Datalad. Distributed data management system that keeps track of your data, creates structure, ensures reproducibility, supports collaboration, and integrates with widely used data infrastructure.
    • Uses git-annex (a distributed binary object tracking layer on top of Git) to provide a decentralized dataset management system.
    • Can be extended to IPFS.
  • Quilt.
    • Works on top of S3.
  • Oxen.
  • LakeFS. Closer to a "Git for data".
  • DVC.
  • XVC.
  • Xetdata.
  • Dud.
  • Deep Lake.
  • Dim.
  • Juan Benet's data.
  • Colah's data.
  • Dolt.
  • Ocean Protocol Market.
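The properties listed for Open Data Fabric (complete history, verifiability, provenance) rest on an idea several of these tools share: records that are addressed by a hash of their content and linked into an append-only chain. git-annex (used by Datalad) and DVC, for example, also identify files by a hash of their content. The minimal Python sketch below illustrates that general idea only; it is not ODF's actual metadata format, and `put`, `Block`, `MetadataChain`, and the event kinds are hypothetical names loosely modeled on the concepts above.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Content-addressed store (hypothetical): each blob is keyed by the SHA-256
# of its bytes, so identical data gets the same address and edits get a new one.
store: Dict[str, bytes] = {}


def put(data: bytes) -> str:
    """Store a blob and return its content hash."""
    digest = hashlib.sha256(data).hexdigest()
    store[digest] = data
    return digest


@dataclass(frozen=True)
class Block:
    prev: Optional[str]  # digest of the previous block, None for the first one
    event: dict          # what happened, e.g. {"kind": "AddData", "data": <hash>}

    def digest(self) -> str:
        # Canonical JSON (sorted keys) so a block always hashes the same way.
        payload = json.dumps({"prev": self.prev, "event": self.event}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class MetadataChain:
    blocks: List[Block] = field(default_factory=list)

    def append(self, event: dict) -> str:
        # Link the new block to the digest of the current head (append-only history).
        prev = self.blocks[-1].digest() if self.blocks else None
        block = Block(prev, event)
        self.blocks.append(block)
        return block.digest()

    def verify(self) -> bool:
        # Recompute every link; rewriting an earlier block breaks the link after it.
        expected_prev = None
        for block in self.blocks:
            if block.prev != expected_prev:
                return False
            expected_prev = block.digest()
        return True


chain = MetadataChain()
raw = put(b"id,value\n1,10\n")
chain.append({"kind": "AddData", "data": raw})
# Provenance: the derived step records the content hash of its input.
chain.append({"kind": "ExecuteTransform", "inputs": [raw], "query": "SELECT * FROM input"})
print(chain.verify())  # True

# Swap the first block for a forged one: the second block no longer links to it.
chain.blocks[0] = Block(None, {"kind": "AddData", "data": put(b"id,value\n1,99\n")})
print(chain.verify())  # False
```

The links alone do not catch an edit to the newest block, and `verify` does not re-hash the stored blobs; a fuller design would also compare the head digest against one obtained from a trusted source and check each blob against its key.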

Frontend

  • Evidence.dev.
  • Malloy Notebooks.
    • Install the recommended extension (malloydata.malloy-vscode) and open the notebook. Everything runs in the browser.

Visualizations and Dashboards

Data APIs

© 2024 All rights reserved

Built with DataHub Cloud