Data Projects
Data Workflows
Package Managers
- Qri. An evolution of the classic open data portals that added decentralized protocols (IPFS) and computing on top of the data. Sadly, the project came to an end in early 2022.
- Datalad. Extended to IPFS
  - A great tool that uses Git Annex (a distributed binary object tracking layer on top of Git).
  - Complicated to wrap your head around, with lots of different commands and concepts. On the other hand, it's very powerful and flexible, as is Git Annex.
- Hugging Face Datasets
- Quilt
  - Requires both Python and S3
- Oxen
  - Data is not accessible from other tools
  - Docs are sparse
  - Definitely more in the Git for Data space than a dataset package manager
- Frictionless Data
- Datopian Data CLI. Successor of DPM
- LakeFS. More like Git for Data
- Datasette
- Algovera Metahub
- DVC
- XVC
- ArtiVC
- Xetdata
- Dud
- Splitgraph
- Deep Lake
- Dim
  - Hard to grok how to use it from the docs
  - Quite a small surface area. You can basically install datasets from URLs, create new ones, or apply some kind of GPT-3 transformation on top of them
- Juan Benet's data
- Colah's data
- Dolt is another interesting project in the space with some awesome data structures. They also do data bounties!
Open Data Indexes
- Google Dataset Search
- Data Commons
- BigQuery Public Data
- Kaggle Datasets
- Datahub
- Hugging Face Datasets
- Data World
- Eurostat
- Statista
- Enigma
- DoltHub
- Socrata
- Nasdaq
- Zenodo
- Splitgraph
- Awesome Public Datasets
- Data Packaged Core Datasets
- Internet Archive Dataset Collection
- AWS Open Data Registry
- Datamarket
- Open Data Stack Exchange
- IPFS Datasets
- Datasets Subreddit. Open Data Subreddit
- Academic Torrents Datasets
- Open Data Inception
- Victoriano's Data Sources
- Data is Plural
- Open Sustainable Technology
- Public APIs
- Real Time Datasets
- Environmental Data Initiative
- DataONE
- The Linked Open Data Cloud
- Organisation for Economic Co-operation and Development
Open Datasets
- Wikipedia
- GitHub
- Hacker News
- Blockchain
- Our World In Data
- FiveThirtyEight
- BuzzFeed News
- ProPublica
- World Bank
- Ecosyste.ms
- Deps.dev
- Twitter Community Notes
Open Data Organizations
- Datahub
- Frictionless
- Open Data Services
- Catalyst Cooperative
- Carbon Plan
- Data is Plural
- Data Liberation Project
- Opendatasoft
- Open Source Observer
- Source.coop
- Our World in Data
Publishing Platforms
Visualizations and Dashboards
- Rath
- Hex.tech
- Perspective
- Rill Developer
- Datastation
- Excalichart
- Chartpilot
- Datawrapper
- Vega
- D3
- Plotly
- Bokeh
- Matplotlib
- Seaborn
- Altair
- HoloViews
- Panel