DataHub v3
NB: this was the plan and vision for DataHub v3 as of early 2021.
Introduction
Please read: product overview and vision at https://tech.datopian.com/datahub/v3/
Our focus is on DataHub but the work breaks down into two sub-products:
- DataHub v3: a platform at DataHub.io for sharing (deploying) datasets
- Portal.JS v2: a framework for building (frontend) portals to data
What's the relationship? DataHub (and DataHub users) will use Portal.JS to create (at least part of) the portal that gets deployed there.
Key choices
- Single-minded focus on single dataset case to start with (so anything catalog-like comes later) (see footnote 1)
- (?) Focus on build and deploy, and hence `data portal` is a convenience, not the essence (the equivalent of a dev server) (see footnote 2)
Thus, from a very high level development looks something like this:
NB: the present/share loop will involve constant iteration, i.e. one does not wait for "Present" to be perfect before moving to Deploy. Rather, you build a feature in Present, move to Deploy, then go back to Present, and so on.
Purpose and Principles
Create and launch DataHub v3 (data) and the Portal.JS v2 framework. Specifically:
- DataHub: `data deploy` locally or on GitHub results in a portal at xxx.yyy.datahub.io, including a Data API
- The portal can be customized
- Portal.JS: excellently documented frontend framework for creating data portals for a single dataset or a catalog
- Tutorial
- Examples: single data
- Catalog support against CKAN and Github
v1:
- [DataHub]: We are using `data portal` ourselves locally to preview datasets.
- [Portal.JS] has great documentation, either in GitHub or on an elegant public website at e.g. portaljs.datopian.com, and is announced on Reddit/Hacker News etc.
v2:
- DataHub: We are using `data deploy` ourselves and others are using it.
- Portal.JS v2 is released with exploration support via e.g. sqlite.
v3:
- DataHub: Data APIs work.
Outcome Visioning
Brainstorm and Organize
DataHub - Tasks
- Website: Start a simple single page site at next.datahub.io @rufuspollock ✅2022-03-10 this was done in early 2022. Now much updated.
- CLI preliminaries: Take over https://github.com/datopian/data-cli (?) @rising ✅2023-02-25 ❌ in the end just started our own.
- Some of this might want to move out to frictionless-js? TODO: ask Anu/Rising
- Do we need to worry about existing users for DataHub v2? ANS: let's just branch main part of datahub-v2 branch and go ahead.
- Local portal: `data portal` command to launch a "portal" for a single local dataset. Analysis done ✅2023-02-25 ❌ we aren't going this route for now. We sort of have something like this in flowershow, though.
- Portal online: `data deploy` results in a basic portal with showcase at e.g. dataset.username.datahub.io. Analysis 80% done ✅2023-02-25 ❌ we aren't doing the CLI route for now
- Login: login API and `data login` command (can we just reuse old code from DataHub v2?)
- Dashboard: have a user dashboard 🚚 to git-dms
- Dashboard v0.1: basic dashboard with user profile information and list of portals and their deployments
- Dashboard v0.2: usage information (space used etc)
- [deploy] API support: `data deploy --api` results in a portal with an API. Analysis 40% done ✅2023-02-25 ❌ not using the API route. New analysis for datahub-v3 in progress in ../notes/git-dms-data-api
- GitHub integration so that every push results in a deploy to DataHub (look at Vercel here as they're doing great: ../notes/vercel-git-integration-lessons-datahub-v3) ✅2023-02-25 ❌ DUPLICATE. This is now part of git-dms stuff
- Website plus
- Do we port over the docs / awesome stuff?
- What about marketplace? e.g. marketplace.datahub.io or datahub.io/marketplace 🚚 moved to marketplace
- Payment and Billing
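The "Portal online" task above mentions portals living at addresses such as dataset.username.datahub.io. A minimal sketch of how such a hostname might be derived is shown below. The function names, the slug rules, and the default base domain are assumptions for illustration only; the plan does not specify any of them.

```typescript
// Hypothetical helper: turn a free-form name into a DNS-friendly label.
// The normalisation rules here are an assumption, not part of the plan.
function toSlug(value: string): string {
  return value
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse any run of other characters into one hyphen
    .replace(/^-+|-+$/g, ""); // drop leading and trailing hyphens
}

// Hypothetical helper: build a hostname of the form dataset.username.<base domain>.
function portalHostname(
  dataset: string,
  username: string,
  baseDomain: string = "datahub.io"
): string {
  const datasetSlug = toSlug(dataset);
  const userSlug = toSlug(username);
  if (datasetSlug === "" || userSlug === "") {
    throw new Error("dataset and username must each contain a letter or digit");
  }
  return `${datasetSlug}.${userSlug}.${baseDomain}`;
}
```

Keeping the base domain as a parameter would let the same helper serve a staging domain as well, if one were ever needed.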
FUTURE:
#done/process 🚚 to git-dms
- [catalog] Creating and deploying a catalog (multiple datasets)
- [workflows] More value add: e.g. on every deploy do data validation, data summarization
- [workflows] Custom workflows
- [api] Metrics and user rate limiting (+ pay per use)
- [api] Custom APIs (see design thinking done on this for energinet)
- Catalog on DataHub: metadata of all deployed datasets is available for search in the main portal (datahub.io/search)
PortalJS
Moved to https://github.com/datopian/portal.js/blob/main/DESIGN.md
Plan (MVP)
Purpose and Principles
- Launch new open source framework portaljs "The Frontend Framework for Data" and attract alpha users. Key results:
- Portal.js launched with support for graphs, data preview (table)
- Decent install experience (`npm install` is ok for now)
- Website for the tool/framework (e.g. portaljs.org or portaljs.datopian.com?)
- Announced and promoted, e.g. on Reddit and Hacker News, and presented at a Frictionless community meetup
- Has 200 stars within first 2 months of launch
- 1000 downloads and installs
- Launch “DataHub” for previewing and deploying “portals”. Key results:
- `cd my-frictionless-dataset && data portal` works
- Tool for deployments/building (just using SSG?) e.g. `cd frictionless-dataset && data deploy` works
- We can build github.com/datasets this way (building on each update)
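Both key results above assume the working directory holds a Frictionless dataset, i.e. a folder with a datapackage.json descriptor. As a rough sketch, the types below model a small subset of that descriptor, followed by a made-up example and a hypothetical pre-flight check. The file path, field names, and the `findProblems` function are illustrative assumptions; nothing here describes what `data portal` or `data deploy` actually do.

```typescript
// Small subset of a Frictionless Data Package descriptor (datapackage.json).
interface Field {
  name: string;
  type?: string;
}

interface Resource {
  name: string;
  path: string;
  schema?: { fields: Field[] };
}

interface DataPackage {
  name?: string;
  title?: string;
  resources: Resource[];
}

// Illustrative descriptor only: the path and field names are invented.
const exampleDescriptor: DataPackage = {
  name: "my-frictionless-dataset",
  title: "My Frictionless Dataset",
  resources: [
    {
      name: "data",
      path: "data/data.csv",
      schema: {
        fields: [
          { name: "date", type: "date" },
          { name: "value", type: "number" },
        ],
      },
    },
  ],
};

// Hypothetical pre-flight check: returns a list of problems, empty if none are found.
function findProblems(pkg: DataPackage): string[] {
  const problems: string[] = [];
  if (pkg.resources.length === 0) {
    problems.push("descriptor lists no resources");
  }
  pkg.resources.forEach((resource, index) => {
    if (resource.name.trim() === "") problems.push(`resource ${index} has no name`);
    if (resource.path.trim() === "") problems.push(`resource ${index} has no path`);
  });
  return problems;
}
```

A preview or deploy tool could run a check like this before building, so that a broken descriptor fails fast with a readable message.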
Outcome Visioning
We are building finance-vix
Portal.JS has great documentation either in github or an elegant public website at e.g. portaljs.datopian.com and is announced on reddit/hacker news etc.
Next Steps
Sprint 1
- Take stuff over
- Take over recline
- Take over data-cli repo (? is this a priority right now. Ans: No) ✅2023-02-25 ❌ not needed
- next.datahub.io front page done In progress ✅2023-02-25 this got done
- data deploy from github for finance-vix using SSG ✅2023-02-25 ❌ never got there and not now
Design
This is the design for lower-level epics.
- [portal] for single local dataset ✅2023-02-25 ❌ deleted this analysis as nothing really added to presentation on https://datahub.io/docs/dms/datahub/v3/
- [deploy] Basic showcase ✅2023-02-25 ❌ ditto
- [deploy] API for a dataset ✅2023-02-25 ❌ 🚚 to ../notes/git-dms-data-api
Footnotes
1. QU: Do we focus on the single dataset model or do we expand to cover the full catalog?
- The simplicity of a single dataset is compelling …
- ANS: for now let's keep this ruthlessly focused on the basic use case. We can do more catalog type work in examples/catalog if we want that …
2. QU: is the CLI tool a key feature or a convenience …?
- ANS: my sense is it is more like `next dev`, i.e. for development. If it is a fast and simple way to preview data, great, but it is not the main purpose (if you want to explore your data fast there are lots of things you can do from the command line …)