DataHub v3

NB: this was the plan and vision for DataHub v3 as of early 2021.

Introduction

Please read the product overview and vision at https://tech.datopian.com/datahub/v3/

Our focus is on DataHub, but the work breaks down into two sub-products:

  • DataHub v3: a platform at DataHub.io for sharing (deploying) datasets
  • Portal.JS v2: a framework for building (frontend) portals to data

What's the relationship? DataHub (and DataHub users) will use Portal.JS to create (at least part of) the portal that gets deployed there.

Key choices

  • Single-minded focus on the single-dataset case to start with (so anything catalog-like comes later) [1]
  • (?) Focus on build and deploy, so the data portal command is a convenience, not the essence (the equivalent of a dev server) [2]

Thus, at a very high level, development looks something like this:

NB: the present/share loop involves constant iteration, i.e. one does not wait for "Present" to be perfect before moving to "Deploy". Rather, once you have a feature in Present you can move to Deploy, then back to Present, and so on.

Purpose and Principles

Create and launch DataHub v3 (data) and the Portal.JS v2 framework. Specifically:

  • DataHub: running data deploy locally or on GitHub results in a portal at xxx.yyy.datahub.io, including a Data API
    • Can be customized
  • Portal.JS: an excellently documented frontend framework for creating data portals for a single dataset or a catalog
    • Tutorial
    • Examples: single dataset
    • Catalog support against CKAN and GitHub

v1:

  • [DataHub]: We are using data portal ourselves locally to preview datasets.
  • [Portal.JS] has great documentation, either on GitHub or on an elegant public website (e.g. portaljs.datopian.com), and is announced on Reddit, Hacker News, etc.

v2:

  • DataHub: We are using data deploy ourselves and others are using it.
  • Portal.JS v2 is released with exploration support via e.g. sqlite.

v3:

  • DataHub: Data APIs work.

Outcome Visioning

Brainstorm and Organize

DataHub - Tasks

  • Website: Start a simple single-page site at next.datahub.io @rufuspollock ✅2022-03-10 This was done in early 2022 and has since been much updated.
  • CLI preliminaries: Take over https://github.com/datopian/data-cli (?) @rising ✅2023-02-25 ❌ In the end we just started our own.
    • Should some of this move out to frictionless-js? TODO: ask Anu/Rising
    • Do we need to worry about existing users of DataHub v2? ANS: let's just put the main part into a datahub-v2 branch and go ahead.
  • Local portal: a data portal command to launch a "portal" for a single local dataset. Analysis done ✅2023-02-25 ❌ We aren't going this route for now, though we sort of have something like this in Flowershow.
  • Portal online: data deploy results in a basic portal with a showcase at e.g. dataset.username.datahub.io. Analysis 80% done ✅2023-02-25 ❌ We aren't doing the CLI route for now.
    • Login: login API and data login command (can we just reuse old code from DataHub v2?)
  • Dashboard: have a user dashboard 🚚 to git-dms
    • Dashboard v0.1: basic dashboard with user profile information and list of portals and their deployments
    • Dashboard v0.2: usage information (space used, etc.)
  • [deploy] API support: data deploy --api results in a portal with an API. Analysis 40% done ✅2023-02-25 ❌ Not using the API route. New analysis for datahub-v3 is in progress in ../notes/git-dms-data-api
  • GitHub integration so that every push results in a deploy to DataHub (look at Vercel here, as they do this very well: ../notes/vercel-git-integration-lessons-datahub-v3) ✅2023-02-25 ❌ DUPLICATE. This is now part of the git-dms work.
  • Website plus
    • Do we port over the docs / awesome stuff?
    • What about a marketplace, e.g. marketplace.datahub.io or datahub.io/marketplace? 🚚 moved to marketplace
  • Payment and Billing

FUTURE:

#done/process 🚚 to git-dms

  • [catalog] Creating and deploying a catalog (multiple datasets)
  • [workflows] More value-add, e.g. data validation and data summarization on every deploy
  • [workflows] Custom workflows
  • [api] Metrics and user rate limiting (+ pay per use)
  • [api] Custom APIs (see design thinking done on this for energinet)
  • Catalog on DataHub: metadata of all deployed datasets is available for search in the main portal (datahub.io/search)

PortalJS

Moved to https://github.com/datopian/portal.js/blob/main/DESIGN.md

Plan (MVP)

Purpose and Principles

  • Launch the new open source framework Portal.JS, "The Frontend Framework for Data", and attract alpha users. Key results:
    • Portal.JS launched with support for graphs and data preview (table)
    • Decent install experience (npm install is OK for now)
    • Website for the tool/framework (e.g. portaljs.org or portaljs.datopian.com?)
    • Announced and promoted, e.g. on Reddit and Hacker News, and presented at a Frictionless community meetup
    • Has 200 stars within the first 2 months of launch
    • 1000 downloads and installs
  • Launch “DataHub” for previewing and deploying “portals”. Key results:
    • cd my-frictionless-dataset && data portal works
    • Tool for deploying/building (just using an SSG?), e.g. cd frictionless-dataset && data deploy works
    • We can build github.com/datasets this way (building on each update)
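As a rough illustration of what "cd my-frictionless-dataset && data portal works" implies for the single-dataset case, here is a minimal TypeScript sketch of a likely first step: reading the dataset's Frictionless descriptor (datapackage.json) and working out which resources to preview. It assumes the descriptor has already been read from disk as a string. The names summarizeDataPackage, DatasetSummary and ResourceSummary are hypothetical; this is not the data-cli or Portal.JS implementation.

```typescript
// Hypothetical sketch: first step of a "data portal"-style preview for a
// single local dataset. Not the data-cli or Portal.JS implementation.

interface ResourceSummary {
  name: string;
  path: string | null; // null for inline ("data") resources with no path
}

interface DatasetSummary {
  title: string;
  resources: ResourceSummary[];
}

// Parse a datapackage.json descriptor (already read from disk) and list
// the resources a portal page would preview.
function summarizeDataPackage(descriptorJson: string): DatasetSummary {
  const descriptor = JSON.parse(descriptorJson);
  // The Data Package spec requires a non-empty "resources" array.
  if (!Array.isArray(descriptor.resources) || descriptor.resources.length === 0) {
    throw new Error("datapackage.json must list at least one resource");
  }
  const resources: ResourceSummary[] = descriptor.resources.map(
    (r: { name?: string; path?: string | string[] }, i: number) => ({
      // Fall back to a positional label if a resource has no name.
      name: r.name ?? `resource-${i + 1}`,
      // A path may be a single string or an array of chunks; preview the first chunk.
      path: Array.isArray(r.path) ? r.path[0] ?? null : r.path ?? null,
    }),
  );
  return {
    // Prefer the human-readable title, then the machine name.
    title: descriptor.title ?? descriptor.name ?? "Untitled dataset",
    resources,
  };
}
```

For example, summarizeDataPackage('{"name":"example-dataset","resources":[{"name":"prices","path":"data/prices.csv"}]}') yields a summary titled "example-dataset" with one resource, "prices", at data/prices.csv. Keeping this step as a pure function over the descriptor text means the same logic could serve both a local preview and a build/deploy step.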

Outcome Visioning

We are building finance-vix

Portal.JS has great documentation, either on GitHub or on an elegant public website (e.g. portaljs.datopian.com), and is announced on Reddit, Hacker News, etc.

Next Steps

Sprint 1

  • Take stuff over
    • Take over recline
    • Take over the data-cli repo (? Is this a priority right now? ANS: No) ✅2023-02-25 ❌ not needed
  • next.datahub.io front page done. In progress ✅2023-02-25 This got done.
  • data deploy from GitHub for finance-vix using an SSG ✅2023-02-25 ❌ never got there, and not doing it now


Design

This is the design for lower-level epics.

Footnotes

  1. QU: Do we focus on the single dataset model or do we expand to cover the full catalog?

    • The simplicity of a single dataset is compelling …
    • ANS: for now let's keep this ruthlessly focused on the basic use case. We can do more catalog type work in examples/catalog if we want that …
  2. QU: is the CLI tool a key feature or a convenience …?

    ANS: my sense is that it is more like next dev, i.e. for development: if it is fast and simply a good way to preview data, great, but that is not the main purpose (if you want to explore your data fast, there are lots of things you can do from the command line …)
