Product Commentary

Reflections and leftovers from the main product analysis.

Product Vision History

A consolidated list of the various iterations of the product vision we have had.

DataHub Pages - Jan/Feb 2022
DataHub v3 (Feb 2021) https://tech.datopian.com/datahub/v3/ - in its final version it was close to DataHub Pages (but without the content + data emphasis)
- PLAN: …
DataHub v3 (alpha) (June 2020 to Feb 2021) - still had the git+storage model and an emphasis on versioning - for an example see 2021-02-03
DataHub Git-Based (2020) https://github.com/datopian/datahub-git-based - A design for a next-generation, fully git(hub) + cloud-based DataHub.

Git-based DataHub Vision

A next-generation, fully git(hub)-based DataHub.

Being git-based is definite. The initial KISS approach means going fully GitHub-based, so GitHub provides both the MetaStore and the HubStore.

Product Pitches (Various)

Hypothesis - 17 Feb 2022

A product that takes data (CSV, XLS, etc.) and (markdown) content in git(hub) and provides an elegant, customizable presentation on the web (including tables and graphs), via self-service or a cloud service, is attractive to a growing set of users.

Option: DataHub Pages Product Vision and Plan (GDoc)

A fast, simple service for data-savvy types to publish and share data in a usable way [it’s already easy to share via S3, Google Drive, Dropbox, even GitHub, etc., but these aren’t data-oriented].

Easy, fast data publishing (deployment)
Push / paste / upload a data file and get a published “portal”
A stripped-back data portal platform focused on the frontend and (at the start) a single data file / dataset

v2 [tweaked]: A fast, simple service (self-service & cloud) for data-savvy users to publish and share data / data-driven content, starting from data on GitHub

Option B

Publish your data files / datasets (showcase/present)
Share your data in a usable, data-oriented way.

Option: DataHub v3 doc

Make it stupidly easy, fast and reliable to share your data in a usable^a way^b.

^a It is already easy to “share” data: just use Dropbox, Google Drive, S3, GitHub, etc. However, it’s not so easy to share it in a way that’s usable, e.g. with descriptions for the columns, data that’s viewable and searchable (not just raw), clearly associated docs, an API, etc.

^b Not only with others but with yourself. This may sound a bit odd: don’t you already have the data? What we mean is, for example, going from a raw CSV to a browsable table (sharing it with your “eyes”) or converting it to an API so that you can use it in a data-driven app or analysis (sharing from one tool to another).

Option A

Core offer: elegantly present your data - fast, easy and repeatable

Self-service: you add DataHub Pages (portal.js template) to your repo and self-publish on your deployment platform of choice (e.g. Vercel, Netlify, etc.)

Cloud: DataHub Pages in our cloud; you just integrate your GitHub repo in a couple of clicks … and have your data turned into a beautiful site that updates on every push

#todo

Is it "presenting" / "publishing" / "sharing"?
Is it about "data" or "data + content" (what do we call that)?

Notes

Leo Email - 2022-01-30

I've been going through portal.js, DataHub and the product documents we have.

Basically, I've been trying to see how to make real a sentence I said last year: "Make sharing new datasets stupidly easy". After a few trials, and after trying to see how things work, it is clear that the current state of what we have is really far from it. This is something we already discussed some time ago.

My first goal is to do the following:

Have a CSV of the data we want to publish and [optional] a markdown file with the text description, then:
run a single command/script that creates a static webpage
be able to git push to GitHub Pages and it just works
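The first goal above (CSV plus optional markdown in, one command, a static page out) can be sketched in a few lines of Python, the language proposed later in this email. This is a minimal illustration under stated assumptions, not portal.js or DataHub code: the script name `build_site.py`, the `build_site` function, the `index.html` output and the plain HTML template are all hypothetical, and the markdown is shown as escaped text rather than rendered (a real tool would use a markdown library).

```python
"""Hypothetical one-command publisher: CSV (+ optional markdown) -> static index.html.

Sketch only: names, output layout and template are illustrative assumptions.
"""
import csv
import html
import sys
from pathlib import Path


def build_site(csv_path, out_dir, readme_path=None, title=None):
    """Write <out_dir>/index.html with the optional description and a data table."""
    csv_path = Path(csv_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    with csv_path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    header, body = (rows[0], rows[1:]) if rows else ([], [])

    # Assumption: markdown is escaped and shown as preformatted text (stdlib only).
    description = ""
    if readme_path is not None:
        text = Path(readme_path).read_text(encoding="utf-8")
        description = f"<pre>{html.escape(text)}</pre>"

    # Every cell is HTML-escaped so data values cannot break the page.
    head_cells = "".join(f"<th>{html.escape(c)}</th>" for c in header)
    body_rows = "\n".join(
        "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
        for row in body
    )
    page_title = html.escape(title or csv_path.stem)
    page = f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{page_title}</title></head>
<body>
<h1>{page_title}</h1>
{description}
<table>
<thead><tr>{head_cells}</tr></thead>
<tbody>
{body_rows}
</tbody>
</table>
</body>
</html>
"""
    out_path = out_dir / "index.html"
    out_path.write_text(page, encoding="utf-8")
    return out_path


if __name__ == "__main__":
    # Usage: python build_site.py DATA.csv OUT_DIR [README.md]
    if len(sys.argv) >= 3:
        print(build_site(sys.argv[1], sys.argv[2], sys.argv[3] if len(sys.argv) > 3 else None))
    else:
        print("usage: python build_site.py DATA.csv OUT_DIR [README.md]")
```

For example, `python build_site.py data.csv docs README.md` would write `docs/index.html`. GitHub Pages can be configured to serve a branch's `docs/` folder, so committing and pushing that folder is one possible way to get the "git push and it just works" step.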

To publish a dataset with what we currently have, we need to:

study and understand what Frictionless Data is and how to create it (the end user shouldn't have to; they should be able to start without knowing anything about it)
understand what portal.js is, how to install it, and that there are different ways of publishing with it, including a single-dataset option
understand that portal.js is built on Next.js, which makes us lose focus, and that Next.js in turn runs on React, which has its own complexity
understand how to deploy a React application to GitHub Pages

Just installing the npm dependencies takes about 1 minute, even on a fiber connection, which from a user's point of view can already be a problem. But the most frustrating thing is all the knowledge and work needed just to set things up before I can start building my data page.

This is quite a challenge for somebody who just wants to publish some data with some text and maybe some graphics.

So my goal for the next few days is to understand how to make this much simpler and, from there, build the scripts (this might or might not use all of portal.js in the first version).

Some notes:

the user does NOT need to know Frictionless, Next.js, React or portal.js
the user should be able to run a simple script (the first one should be text-based) that creates a simple static webpage that can be pushed to GitHub Pages (maybe even a script that does the push for them)
the user should be able to install the tool with a simple shell script and/or pip (Python). Note that I chose Python because it is one of the most widely used languages in the data and scientific domains, so most people should be able to run a Python script.

For this, some individual goals are:

automatic generation of the Frictionless JSON file
templated UI (as for DataHub)
static website generation
some default graphs that can be chosen (possibly generated with Vega-Lite)
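The first individual goal, automatic generation of the Frictionless JSON file, could look roughly like the sketch below. It assumes the file is a Data Package descriptor named `datapackage.json` written next to the CSV, and it uses simple hand-rolled type inference (boolean, integer, number, ISO date, otherwise string) rather than the `frictionless` Python library, which has its own, more thorough inference. The helper names `infer_type`, `describe_csv` and `write_datapackage` are hypothetical.

```python
"""Sketch: infer a Frictionless-style datapackage.json from a CSV (stdlib only).

Assumptions: file name, inference rules and helper names are illustrative;
this does not call the `frictionless` library.
"""
import csv
import itertools
import json
import re
from datetime import date
from pathlib import Path

INT_RE = re.compile(r"[+-]?\d+")
NUM_RE = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _is_iso_date(v):
    """True for YYYY-MM-DD strings that are also valid calendar dates."""
    if not DATE_RE.fullmatch(v):
        return False
    try:
        date.fromisoformat(v)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Return the narrowest Table Schema type that fits every non-empty value."""
    vals = [v.strip() for v in values if v.strip()]
    if not vals:
        return "string"
    if all(v.lower() in ("true", "false") for v in vals):
        return "boolean"
    if all(INT_RE.fullmatch(v) for v in vals):
        return "integer"
    if all(NUM_RE.fullmatch(v) for v in vals):
        return "number"
    if all(_is_iso_date(v) for v in vals):
        return "date"
    return "string"


def slugify(text):
    """Lowercase name using only a-z, 0-9, '.', '_' and '-'."""
    return re.sub(r"[^a-z0-9._-]+", "-", text.lower()).strip("-") or "dataset"


def describe_csv(csv_path, sample_rows=1000):
    """Build a descriptor dict by sampling the first `sample_rows` data rows."""
    csv_path = Path(csv_path)
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        sample = list(itertools.islice(reader, sample_rows))
    fields = [
        {"name": col, "type": infer_type([row[i] for row in sample if i < len(row)])}
        for i, col in enumerate(header)
    ]
    name = slugify(csv_path.stem)
    return {
        "name": name,
        "resources": [
            {"name": name, "path": csv_path.name, "format": "csv",
             "schema": {"fields": fields}}
        ],
    }


def write_datapackage(csv_path):
    """Write datapackage.json beside the CSV so the relative `path` resolves."""
    descriptor = describe_csv(csv_path)
    out = Path(csv_path).parent / "datapackage.json"
    out.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    return descriptor
```

Sampling only the first rows keeps inference fast on large files, at the cost of possibly missing a later value that would change a column's type; a power user could then edit the generated schema by hand.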

Power users should then be able to take advantage of the whole setup (Frictionless, portal.js and so on) to make more complex and personalized modifications.