What is MVP? (What is Product?)
- Publish a rich digital garden
- Publish a Frictionless dataset?
- Create a data scratchpad?
Motivation: Need a Direction of travel (2023-03-03)
- Need a "focus" near-term use case (of our own if possible) so that we can choose week to week work
- We have a clear high-level product vision
- However, still lots of directions we could go in right now (e.g. publish ui vs showcase, showcase from github source vs local etc)
- Need something to guide those choices
- Use case could be "data rich wiki/garden", specifically posting up the kind of stuff in https://github.com/datasets/awesome-data/issues e.g. "here's a dataset about CO2 concentrations I found"
- The simplest way to start on that is to use the digital garden/wiki already in datahub-next in the `content` folder (and switch later to rendering remote)
- This would imply near-term:
- Switching our render system from contentlayer and flowershow v1.2 (or even 0.9, which is what we are actually using) to our new MDX+D pipeline
- Rebuilding our pipeline with all our learnings from Flowershow so far, with these differences:
- Render pipeline that works with the cloud and specifically remote content
- Data features e.g. table, graph etc
- New content layer (a way to load)
- Aside: You could see this as a convergence of Flowershow "Next" and DataHub "Next" 😄
Data Projects vs Data Scratchpads 2023-03-05
What you build is determined by what you want to do
– School of obvious proverbs
What are we building towards exactly? Here are two options, both of which fit our overall vision of making data sharing easier:
- "Data Project": as per the name this path is about having something like a data project equivalent to github repo or a gitlab project. It has a similar home page with README (perhaps enhanced with data stuff) and a list of data files and tabs for other key features e.g. visualizations, data APIs, workflows etc.
- "Data Scratchpad": Another route envisages something more like a data-enhanced wiki or digital garden. The home page would just be that … a home page, probably rendered from markdown. The rest of the site would be rendered pages. It wouldn't have specific sections generated for it.
Instinct is to focus on the latter – even though the original focus was on the former. Why?
- Scratches our own "itch" e.g. re github.com/datasets/awesome-data
- Scratchpad functionality would be needed for Data Project (to an extent)
- Gut instinct re the product: scratchpad is simpler. Start with something simple and build up. What's the simplest thing: create a page and go. The simplest thing is a README. etc.
What you build and what you want to do are in a dance together. Tools shape what we can/want to do and vice versa.
– School of not so obvious proverbs
Data Canvas (Fancy)
What would be an amazing experience? An experience I personally would love …?
- Home screen
- Sign in
- Straight onto a canvas where I can drop things, especially data files or urls to sites, and get previews e.g. I can add a data.csv and immediately get a preview, I can drop a url and get a screenshot preview of that site, I can add an image
- data.csv is uploaded in the background for me
- can link things together
- can group things inside of larger "boxes"
- can click on any object and start adding metadata
- can split out groups to their own separate canvas and keep that canvas embedded as a sub-canvas
- NB: a reduced (much simpler) version of this is not a canvas but more of a flowing page. Everything here is the same except:
- No visual layout
- No linking things with arrows etc
- Grouping would have to be by sections
- Embedding of bigger canvas can be link outs or transclusions
- And … I can jump out of this as a power user and go into the backend, which is a github repo + cloud (for assets) + api (for data viewing)
Who is this for?
- data researchers: Super charge your research
- data storytellers: ..
Data Stories
Like canvas/scratchpad but a bit more like publishing something, a bit more like an article.
"Medium for Data Rich Content" "Wordpress for data stories"
Data publishing made easy / done right
Another vision: Github repo with README and data.csv
Data API (and a preview)
Give me a dataset, I give you an API
- Login
- Prompts for data file(s) to use: can link to a data file in storage e.g. S3, or on GitHub, or just upload (may need to log in somehow)
- Generating: show a sample of data and crunch away in the background
- Landing page: data explorer like interface with API explorer built in
- Probably go for SQL by default
- GraphQL is a cute bonus
- Stuff like metrics
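A minimal sketch of the "give me a dataset, I give you an API" idea with SQL as the default query language. Everything here is an assumption for illustration, not a chosen design: the helper names `load_csv_into_sqlite` and `query` are hypothetical, the sample rows are placeholder values, SQLite stands in for whatever backend would actually be used, and all columns are loaded as text with no type inference.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(csv_text: str, table: str = "data") -> sqlite3.Connection:
    """Load CSV text into an in-memory SQLite table (hypothetical helper).

    Assumes the first row is a header. Every column is stored as TEXT.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', body)
    conn.commit()
    return conn


def query(conn: sqlite3.Connection, sql: str) -> list:
    """Run a SQL query and return rows as dicts, roughly what a JSON API would serve."""
    cursor = conn.execute(sql)
    names = [description[0] for description in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


# Placeholder sample data, made up purely for illustration.
SAMPLE_CSV = "name,value\na,1\nb,2\n"

conn = load_csv_into_sqlite(SAMPLE_CSV)
result = query(conn, "SELECT name FROM data WHERE CAST(value AS INTEGER) > 1")
print(result)
```

A real version would sit behind an HTTP endpoint and would need to restrict the SQL it accepts (read-only, with limits), which ties into the customization and data control critiques.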
Critiques
- Is it a product? Does it hit a sweet spot of value and not needing customization (or special deployment)?
- For the power users who want this i suspect many want quite a bit of control / customization.
- Also issues of data control …
- We need a Data API for other stuff, but is it a product in its own right?
Cloud publishing for your (data-rich) markdown
Because you can publish a Jupyter notebook
Because it's a pain to publish …
- Give me your Obsidian vault, I give you a nice website
- This is Flowershow Cloud
Issues
- Customization of theme
What's our competitive difference vs what is out there?
A combination of …
- Data support
- Cloud publishing
- Elegance
Routes in:
- people already using PKM stuff a lot
Appendix: Evolution of our product ideas
Data(set) Publishing -> Data Projects -> Data Scratchpad
- Dataset publishing: have a set of data files
- focus is a catalog of datasets
- dataset is basically just a bunch of data files
- readme is an afterthought
- landing page is an afterthought
- Data Project: not just a dataset, a data project
- main page is central (think of github landing page)
- data issues
- data apis
- views
- etc
- Data Canvas / Scratch: simplest version of a data project is a README with linked data files
- Similar to experience people have in PKM like Obsidian
- Or even, in a more polished version, like publishing on e.g. Medium, WordPress etc
Had sub-versions of these e.g.
- Data Project
- Git-based: Turn git(hub) into a DataHub (current DataHub Next)
- Pages?