Standup / Sprint planning 2023-03-03

#todo/process

Present: Ola, Joao, Rufus, Khalil

Next steps

Agenda

Rufus comment re table feature analysis
- TODOs: a list (preferably, rather than a table) of the features that I/we think we want, plus screenshots of them "in action" in both TanStack Table and AG Grid
What do we do about building with api routes? Ans: leave for now whilst we refactor the pipeline.
direction of travel for next sprint

Direction of travel

Need a "focus" near-term use case (of our own if possible) so that we can choose week to week work
- We have clear high level product vision
- However, still lots of directions we could go in right now (e.g. publish ui vs showcase, showcase from github source vs local etc)
- Need something to guide those choices
Use case could be "data rich wiki/garden", specifically posting up the kind of stuff in https://github.com/datasets/awesome-data/issues, e.g. "here's a dataset about CO2 concentrations I found"
The simplest way to start on that is to use the digital garden/wiki already in datahub-next in the content folder (and switch later to rendering remote)
This would imply near-term:
- Switching our render system from contentlayer and flowershow v1.2 (or even 0.9, which is what we are actually using) to our new MDX+D pipeline
- Rebuilding our pipeline with all our learnings from Flowershow so far, with these differences:
- Render pipeline that works with the cloud and specifically remote content
- Data features e.g. table, graph etc
- New content layer (a way to load)
- Aside: You could see this as a convergence of Flowershow "Next" and DataHub "Next" 😄

tied to mdx-bundler (that we don't want and is limiting)
ties too much stuff together in a non-clean way (it's more than strictly a content layer) e.g. has render, content loading, content schema
local-only
so-so query layer

Terminology

Intuition there is something in the scratch pad / wiki idea.

#dontstopflow remember writing stuff about this back in the day for datahub v2. dig out the materials on hackmd and post 💤

What could we blog about day in, day out that would build our muscle?

What could we use this for, day to day?

Brainstorm

For product comparisons e.g. what is best library for X?
- best javascript table library
- best cloud storage service
publishing datasets (especially paid for)
"interesting data" (stuff people search for on statista) e.g. iphone sales, gdp, etc
Ecosystem map of our competitors / complementors e.g. quandl etc
comparotron?? one interesting comparison a day.
🔥 wiki/garden for data stuff, i.e. I want to create pages about datasets, data questions, data stories and weave them together
- Examples:
  - climate datasets: topical page, pages for specific datasets (start out just in the page?), questions etc.
  - cloud storage pricing / providers etc
  - catalog of interesting datasets
  - displaying items from datasets/awesome
- Aside: this is sort of meta item that could incorporate many of the others
Showcase the github.com/datasets
- may want to rework one into the form we would find most convenient.
- at minimum it is README + csv

A chart a day (keeps the doctor away)

A dataset a day (or week)

What's the minimum?
- url (to source)
- 3 sentence description
- image/screengrab
- data table?
- bonus: graph etc
- bonus: ultra-small datasets we could almost enter by hand
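
A minimal sketch of how the "minimum" above could be written down as a type plus a check. All field names and the `checkEntry` helper are hypothetical (nothing here is decided in the notes), and the sentence count is a naive split on punctuation, so it would miscount text containing decimals or abbreviations.

```typescript
// Hypothetical shape for a "dataset a day" entry; field names are assumptions.
interface DatasetEntry {
  url: string;          // link to the source
  description: string;  // roughly three sentences
  image: string;        // path or URL of an image/screengrab
  table?: string[][];   // "data table?" is still an open question, so optional
  graph?: string;       // bonus: path or URL of a chart
}

// Returns a list of problems; an empty list means the entry meets the minimum.
function checkEntry(entry: DatasetEntry): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\//.test(entry.url)) {
    problems.push("url must be an http(s) link");
  }
  // Naive sentence count: split on ., ! or ? and drop empty pieces.
  const sentences = entry.description
    .split(/[.!?]+/)
    .filter((s) => s.trim().length > 0);
  if (sentences.length === 0) {
    problems.push("description is empty");
  }
  if (sentences.length > 3) {
    problems.push("description is longer than 3 sentences");
  }
  if (entry.image.trim().length === 0) {
    problems.push("image/screengrab is missing");
  }
  return problems;
}
```

Returning a list of problems rather than throwing means a page could still render a half-finished entry and show what is missing, which suits the scratch pad idea.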

Do I bite the bullet and start on the domain model or do we keep pushing the direct proxy approach?
- Guess it somewhat comes back to job stories …
The term showcase was quite oriented to a dataset or a project where you are summarizing something. Less appropriate to simple rendering of a README.
- in any case README is not enough for project home page. (but why do we care about the extra stuff like project title, or the other stuff?)
- the making prominent of the README was a genius move of github and others.

Options

Proper domain model: move to a proper Project object and run off that
- Why / Why not? not sure this does much for us atm
Direct proxy: we don't store anything in our database; we pull content directly from GitHub and render on the fly
Switch the overall render system off contentlayer 🔥
- Why? That way we can have pages directly in the repo that render data rich documents, and start implementing the wiki stuff
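
For comparison, a rough sketch of what the direct proxy option amounts to: build a raw-content URL for a file in a GitHub repo and fetch it at request time, storing nothing. Assumptions: content is read from `raw.githubusercontent.com`, the README is `README.md` at the repo root, and the branch name is supplied by the caller. `rawUrl`, `loadReadme` and the `Fetcher` type are hypothetical names, and the fetch function is passed in rather than hard-coded so the sketch does not depend on any particular runtime.

```typescript
// Identifies one file in a GitHub repo. `ref` is a branch, tag or commit.
interface RepoFile {
  owner: string;
  repo: string;
  ref: string;
  path: string;
}

// Builds the raw-content URL for a file (assumes raw.githubusercontent.com).
function rawUrl({ owner, repo, ref, path }: RepoFile): string {
  const cleanPath = path
    .split("/")
    .filter((part) => part.length > 0) // drop leading or doubled slashes
    .map(encodeURIComponent)
    .join("/");
  return `https://raw.githubusercontent.com/${owner}/${repo}/${ref}/${cleanPath}`;
}

// Any function that turns a URL into text; injected so it can be stubbed.
type Fetcher = (url: string) => Promise<string>;

// Direct proxy: nothing is stored, the README is fetched on every call.
async function loadReadme(
  owner: string,
  repo: string,
  ref: string,
  fetchText: Fetcher
): Promise<string> {
  return fetchText(rawUrl({ owner, repo, ref, path: "README.md" }));
}
```

A proper domain model would sit between these two steps (fetch, then map into a Project object); the sketch shows how little is needed if that step is skipped for now.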

Options informed by a focal need

Moved to datahub-next/notes/jounal/2023-03-05

What's the minimal API for "MarkdownDB"

TODO ...

This is all great, but what are the immediate next steps?

Plan

Moved into meeting notes above.