Standup / Sprint planning 2023-03-03
#todo/process
Present: Ola, Joao, Rufus, Khalil
Next steps
- Ola
- finishing table analysis e.g. for each item (key points added to issue)
- Ola: reading up on sanity.io and making notes especially about API
- documents
- assets (i.e. images, data files)
- bonus: suggest a name for our "content layer" app
- Joao:
- graphs finished
- analysing what would be needed to replace the render layer of "Flowershow"
- open issue about what is needed to get new render pipeline working (DataHub / Flowershow Next) i.e. without filesystem dependencies etc
Agenda
- Rufus comment re table feature analysis
- TODOs: a list (preferably, instead of table) of features that I/we think we want to have + screenshots of them "in action" for both Tanstack and AG Grid table
- What do we do about building with api routes? Ans: leave for now whilst we refactor the pipeline.
- direction of travel for next sprint
Direction of travel
- Need a "focus" near-term use case (of our own if possible) so that we can choose week to week work
- We have clear high level product vision
- However, still lots of directions we could go in right now (e.g. publish ui vs showcase, showcase from github source vs local etc)
- Need something to guide those choices
- Use case could be "data rich wiki/garden", specifically posting up the kind of stuff in https://github.com/datasets/awesome-data/issues e.g. here's a dataset about co2 concentrations i found
- The simplest way to start on that is to use the digital garden/wiki already in datahub-next in the content folder (and switch later to rendering remote content)
- This would imply near-term:
- Switching our render system from contentlayer and flowershow v1.2 (or even 0.9, which is what we are actually using) to our new MDX+D pipeline
- Rebuilding our pipeline with all our learnings from Flowershow so far with these differences
- Render pipeline that works with the cloud and specifically remote content
- Data features e.g. table, graph etc
- New content layer (a way to load)
- Aside: You could see this as a convergence of Flowershow "Next" and DataHub "Next" 😄
Why move off "contentlayer.dev"
- tied to mdx-bundler (that we don't want and is limiting)
- ties too much together in a non-clean way (it is strictly more than a content layer), e.g. it bundles rendering, content loading and content schema
- local-only
- so-so query layer
Terminology
- Content layer
- Types / Schema
- Query API
- getItem/Blog/Document() => metadata, text
- getItems aka query
- Database
- Raw storage
- MetadataCache (or for querying)
- Loader
- Add metadata
- …
- Rendering pipeline
- Add metadata
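One way to picture how these terms fit together is the rough sketch below. Every name in it (Item, Loader, ContentLayer, createContentLayer) is hypothetical and not an existing API, the "database" is just an in-memory Map standing in for the metadata cache, and everything is synchronous for brevity (a loader for remote content would presumably be async).

```typescript
// Hypothetical sketch of the terminology above; none of these names exist yet.

// Types / Schema: the shape of one piece of content.
interface Item {
  id: string;                        // e.g. a file path or URL
  metadata: Record<string, unknown>; // frontmatter plus anything the loader adds
  text: string;                      // raw markdown body
}

// Loader: reads raw storage and returns items with metadata added.
type Loader = () => Item[];

// Query API: getItem for a single document, getItems as the query.
interface ContentLayer {
  getItem(id: string): Item | undefined;
  getItems(filter?: (item: Item) => boolean): Item[];
}

// Database: an in-memory Map plays the role of the metadata cache.
function createContentLayer(load: Loader): ContentLayer {
  const cache = new Map<string, Item>();
  for (const item of load()) cache.set(item.id, item);
  return {
    getItem: (id) => cache.get(id),
    getItems: (filter) => {
      const all = Array.from(cache.values());
      return filter ? all.filter(filter) : all;
    },
  };
}
```

The point of the split is that the rendering pipeline only ever sees the Query API, so the loader (local files now, remote later) can change without touching rendering.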
Rufus brainstorm
Outflow
Intuition there is something in the scratch pad / wiki idea.
- #dontstopflow remember writing stuff about this back in the day for datahub v2. dig out the materials on hackmd and post 💤
what could we blog about day in, day out that would build our muscle.
What could we use this for in our day-to-day work?
Brainstorm
- For product comparisons e.g. what is best library for X?
- best javascript table library
- best cloud storage service
- publishing datasets (especially paid for)
- "interesting data" (stuff people search for on statista) e.g. iphone sales, gdp, etc
- Ecosystem map of our competitors / complementors e.g. quandl etc
- comparotron?? one interesting comparison a day.
- 🔥 wiki/garden for data stuff ie. i want to create pages about datasets, data questions, data stories and weave them together
- Examples:
- climate datasets: topical page, pages for specific datasets (start out just in the page?), questions etc.
- cloud storage pricing / providers etc
- catalog of interesting datasets
- displaying items from datasets/awesome
- Aside: this is sort of meta item that could incorporate many of the others
- Examples:
- Showcase the github.com/datasets
- may want to rework one into the form we would find most convenient.
- at minimum it is README + csv
A chart a day (keeps the doctor away)
A dataset a day (or week)
- What's the minimum?
- url (to source)
- 3 sentence description
- image/screengrab
- data table?
- bonus: graph etc
- bonus: ultra-small datasets we could almost enter by hand
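As a rough sketch, the minimum above could be written down as a record type plus a check for what is still missing. The names DatasetEntry and missingMinimum are made up for illustration, and treating the data table as optional is an assumption, since the notes leave it as an open question.

```typescript
// Hypothetical shape for one "dataset a day" entry.
interface DatasetEntry {
  url: string;         // link to the source
  description: string; // roughly three sentences
  image: string;       // path or URL of an image/screengrab
  table?: string;      // assumption: optional, e.g. path to a CSV to show as a table
  graph?: string;      // bonus
}

// Return the names of the minimum fields that are absent or empty.
function missingMinimum(entry: Partial<DatasetEntry>): string[] {
  const required: (keyof DatasetEntry)[] = ["url", "description", "image"];
  return required.filter((key) => !entry[key]);
}
```

A draft with only a source link would report description and image as still missing, which is about the level of friction we want for a daily post.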
Next steps
- Do I bite the bullet and start on the domain model or do we keep pushing the direct proxy approach?
- Guess it somewhat comes back to job stories …
- The term showcase was quite oriented to a dataset or a project where you are summarizing something. Less appropriate to simple rendering of a README.
- in any case README is not enough for project home page. (but why do we care about the extra stuff like project title, or the other stuff?)
- the making prominent of the README was a genius move of github and others.
Options
- Proper domain model: move to a proper Project object and run off that
- Why / why not? Not sure this does much for us atm
- Direct proxy: we don't keep anything in our database; we pull content directly from GitHub and render it on the fly
- Switch the overall render system off contentlayer 🔥
- Why? That way we can have pages directly in the repo that render data-rich documents and start implementing the wiki stuff
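For reference, the direct proxy option amounts to something like the sketch below: build the raw-content URL for a file in a public GitHub repo and hand it to whatever fetches text. The function names and the injectable Fetcher type are hypothetical; only the raw.githubusercontent.com URL layout is GitHub's. The ref (branch or tag) is passed in explicitly because the default branch differs between repos.

```typescript
// Hypothetical sketch of "direct proxy": nothing stored, content pulled on request.

// Build the raw-content URL for a file in a public GitHub repo.
function rawGitHubUrl(owner: string, repo: string, ref: string, path: string): string {
  return `https://raw.githubusercontent.com/${owner}/${repo}/${ref}/${path}`;
}

// Anything that turns a URL into text; injected so it can be stubbed or cached.
type Fetcher = (url: string) => Promise<string>;

// Pull the README for a repo, ready to hand to the render pipeline.
async function loadReadme(
  owner: string,
  repo: string,
  ref: string,
  fetcher: Fetcher,
): Promise<string> {
  return fetcher(rawGitHubUrl(owner, repo, ref, "README.md"));
}
```

In a real deployment the fetcher would wrap an HTTP client; keeping it as a parameter means the same code path can later be pointed at a cache or database if we move to a proper domain model.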
Options informed by a focal need
Data Projects vs Data Scratchpads
Moved to datahub-next/notes/jounal/2023-03-05
An alternative content layer
- what are the functions/APIs of contentlayer we use right now
- what would it take to replace/refactor them
What's the minimal API for "MarkdownDB"
TODO ...
Rufus Plan for the day
This is all great, but what are the immediate next steps?
Plan
2023-03-04 Moved to https://github.com/datopian/datahub-next/issues/10
Outflow
Moved into meeting notes above.
Inbox
- review materials for what we can port across to datahub-next ⏭️ ⏲️10m
- evaluate convergence of flowershow with datahub-next 💤