Standup / Sprint planning 2023-03-03

#todo/process

Present: Ola, Joao, Rufus, Khalil

Next steps

  • Ola
    • finishing table analysis e.g. for each item (key points added to issue)
    • reading up on sanity.io and making notes, especially about the API
      • documents
      • assets (i.e. images, data files)
      • bonus: come up with a name for our "content layer" app
  • Joao:
    • graphs finished
    • analysing what would be needed to replace the render layer of "Flowershow"
      • open issue about what is needed to get new render pipeline working (DataHub / Flowershow Next) i.e. without filesystem dependencies etc

Agenda

  • Rufus comment re table feature analysis
    • TODOs: a list (preferably, instead of table) of features that I/we think we want to have + screenshots of them "in action" for both Tanstack and AG Grid table
  • What do we do about building with api routes? Ans: leave for now whilst we refactor the pipeline.
  • direction of travel for next sprint

Direction of travel

  • Need a "focus" near-term use case (of our own if possible) so that we can choose week to week work
    • We have clear high level product vision
    • However, still lots of directions we could go in right now (e.g. publish ui vs showcase, showcase from github source vs local etc)
    • Need something to guide those choices
  • Use case could be "data rich wiki/garden", specifically posting up the kind of stuff in https://github.com/datasets/awesome-data/issues e.g. "here's a dataset about CO2 concentrations I found"
  • The simplest way to start on that is to use the digital garden/wiki already in datahub-next in the content folder (and switch later to rendering remote)
  • This would imply near-term:
    • Switching our render system from contentlayer and flowershow v1.2 (or even 0.9, which is what we are actually using) to our new MDX+D pipeline
    • Rebuilding our pipeline with all our learnings from Flowershow so far, with these differences:
      • Render pipeline that works with the cloud and specifically remote content
      • Data features e.g. table, graph etc
      • New content layer (a way to load)
    • Aside: You could see this as a convergence of Flowershow "Next" and DataHub "Next" 😄

Why move off "contentlayer.dev"

  • tied to mdx-bundler (that we don't want and is limiting)
  • ties too much stuff together in a non-clean way (it's strictly more than a content layer) e.g. has render, content loading, content schema
  • local-only
  • so-so query layer

Terminology

  • Content layer
    • Types / Schema
    • Query API
      • getItem/Blog/Document() => metadata, text
      • getItems aka query
    • Database
      • Raw storage
      • MetadataCache (or for querying)
    • Loader
      • Add metadata
  • Rendering pipeline
    • Add metadata
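
To make the terminology concrete, here is a minimal in-memory sketch of how the pieces could fit together. Every name in it (`Item`, `Database`, `load`, the folder-as-type convention) is hypothetical and chosen for illustration only; it is not the contentlayer API, and not a decided design for our own content layer.

```typescript
// Hypothetical sketch: none of these names exist in contentlayer or Flowershow.

// Types / Schema: the shape of one content item.
type Metadata = Record<string, unknown>;

interface Item {
  path: string;       // e.g. "blog/hello.md"
  metadata: Metadata; // frontmatter plus anything the loader adds
  text: string;       // raw markdown body
}

// Database: raw storage plus a metadata cache used for querying.
class Database {
  private raw: Record<string, string> = {};
  private metadataCache: Record<string, Metadata> = {};

  put(item: Item): void {
    this.raw[item.path] = item.text;
    this.metadataCache[item.path] = item.metadata;
  }

  // Query API: getItem => metadata, text
  getItem(path: string): Item | undefined {
    if (!Object.prototype.hasOwnProperty.call(this.raw, path)) return undefined;
    return { path, metadata: this.metadataCache[path], text: this.raw[path] };
  }

  // Query API: getItems aka query. Filtering only touches the metadata cache.
  getItems(filter: (metadata: Metadata) => boolean): Item[] {
    return Object.keys(this.metadataCache)
      .filter((path) => filter(this.metadataCache[path]))
      .map((path) => ({ path, metadata: this.metadataCache[path], text: this.raw[path] }));
  }
}

// Loader: reads source files into the database, adding metadata on the way.
// Assumed convention (illustration only): the top-level folder is the item type.
function load(db: Database, files: Record<string, string>): void {
  for (const path of Object.keys(files)) {
    const type = path.split("/")[0];
    db.put({ path, metadata: { type }, text: files[path] });
  }
}
```

With this shape, `load(db, { "blog/hello.md": "# Hello" })` followed by `db.getItems((m) => m.type === "blog")` would return the one blog item. The rendering pipeline would sit on top of `getItem` and is left out here.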

Rufus brainstorm

Outflow

Intuition there is something in the scratch pad / wiki idea.

  • #dontstopflow remember writing stuff about this back in the day for datahub v2. dig out the materials on hackmd and post 💤

What could we blog about day in, day out that would build our muscle?

What could we use this for, day to day?

Brainstorm

  • For product comparisons e.g. what is best library for X?
    • best javascript table library
    • best cloud storage service
  • publishing datasets (especially paid for)
  • "interesting data" (stuff people search for on statista) e.g. iphone sales, gdp, etc
  • Ecosystem map of our competitors / complementors e.g. quandl etc
  • comparotron?? one interesting comparison a day.
  • 🔥 wiki/garden for data stuff, i.e. I want to create pages about datasets, data questions, data stories and weave them together
    • Examples:
      • climate datasets: topical page, pages for specific datasets (start out just in the page?), questions etc.
      • cloud storage pricing / providers etc
      • catalog of interesting datasets
      • displaying items from datasets/awesome
    • Aside: this is sort of meta item that could incorporate many of the others
  • Showcase the github.com/datasets
    • may want to rework one into the form we would find most convenient.
    • at minimum it is README + csv

A chart a day (keeps the doctor away)

A dataset a day (or week)

  • What's the minimum?
    • url (to source)
    • 3 sentence description
    • image/screengrab
    • data table?
  • bonus: graph etc
  • bonus: ultra-small datasets we could almost enter by hand
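
The minimum above could be written down as a type, which would make it easy to check an entry before posting. This is only a sketch: the field names and the `checkMinimum` helper are assumptions, and the data table is optional because the notes mark it with a "?".

```typescript
// Hypothetical shape for one "dataset a day" entry; field names are assumptions.
interface DatasetEntry {
  url: string;         // link to the source
  description: string; // roughly three sentences
  image: string;       // path or URL of an image / screengrab
  table?: string[][];  // optional data table (rows of cells)
  graph?: string;      // bonus: e.g. a path to a chart
}

// Returns a list of problems; an empty list means the entry meets the minimum.
function checkMinimum(entry: Partial<DatasetEntry>): string[] {
  const problems: string[] = [];
  if (!entry.url) problems.push("missing url");
  if (!entry.description) problems.push("missing description");
  if (!entry.image) problems.push("missing image");
  return problems;
}
```

An ultra-small dataset entered by hand would simply fill in `table` directly.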

Next steps

  • Do I bite the bullet and start on the domain model or do we keep pushing the direct proxy approach?
    • Guess it somewhat comes back to job stories …
  • The term showcase was quite oriented to a dataset or a project where you are summarizing something. Less appropriate to simple rendering of a README.
    • in any case README is not enough for project home page. (but why do we care about the extra stuff like project title, or the other stuff?)
    • the making prominent of the README was a genius move of github and others.

Options

  • Proper domain model: move to a proper Project object and run off that
    • Why / Why not? not sure this does much for us atm
  • Direct proxy: we don't keep anything in our database; we pull content directly from github and render it on the fly
  • Switch the overall render system off contentlayer 🔥
    • Why? That way we can have pages directly in the repo that render data-rich documents, and start implementing the wiki stuff
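
For the direct proxy option, the core step is mapping a repo to the URL of its raw content. A minimal sketch, assuming the helper names `RepoFile` and `rawUrl` (both hypothetical) and defaulting to the README, since a dataset repo is at minimum README + csv. The branch name passed in is the caller's assumption; only the raw.githubusercontent.com URL layout comes from GitHub.

```typescript
// Hypothetical helper: identifies one file in a GitHub repo.
interface RepoFile {
  owner: string;
  repo: string;
  ref: string;   // branch, tag or commit
  path?: string; // defaults to the README
}

// Builds the raw-content URL for a file, so it can be fetched at request time
// instead of being stored in our own database.
function rawUrl(f: RepoFile): string {
  const path = f.path === undefined ? "README.md" : f.path;
  return `https://raw.githubusercontent.com/${f.owner}/${f.repo}/${f.ref}/${path}`;
}
```

A page handler could pass `rawUrl(...)` to `fetch` and hand the returned text to the render pipeline, with nothing persisted on our side.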

Options informed by a focal need

Data Projects vs Data Scratchpads

Moved to datahub-next/notes/jounal/2023-03-05

An alternative content layer

  • what are the functions/APIs of contentlayer we use right now
  • what would it take to replace/refactor them

What's the minimal API for "MarkdownDB"

TODO ...

Rufus Plan for the day

This is all great, but what are the immediate next steps?

Plan

2023-03-04 Moved to https://github.com/datopian/datahub-next/issues/10

Outflow

Moved into meeting notes above.

Inbox

  • review materials for what we can port across to datahub-next ⏭️ ⏲️10m
  • evaluate convergence of flowershow with datahub-next 💤
