Rufus notes

Outflow

  • Be useful to have a map of products and how they connect / overlap
    • Then some analysis of which to prioritize
  • Want examples on the site
  • Want a demo / trial without signing up?
  • 🧊 Add a marketplace section to capture all the people who are coming for data …
  • What's the experience we want?
  • Find all the previous write-ups of UX options and post them …
  • Create a clear ⏭️ next list

Finishing the storage layer design

Diagram in excalidraw showing …

Explanation of how we copy from source

  • Source:
    • git(hub)
    • Local on disk
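A minimal sketch of the copy step, assuming a hypothetical `Source` interface that yields `(path, bytes)` pairs so every source feeds the same storage layer. Only a local-disk adapter is shown; a GitHub adapter would implement the same `iter_files` method on top of the GitHub API and is omitted. A plain dict stands in for the real object store.

```python
from pathlib import Path
from typing import Dict, Iterator, Protocol, Tuple


class Source(Protocol):
    """Hypothetical interface: anything we can copy content from."""

    def iter_files(self) -> Iterator[Tuple[str, bytes]]:
        """Yield (relative POSIX path, file contents) for every file."""
        ...


class LocalDiskSource:
    """Source backed by a directory on local disk."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def iter_files(self) -> Iterator[Tuple[str, bytes]]:
        # Walk the tree in a stable order so copies are deterministic.
        for path in sorted(self.root.rglob("*")):
            if path.is_file():
                yield path.relative_to(self.root).as_posix(), path.read_bytes()


def copy_from_source(source: Source, storage: Dict[str, bytes]) -> int:
    """Copy every file from `source` into `storage` (a dict standing in
    for a real object store such as R2). Returns the number of files copied."""
    count = 0
    for rel_path, data in source.iter_files():
        storage[rel_path] = data
        count += 1
    return count
```

Because the storage layer only sees `(path, bytes)` pairs, adding a new source means writing one more adapter rather than touching the read path.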

Why can't we use GitHub as our direct storage layer?

GitHub is unsuitable primarily because its API rate limit for accessing files is very low (~5k requests/hour), and we hit the storage layer for every page read request (perhaps multiple times).
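Back-of-envelope arithmetic for the rate limit, as a tiny sketch. It assumes all anonymous page reads share one quota of ~5k requests/hour and that each page view costs `reads_per_page` calls (a hypothetical parameter; the real number depends on how a page is rendered).

```python
def max_page_views_per_hour(api_limit_per_hour: int, reads_per_page: int) -> int:
    """How many page views per hour a shared API quota can serve if every
    page view costs `reads_per_page` API calls (partial views don't count)."""
    if reads_per_page <= 0:
        raise ValueError("reads_per_page must be positive")
    return api_limit_per_hour // reads_per_page
```

With 3 calls per page view, for example, `max_page_views_per_hour(5000, 3)` gives 1666, i.e. under 30 page views a minute across the whole site.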

In addition:

  • we need additional storage anyway for computed material etc., so we may as well have one consolidated place for storage.
  • for large files GitHub would work poorly or not at all (even for e.g. image files)
  • one single storage layer no matter what the original source is (one day we may support sources other than GitHub)
  • permissions and processing may be simpler: we just need the user to give read access once so we can copy the content over, rather than on every anonymous read. In essence, we can separate our DataHub permissions from GitHub permissions more cleanly.
  • using our storage layer is probably faster (R2 is close to the edge, we don't go through GitHub's API layer, etc.)

Why not use project database for all content?

  • Don't want to store large files in the database
  • So may as well not store all content files in database for consistency
  • Cleaner to have database "rebuildable" (see database as an index rather than source of truth)
    • ASIDE: should we store the project info file in the project itself, with its owner? (that would be cool)
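A sketch of the "rebuildable database" idea using SQLite: the index is dropped and regenerated purely from what is in storage, so deleting it loses nothing. The `files` table and its columns are hypothetical, and a dict stands in for the storage layer.

```python
import sqlite3
from typing import Dict, Optional


def first_heading(text: str) -> Optional[str]:
    """Return the text of the first Markdown '# ' heading, if any."""
    for line in text.splitlines():
        if line.startswith("# "):
            return line[2:].strip()
    return None


def rebuild_index(conn: sqlite3.Connection, storage: Dict[str, bytes]) -> None:
    """Throw away the index and rebuild it purely from storage.

    Every row is derived from the stored files, so the database can be
    deleted and regenerated at any time: storage is the source of truth.
    """
    conn.execute("DROP TABLE IF EXISTS files")
    conn.execute(
        "CREATE TABLE files (path TEXT PRIMARY KEY, size INTEGER, title TEXT)"
    )
    rows = []
    for path, data in storage.items():
        title = None
        if path.endswith(".md"):
            title = first_heading(data.decode("utf-8", errors="replace"))
        rows.append((path, len(data), title))
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
    conn.commit()
```

Large files such as CSVs stay in storage; the database only holds small derived facts about them.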

Current sequence from GitHub to storage layer for a request

Have an architectural separation between "import/sync" of data/content into the storage layer and then reading from it …

  • Copy from github into storage layer
    • Raw-ish copy of files and the tree info
    • May do some additional processing e.g. adding metadata from markdowndb
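A sketch of the import/sync side, assuming sync produces a hypothetical `ProjectSnapshot` holding raw-ish file copies, the tree (list of paths) and per-file metadata. `parse_frontmatter` is a deliberately tiny stand-in for the markdowndb step, not markdowndb's actual API. Reads then go only to the snapshot.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Tiny stand-in for real frontmatter parsing: reads flat
    `key: value` lines between leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # no closing fence: treat as no frontmatter


@dataclass
class ProjectSnapshot:
    """What the sync step writes into the storage layer for one project."""
    files: Dict[str, bytes] = field(default_factory=dict)  # raw-ish copies
    tree: List[str] = field(default_factory=list)  # tree info (paths)
    metadata: Dict[str, Dict[str, str]] = field(default_factory=dict)


def sync(source_files: Dict[str, bytes]) -> ProjectSnapshot:
    """Import/sync: copy files, record the tree, extract metadata.
    Runs when content changes, not on every page read."""
    snap = ProjectSnapshot()
    for path in sorted(source_files):
        data = source_files[path]
        snap.files[path] = data
        snap.tree.append(path)
        if path.endswith(".md"):
            text = data.decode("utf-8", errors="replace")
            snap.metadata[path] = parse_frontmatter(text)
    return snap


def read(snap: ProjectSnapshot, path: str) -> bytes:
    """Read path: served entirely from storage, never from the source."""
    return snap.files[path]
```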

Contrast this with the simple design

  • Request for @me/my-project/myfile => app => app requests source file from GitHub => app renders it
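To make the contrast concrete, a sketch with a fake upstream that counts its calls (all names hypothetical; `render` is a placeholder). In the simple design every request costs one source call; with a sync step, the source is hit once at sync time and reads come from our own storage.

```python
from typing import Callable, Dict, Iterable


class CountingSource:
    """Fake upstream (e.g. the GitHub API) that counts how often it is hit."""

    def __init__(self, files: Dict[str, str]) -> None:
        self.files = files
        self.calls = 0

    def fetch(self, path: str) -> str:
        self.calls += 1
        return self.files[path]


def render(markdown: str) -> str:
    # Placeholder renderer; real rendering is out of scope here.
    return f"<article>{markdown}</article>"


def simple_handler(source: CountingSource) -> Callable[[str], str]:
    """Simple design: every request goes to the source, then renders."""
    def handle(path: str) -> str:
        return render(source.fetch(path))
    return handle


def synced_handler(source: CountingSource, paths: Iterable[str]) -> Callable[[str], str]:
    """Storage-layer design: copy once at sync time, then serve reads
    from our own storage without touching the source."""
    storage = {p: source.fetch(p) for p in paths}

    def handle(path: str) -> str:
        return render(storage[path])
    return handle
```

Serving 100 requests for the same file costs 100 source calls in the simple design and 1 in the synced one.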

Extra things to discuss

  • Computing stuff …
  • Indexing stuff …

Markdown based product ideas

  • Here are the sketch and notes from March 2023 - datahub-next-direction-march-2023
    • Worth re-reading
  • Discussion issue with David Gasquez (could turn into a post) #todo find that

List

  • Data Project
  • Data Story
  • Data Scratchpad
  • Markdown-based wiki
  • Markdown-based blog
  • Markdown-based website
  • Markdown-based single page site (home page)
  • Simple visualization app

What should we do for each of these?

What criteria do we evaluate with?

Stuff for David Gasquez

  • Organize a regular chat; use the chat to drive a write-up
  • Organize a short free course (and use the course to drive a write-up)

Stuff to post from DataHub v2 days …

  • Old deck
  • Old SCQA
  • Various notes about Data Experience vs Developer Experience

© 2024 All rights reserved

Built with DataHub Cloud