Planning Meeting - Rufus, Daniela, Rising, Anu

Present: Anu, Daniela, Rising, Rufus

Hypothesis Tree

#todo/integrate to ../ideas/pages

  • A product that takes data and content in git(hub) and provides an elegant customizable presentation on the web (including tables and graphs) is attractive to a growing set of users
    • It can be self-service and cloud-hosted
  • There is a demand for this
    • People are sharing data via tools like git and github
      • Code, data and documentation kept and versioned together is good
      • This has become much more accessible at multiple levels in the last few years
      • Much more data is being published, both publicly and privately
      • Growing set of data scientists and data engineers
        • who often wrangle data in their spare time
    • People experience the following pains
      • For data presentation and publishing there are major limits to github
        • Can’t display/browse large data files
          • Can’t store large files
        • Can’t build data-driven pages easily … (can’t integrate content and data)
          • Can’t create graphs, views etc that present insights of my data
        • Can’t create an API …
        • Minor: Code oriented (e.g. code statistics)
      • Aside: analogy with publishing a website: github readme (and file browser) is ok … but it isn’t a website …
  • The existing options for this aren’t great …
    • Build my own website and deploy with netlify etc
      • Requires a whole bunch of new skills, especially if I want to present data
    • Use a notebook (e.g. jupyter, observable)
      • Pros: code, data and outputs integrated
      • Cons: Not very elegant, restrictive (python code, linear format), need to run some infrastructure (or use someone else's)
        • Observable: on a proprietary platform
    • Use a data publishing platform e.g. kaggle, zenodo, datahub.io etc
      • Code, data and content are separated …
      • Presentation is not very customizable
      • Proprietary platforms
    • Paste in a google spreadsheet and share it
      • Size limitations
      • Most of the target users aren’t doing this …
  • It is monetizable at X level …

Leo NTS

Interesting Technologies list

Writing formats

Re ../notes/data-literate-documents

From the user's point of view, writing a document must be easy, and the format must allow exporting the document to at least HTML for viewing as a web page. Two formats are interesting: MDX and Pandoc. MDX can turn Markdown into HTML with dynamic behaviours, but getting Pandoc to work would be a great advantage. Nevertheless, Pandoc seems more complex to integrate into a React pipeline.

  • MDX: Great and simple, and can handle extensions. This is good enough for the project.
  • Pandoc: Like MDX on steroids. It can export to many different formats including HTML, PDF, plain text and docx. This seems great, although support from Node and JavaScript doesn't seem good. There is an example of static website generation with Pandoc here
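
One possible way to reach Pandoc from a Node pipeline, given the limited library support noted above, is to call the command-line tool as a child process. The following is a minimal sketch, not a tested integration: it assumes a `pandoc` binary is installed and on the PATH, and the names `OutputFormat`, `pandocArgs` and `renderMarkdown` are hypothetical. Only the `--from`, `--to` and `--standalone` options come from Pandoc itself.

```typescript
import { execFileSync } from "node:child_process";

// Text formats this sketch handles (a hypothetical subset; Pandoc supports many more).
type OutputFormat = "html" | "plain";

// Build the pandoc argument list for a Markdown source.
function pandocArgs(to: OutputFormat): string[] {
  const args = ["--from", "markdown", "--to", to];
  // For HTML, ask for a complete page rather than a fragment.
  if (to === "html") args.push("--standalone");
  return args;
}

// Convert Markdown text by piping it through pandoc's stdin and reading stdout.
// Assumption: `pandoc` is on PATH; execFileSync throws if it is missing or exits non-zero.
function renderMarkdown(markdown: string, to: OutputFormat = "html"): string {
  return execFileSync("pandoc", pandocArgs(to), {
    input: markdown,
    encoding: "utf8",
  });
}
```

A build step could call `renderMarkdown(source)` and write the returned string to disk. The trade-off against MDX is the extra native dependency and the loss of React components inside the document.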

SQLite3

Accessing and manipulating SQLite3 databases from the browser.

Spreadsheets

A short curated list of interesting technologies for in-browser spreadsheet visualization and manipulation.

Graphs and Plots

© 2024 All rights reserved. Built with Datahub: Find, Share and Publish Quality Data.