Planning Meeting - Rufus, Daniela, Rising, Anu
Present: Anu, Daniela, Rising, Rufus
Hypothesis Tree
#todo/integrate to ../areas/pages
- A product that takes data and content in git(hub) and provides an elegant customizable presentation on the web (including tables and graphs) is attractive to a growing set of users
- It can be self-service and cloud-hosted
- There is a demand for this
- People are sharing data via tools like git and github
- Code, data and documentation kept and versioned together is good
- This has become much more accessible at multiple levels in the last few years
- Much more data is being published, both publicly and privately
- Growing set of data scientists and data engineers
- who often wrangle data in their spare time
- People experience the following pains
- For data presentation and publishing there are major limits to github
- Can’t display/browse large data files
- Can’t store large files
- Can’t build data-driven pages easily … (can’t integrate content and data)
- Can’t create graphs, views etc that present insights of my data
- Can’t create an API …
- Minor: Code oriented (e.g. code statistics)
- Aside: analogy with publishing a website: github readme (and file browser) is ok … but it isn’t a website …
- The existing options for this aren’t great …
- Build my own website and deploy with netlify etc
- Requires a whole bunch of new skills, especially if I want to present data
- Use a notebook (e.g. jupyter, observable)
- Pros: code, data and outputs integrated
- Cons: not very elegant, restrictive (Python code, linear format), need to run some infrastructure (or use someone else's)
- Observable: on a proprietary platform
- Use a data publishing platform e.g. kaggle, zenodo, datahub.io etc
- Code, data and content are separated …
- Presentation is not very customizable
- Proprietary platforms
- Paste in a google spreadsheet and share it
- Size limitations
- Most of the target users aren’t doing this …
- It is monetizable at X level …
Leo NTS
Interesting Technologies list
Writing formats
Re ../notes/data-literate-documents
From the user's point of view, writing a document must be easy, and the format must allow exporting the document to at least HTML for a web page view. Two formats are interesting: MDX and Pandoc. MDX can export Markdown to HTML with dynamic behaviours, but getting Pandoc to work would seem to be a great advantage. Nevertheless, Pandoc seems more complex to integrate into a React pipeline.
- MDX: great and simple, and can handle extensions. This is good enough for the project.
- Pandoc: like MDX on steroids; it can export to dozens of different formats, including HTML, PDF, TXT and DOCX. This seems great, although support from Node and JavaScript doesn't seem good. There is an example of static website generation with Pandoc here
Sqlite3
SQLite3 database access and manipulation from the browser
- sql.js: SQLite3 in JavaScript
- sql.js-httpvfs: a virtual file system over HTTP that partially loads SQLite3 files with sql.js. See the [blog post](https://phiresky.github.io/blog/2021/hosting-sqlite-databases-on-github-pages/) and the repo.
- absurd-sql: SQLite3 over IndexedDB in the browser, with extremely fast local read-write access. It has no remote access to another SQLite3 file, though.
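To make "partially load" more concrete: the approach described in the sql.js-httpvfs post relies on HTTP Range requests, so the browser only downloads the database pages a query touches. The sketch below is only an illustration of that idea. `rangeHeaderForPages` is a hypothetical helper, not part of sql.js or sql.js-httpvfs, and the 4096-byte page size is an assumption.

```typescript
// Hypothetical helper (not part of sql.js-httpvfs): builds the HTTP Range
// header value needed to fetch a run of consecutive SQLite pages.
// SQLite numbers its pages starting at 1.
function rangeHeaderForPages(
  firstPage: number,
  pageCount: number,
  pageSize: number
): string {
  if (firstPage < 1 || pageCount < 1 || pageSize < 1) {
    throw new RangeError("firstPage, pageCount and pageSize must be positive");
  }
  const start = (firstPage - 1) * pageSize; // byte offset of the first page
  const end = start + pageCount * pageSize - 1; // Range end is inclusive
  return `bytes=${start}-${end}`;
}

// Pages 3 and 4 of a database with an assumed page size of 4096 bytes.
const exampleHeader: string = rangeHeaderForPages(3, 2, 4096);
console.log(exampleHeader); // bytes=8192-16383
```

A client would send this value in a `Range` request header against a static file host that supports range requests, then hand the returned bytes to the SQLite engine as the requested pages.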
Spreadsheets
A short curated list of interesting technologies for in-browser spreadsheet visualization and manipulation.
- A long list of spreadsheet JavaScript libraries, for future reference if needed
- Handsontable
- SheetsJS
- React-data-grid: a really great feature is grouping
- AG Grid (open source and paid versions): spreadsheets and integrated graphs. There seem to be some open source libraries in their GitHub repo and their charts repo, plus documentation on charts. If the open source version is enough, I think this is the most complete option.
- Material UI Data Grid: although not exactly spreadsheet display and manipulation, it could be used, and the team is experienced with it.
Graphs and Plots
- Vega-Lite: Datopian's preferred lightweight and easy library. Cons: does not support WebGL
- Vega webgl renderer altho
- deck.gl: Uber's WebGL-powered visualization libraries
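As a rough illustration of why Vega-Lite counts as lightweight and easy, a whole chart is a small declarative JSON spec. The sketch below writes one as a TypeScript object. The field names `category` and `amount` and the data values are made-up placeholders, not project data.

```typescript
// Minimal Vega-Lite bar chart spec. The data values and field names are
// illustrative placeholders only.
const barChartSpec = {
  $schema: "https://vega.github.io/schema/vega-lite/v5.json",
  description: "Illustrative bar chart",
  data: {
    values: [
      { category: "A", amount: 28 },
      { category: "B", amount: 55 },
      { category: "C", amount: 43 },
    ],
  },
  mark: "bar",
  encoding: {
    x: { field: "category", type: "nominal" },
    y: { field: "amount", type: "quantitative" },
  },
};

// Serialize to the JSON form that a Vega-Lite renderer consumes.
const barChartJson: string = JSON.stringify(barChartSpec, null, 2);
console.log(barChartJson);
```

The same JSON could be stored next to the data in a repository and rendered in the browser, which fits the goal of keeping content and data together.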