Paul Walsh + Rufus Pollock - 2023-05-02
Intention: get Paul's feedback on overview and pitch for these
- Is it clear what they are?
- Who would they be valuable to and why (what pain do they solve)?
- Would people pay for them and how much?
Secondarily
- Can we simplify (a lot for people to grok)
- How they fit together …
Agenda
- 👋 Check-in
- 🚩 Sharing in confidence (sure that's fine)
- 📝 Feedback on products and product roadmap
- Overview - PDF
- PortalJS
- DataHub ✅2023-05-02 Paul most liked the idea of a "data clipper" (i.e. bookmarking-type functionality for data): a core place for people to find data that combines the data itself with community Q&A (he noted that many data engineers and scientists struggle to find and contextualise data online)
- 🤔 AOB
- Flowershow: framework for publishing markdown-based websites especially digital gardens and data-rich content ✅2023-05-03 ❌ did not get to
- MarkdownDB - https://github.com/datopian/markdowndb ✅2023-05-03 ❌ ditto
- https://lifeitself.org/podcast/a-scientific-approach-to-awakening-and-fundamental-wellbeing-podcast
- https://diginomics.io - https://diginomics.io/guide/
- 💪 How can I serve or contribute to you?
- ⏭️ Next steps and check-out
PortalJS
Rapidly build rich data portals using a modern frontend framework.
DataHub
Subject: future of DataHub
Q: What should DataHub become? What should its core offer be, to whom?
A: DataHub is a club for … passionate/obsessive data librarians/ curators 😉
People who love data and the process of data.
People who want to curate and share important, intriguing or even just "may-someday-be-interesting" data.
- Like the process and chatting about the process
- People who love data 😉
Process = tools, techniques, tips for collecting, curating and presenting data.
S: DataHub is inactive in terms of development and users
C: It's costing us money and it looks bad to have something semi-broken. And, on the other side of things: it could be a valuable asset e.g. it has a lot of traffic, nice brand.
SCQA brainstorm
S:
- DataHub has been inactive in terms of development etc since 2018.
- Even broken for the last 1-2 years e.g. you couldn't really upload
- Never had a revenue model
- Costing around $1k a month to run
- Quite a lot of traffic (1m visits a year)
- Core thesis: make it easy to present data (and find it)
- Reworked in 2021: make it easy to share - https://datahub.io/notes/vision
- Even more focused in Q1 2023: turn github into a datahub
- Core thesis a little muddy: was it finding data (aka data marketplace) or publishing/sharing data i.e. github for data
- Marketplace was tried and did not really work (though not tried that hard). Big concern was relative cost of getting data. Lots of stuff out there that has tried that with mixed success (and somehow not super-inspiring, to me at least: did I want to be Quandl and exit to Nasdaq etc)
- What always interested Rufus was the "democratising data" part of this: wanted to create a community that could build a data commons. That's what ckan.net was really about at the start.
C:
- Github for Data or even more generally B2C cloud "data publishing" is not viable because of some combination of:
- Thesis is wrong: no-one has ever cracked github for data … and for good reason.
- Already good options: Turn github into a datahub begs the question … why not just use github? (especially when there's https://blocks.githubnext.com)
- Can't execute:
- Argument 1: What's the value add? Especially one people might pay for?
- Data APIs? This requires significant infrastructure to build and maintain. It is definitely a possibility but would need to come later.
- Can imagine more stuff e.g. connectors to sinks or sources e.g. push to datahub and automatically publish a google spreadsheet etc. Again though, it would have to be pretty slick to compete with doing it yourself, and not clear on breadth of market for a given need
- But … Lot of stuff out there: why are we different? Why aren't they succeeding?
- And do we have the capacity to execute at quality needed? Probably not …
- Argument 2: (Thesis is wrong) No-one has ever really cracked the "github for data"
- There's a reason for that: Github had git …
- Git was a killer app and it is a "network" app: you share with others and hence it has that magical virality and social-ness (and coding in general is social)
- No equivalent for data (dolthub etc are trying but don't think they'll succeed)
- It's the community, stupid (or the tool)
…
But what if I believe deep down "there's gold out there in dem thar hills …"
- Keep the fire burning
- Gather others who share a passion
- Keep experimenting
- Who knows … maybe …
- We can create a community-based Statista (or even Bloomberg)
- We can create a member-based SaaS data API provider
- Unsplash for data
Starting point
"unsplash for data": publish one dataset a week that we post on reddit/r/datasets etc with a nice graph.
Post dataset on github.com/datasets
Business model
- Selling data (long-term)
- Chargeable API
- Data bounties and data scraper community
Appendix: Options for DataHub
TODO: Is this necessary?
Do I need to discuss previously discussed options re DataHub e.g. scratchpad etc?
- Data Projects: a bit like Github for data.
- Data Publishing: Showcase for your data (intersects with previous)
- Data Scratchpad (dumping):
- Data Curating: be part of a club
- Data Pinterest: pin interesting vis, datasets etc