DataHub.io
DataHub.io: the product and plans for it.
Definitions
- DataHub.io: the production site running at DataHub.io
- DataHub(.io) v2 (aka DataHub Legacy): the codebase that runs DataHub.io. See https://tech.datopian.com/datahub/ for details (not that important to look at!)
High level plan
- What: keep DataHub.io and upgrade it.
- Near-term (to mid-late 2023): make the best use of its SEO etc. for marketing, and fix or remove glaring issues.
- Medium term (from mid-late 2023): migrate it to our new DMS "enterprise" platform as the "consumer" version (just as github.com and gitlab.com are the consumer SaaS versions of their enterprise products)
- How:
- Near-term: detailed plan of work in https://github.com/datopian/product/issues/61. In short:
- incrementally replace/migrate high value/cost parts (e.g. home page, docs etc) with a simple nextjs site
- fix high value/cost parts (e.g. search)
- prep for migration e.g. shut down signup and communicate with users.
- Plan of work:
- Medium-term: ⛔ not worked out yet (as of 2023-01-30)
Future of DataHub.io: keep and upgrade it
Subject: future of DataHub.io
Question: what are we going to do with DataHub.io
Hypothesis: we are going to keep DataHub.io and upgrade it. In the near term we will make the best use of its SEO etc. for marketing. In the medium term we will make it functional and ultimately migrate it to our new DMS "enterprise" platform as the "consumer" version (just as github.com and gitlab.com are the consumer SaaS versions of their enterprise products)
- DataHub.io working is important
- It reflects on Datopian generally
- It is a demonstrator of our product and skills
- Especially on our Enterprise offering
- We can use DataHub.io for marketing
- DataHub.io is a highly trafficked site with good SEO
- We can fix and/or use DataHub.io near-term for marketing reasonably easily
- Incremental replacement with a new app is possible
Situation:
- datahub.io is our longest running online property (it is the evolution of the original ckan.net launched in 2006)
- it is our highest-traffic website
- it is running what we term "DataHub v2" which was developed in 2015-2017 as a next generation DMS SaaS. It was a microservice architecture. Documented at https://tech.datopian.com/datahub/
- this platform has not been developed since ~2017/2018 and has only been minimally maintained.
- it has ~18k registered users (as of 2023-01-29)
Complication:
- DataHub.io has not been working well for several years
- e.g. login was already not working, or very slow, in late 2019
- search no longer seems to be working
- Codebase was built largely by Adam who is no longer around
- The backend code (data pipelines) is probably not working at all
How? Medium term move to Enterprise whilst near-term doing incremental take over plus fixes
Subject: maintaining and upgrading datahub.io
Question: how are we going to fix and upgrade DataHub.io?
Answer: move to DataHub enterprise in the medium term. However, that could take time and be complex, so rather than block on it, in the near term we can: a) incrementally replace/migrate high value/cost parts (e.g. home page, docs etc.) with a simple nextjs site; b) fix high value/cost parts (e.g. search); c) prep for migration, e.g. shut down signup and communicate with users.
- Moving to DataHub enterprise is good (best) medium-term approach
- Feature sets are similar or identical in core parts (only difference might be billing/restrictions)
- DataHub enterprise will be functional soon
- We have no better technical options
- The github based model is something we have explored and tested for years
- Upgrading to enterprise is going to take some time
- Enterprise needs to be production-ready for use on DataHub.io
- This is at least 6 months off as of 2023-01-30
- Even when Enterprise is ready migration may take some time (if we plan to migrate old data?)
- Incremental take-over is a good idea
- "Incremental take-over" can work
- we can easily reverse proxy requests (via Cloudflare Workers)
- we can build a similarly themed static site (using nextjs/flowershow)
- this gives us control of key SEO assets (e.g. front page, blog, other sub-pages) quickly and easily
- We can fix or hide other issues (those that are hard to replace with a static site)
- We can disable login
- We can address search
- Either we can fix current issue
- Or we could crudely take this over in nextjs (e.g. typesearch + nextjs, reusing the existing enterprise codebase)
- We can communicate with users to reduce any negative experience and prep for the upgrade
- We can get a dump of the user database
- We can email existing users
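The incremental take-over via reverse proxy can be sketched as a Cloudflare Worker sitting in front of datahub.io that routes each request by path: pages already rebuilt in the new nextjs/flowershow site go there, login/signup are blocked, and everything else passes through to the legacy DataHub v2 deployment. This is a minimal sketch under stated assumptions, not a deployed config: the origin hostnames, the path lists and the `route`/`handle` names are hypothetical placeholders.

```typescript
// HYPOTHETICAL origins: the new static site and the legacy DataHub v2 deployment.
const NEW_ORIGIN = "https://new-site.example.com";
const LEGACY_ORIGIN = "https://legacy.example.com";

// Illustrative lists of paths already taken over by the new site.
const NEW_SITE_EXACT = ["/"]; // home page
const NEW_SITE_PREFIXES = ["/blog", "/docs"];
// HYPOTHETICAL login/signup paths, blocked while login is disabled.
const BLOCKED_PREFIXES = ["/login", "/signup"];

type Route = "new" | "legacy" | "blocked";

// Pure routing decision, kept separate so it can be tested without deploying.
export function route(pathname: string): Route {
  // Boundary-aware prefix match: "/blog" and "/blog/x" match, "/blogger" does not.
  const matches = (prefix: string) =>
    pathname === prefix || pathname.startsWith(prefix + "/");
  if (BLOCKED_PREFIXES.some(matches)) return "blocked";
  if (NEW_SITE_EXACT.includes(pathname) || NEW_SITE_PREFIXES.some(matches)) {
    return "new";
  }
  return "legacy";
}

export async function handle(request: Request): Promise<Response> {
  const url = new URL(request.url);
  const decision = route(url.pathname);
  if (decision === "blocked") {
    return new Response("Login and signup are temporarily disabled.", {
      status: 503,
    });
  }
  const origin = decision === "new" ? NEW_ORIGIN : LEGACY_ORIGIN;
  // Keep the original path and query string, swap only the origin.
  const target = new URL(url.pathname + url.search, origin);
  // Re-issue the request (method, headers, body) against the chosen origin.
  return fetch(new Request(target.toString(), request));
}

// Cloudflare Workers module syntax: the runtime calls fetch() for each request.
export default { fetch: handle };
```

Taking over another part of the site then amounts to adding its prefix to `NEW_SITE_PREFIXES`; anything not listed keeps being served by the legacy site unchanged.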
Tasks (for near-term work)
Incremental take-over
Fix DataHub.io
- Assess state of DataHub.io
- Obtain backups of key info (both for backup purposes and for use in comms)
- Remove login/signup (unfortunately login and signup are one and the same IIRC)
Comms
- Email people about what we are doing