What is Vercel doing with Git integration and how can we learn from it in DataHub?
anu
Introduction
One of the main visions for DataHub Next is Git integration, in particular with GitHub (and later maybe GitLab etc.). What we’re trying to achieve is quite similar to what Vercel does with app deployments, so let’s consider how exactly they do Git integration.
Docs about it: https://vercel.com/docs/git (below is the summary from that page):
- Automatic deployments on every branch push - might be a cool feature on DataHub as well, e.g. different branches of your dataset on GitHub => deployed to DataHub. Also, you can have a “staging” version deployed to DataHub before going to “production”.
- The easiest way to use Git is to think of your main branch as production. Every time a pull/merge request is made to that branch, Vercel will create a unique deployment, allowing you to view the changes in a preview environment before merging.
- Preview deployments - might not be necessary in early versions but something we can consider later on. Note that we tried something similar in DataHub v2 with the Data Desktop app.
- Instant rollbacks - cool but not essential, as you can always commit a rollback in git.
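The branch-to-deployment idea above can be sketched roughly as follows. This is only an illustration under assumptions: the `datahub.example` domain, the URL layout, the `main` production branch name and the helper names are all hypothetical and not part of Vercel or DataHub.

```python
from dataclasses import dataclass

# Assumption: the production branch is called "main".
PRODUCTION_BRANCH = "main"


@dataclass(frozen=True)
class Deployment:
    dataset: str
    branch: str
    commit: str
    environment: str  # "production" or "preview"
    url: str


def slugify(value: str) -> str:
    """Lowercase the value and replace non-alphanumeric characters with '-'."""
    out = "".join(ch.lower() if ch.isalnum() else "-" for ch in value)
    return out.strip("-")


def plan_deployment(owner: str, dataset: str, branch: str, commit: str) -> Deployment:
    """Decide where a pushed commit would be deployed.

    Pushes to the production branch update the main dataset page;
    pushes to any other branch get their own preview URL.
    """
    if branch == PRODUCTION_BRANCH:
        environment = "production"
        url = f"https://datahub.example/{owner}/{dataset}"
    else:
        environment = "preview"
        # Hypothetical URL scheme: branch slug plus short commit hash.
        url = (
            f"https://datahub.example/{owner}/{dataset}"
            f"/preview/{slugify(branch)}-{commit[:7]}"
        )
    return Deployment(dataset, branch, commit, environment, url)
```

With this shape, a "staging" branch is just another preview deployment, and a rollback is a new push to `main` that restores the earlier content.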
Getting started flow
- User goes to DataHub and signs up (with GitHub only? If so, maybe we can get authorization for the user’s repos here), then visits their profile/dashboard page, where GitHub sync is prompted.
- User is redirected to GitHub OAuth dialogue to provide required authorization for DataHub.
- The profile/dashboard page loads with a list of available repositories. Here, users can select projects to deploy to DataHub.
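A minimal sketch of the flow above, assuming a plain GitHub OAuth app. The authorize endpoint and the `full_name` / `archived` repo fields are GitHub's; the chosen scopes, the callback URL and the function names are assumptions for illustration only.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

GITHUB_AUTHORIZE_URL = "https://github.com/login/oauth/authorize"


def build_authorize_url(client_id, redirect_uri, scopes=("read:user", "repo"), state=None):
    """Build the URL the user is redirected to for the GitHub OAuth dialogue.

    Returns (url, state). The state value should be stored in the session
    and compared against the one GitHub sends back to the callback.
    """
    state = state or secrets.token_urlsafe(16)
    query = urlencode(
        {
            "client_id": client_id,
            "redirect_uri": redirect_uri,
            "scope": " ".join(scopes),  # assumption: these scopes are enough
            "state": state,
        }
    )
    return f"{GITHUB_AUTHORIZE_URL}?{query}", state


def selectable_repos(repos):
    """Turn repo objects (as returned by GitHub's GET /user/repos) into the
    sorted list of names shown on the profile/dashboard page.

    Skipping archived repositories is a design assumption, not a requirement.
    """
    return sorted(r["full_name"] for r in repos if not r.get("archived", False))
```

The `repo` scope is broad; whether a narrower permission model would be a better fit is left open here.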
Templates/quickstart
Create a repo with a sample datapackage.json and a data file (e.g. CSV, XLSX or GeoJSON). For example:
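One possible shape for such a template, sketched as a small generator. The dataset name, file layout and data values are made-up placeholders; only the descriptor keys (`name`, `title`, `resources`, `path`, `format`, `schema.fields`) follow the Data Package conventions.

```python
import json

# Placeholder data, invented purely for the template.
SAMPLE_CSV = "year,value\n2019,1.5\n2020,2.0\n"


def build_template(name="sample-dataset"):
    """Return the template repo contents as {relative_path: file_content}."""
    descriptor = {
        "name": name,
        "title": "Sample dataset",
        "resources": [
            {
                "name": "sample",
                "path": "data/sample.csv",
                "format": "csv",
                "schema": {
                    "fields": [
                        {"name": "year", "type": "integer"},
                        {"name": "value", "type": "number"},
                    ]
                },
            }
        ],
    }
    return {
        "datapackage.json": json.dumps(descriptor, indent=2) + "\n",
        "data/sample.csv": SAMPLE_CSV,
    }
```

Writing these two files into a fresh repo and pushing it would give a user something to deploy straight away.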
Towards version 1.0.0
Custom domains
Monorepos
Deploying multiple datasets (apps) from a single repo: https://vercel.com/docs/git#monorepos
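For DataHub, one way to think about this is to treat every directory containing a datapackage.json as a separate dataset. The sketch below assumes that convention (it is not something Vercel or DataHub defines) and works on a plain list of repo file paths; the function names are hypothetical.

```python
from pathlib import PurePosixPath


def find_dataset_roots(paths):
    """Return the sorted directories that contain a datapackage.json.

    A descriptor at the top level of the repo yields the root ".".
    """
    roots = {
        str(PurePosixPath(p).parent)
        for p in paths
        if PurePosixPath(p).name == "datapackage.json"
    }
    return sorted(roots)


def affected_roots(roots, changed_paths):
    """Return the dataset roots touched by a push, so that only those
    datasets need to be redeployed."""
    hit = set()
    for root in roots:
        for path in changed_paths:
            if root == "." or path.startswith(root + "/"):
                hit.add(root)
                break
    return sorted(hit)
```

For example, a push that only changes `datasets/gdp/data/gdp.csv` would redeploy `datasets/gdp` and leave other datasets in the repo alone.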