GitHub Auto-Sync
Summary
- Situation: Currently, each time I make a change to my DataHub Cloud site's content repository, I need to go to my DataHub Cloud dashboard and manually "sync" my site.
- Problem:
- This is inconvenient if I frequently make changes to my site's content.
- If I'm collaborating with others on a single site's content, the changes to it will only go live when the original site creator logs into their DataHub Cloud account and syncs the site manually.
- Solution: Use GitHub webhooks registered in users' site repositories. This is an optional feature that can be switched on and off in the site settings.
- Appetite: 1d
Situation
At the moment, each time a user makes a change to the GitHub repository used as the base for their DataHub Cloud site, they need to go to the DataHub Cloud dashboard and manually pull the changes by clicking the "Sync" button on the site settings page.
Problem
This is problematic because:
- I need to remember to sync my site each time I make a change to the content repository, which may be cumbersome when the content is frequently changing.
- It doesn't allow for full collaboration on my site with other people. We can collaborate on the repository content, but since the site has been created on my account, the changes will go live only after I "sync" them, and nobody else can do this but me.
Appetite
1-2d
Solution
Solution: Optional "auto-sync" feature leveraging GitHub webhooks.
How It Works:
- At the time a user toggles on the "auto-sync" feature in the site settings, the app, using the `admin:repo_hook` scope granted through OAuth, sets up a webhook on the user's underlying repository using the GitHub REST API endpoint.
- The webhook sends a POST request to a specified app endpoint whenever changes are pushed to the repository. The app's webhook handler processes the notification payload and updates files and metadata of any changed files if needed.
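As a rough illustration of the first step, the request that registers the webhook could be built as below. This is a minimal sketch, not the app's actual code: the function name and its parameters are hypothetical, and `callback_url` stands for whatever endpoint the app exposes (Appendix A uses https://datahub.io/sync). The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_create_webhook_request(owner, repo, token, callback_url):
    """Build (but do not send) a request that registers a push webhook.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    body = {
        "name": "web",  # value GitHub documents for repository webhooks
        "active": True,
        "events": ["push"],  # we only care about pushes to the content repo
        "config": {"url": callback_url, "content_type": "json"},
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/hooks",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )


request = build_create_webhook_request(
    "olayway", "digital-garden", "<oauth-token>", "https://datahub.io/sync"
)
# Passing `request` to urllib.request.urlopen would perform the actual call.
```

The token must carry the `admin:repo_hook` scope for this call to succeed.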
Processing event payload
See Appendix A with a full payload example.
How to tell if the push event is related to the branch configured with the site?
- use the `ref` payload field, e.g. `"ref": "refs/heads/main"`, and ignore all the pushes to other branches
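The branch check can be sketched as follows. The function name and the `site_branch` argument (the branch configured for the site) are assumptions made for illustration.

```python
def is_push_to_site_branch(payload, site_branch):
    """Return True only if the push event targets the site's configured branch."""
    # Branch pushes carry refs of the form "refs/heads/<branch>";
    # tag pushes use "refs/tags/<tag>" and therefore never match.
    return payload.get("ref") == f"refs/heads/{site_branch}"


payload = {"ref": "refs/heads/main"}
print(is_push_to_site_branch(payload, "main"))     # True
print(is_push_to_site_branch(payload, "staging"))  # False
```

Comparing the full ref rather than its last path segment avoids mistaking a branch such as `feature/main` or a tag named `main` for the site branch.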
How to tell which files have been changed?
- use the `head_commit` payload field and its `added`, `removed`, and `modified` subfields, which hold lists of file paths. Tested it and it won't work for rebased PRs, as there is no collective merge commit or "squashed" commit that would include all the changed files in a single head commit.
- use the `commits` field, traverse it, and create a set of added, removed and modified files
- note, a file can be removed and then added back (or the other way round) in a later commit of the same merged PR, so the algorithm needs to take this into account
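One possible way to fold the `commits` array into net changes is sketched below. It assumes the commits are listed oldest first and that a path appears in at most one of the three lists within a single commit; the function name is hypothetical. A file that is added and later removed within the same push cancels out, and a file that is removed and later added back is treated as modified, since its content may differ from what the site already has.

```python
def collect_changed_files(commits):
    """Fold per-commit file lists into net (added, removed, modified) sets."""
    status = {}  # path -> "added" | "removed" | "modified"
    for commit in commits:
        for path in commit.get("added", []):
            # Re-adding a file removed earlier in the push: it existed
            # before the push, so the net effect is a modification.
            status[path] = "modified" if status.get(path) == "removed" else "added"
        for path in commit.get("removed", []):
            if status.get(path) == "added":
                del status[path]  # added and removed in the same push: no net change
            else:
                status[path] = "removed"
        for path in commit.get("modified", []):
            if status.get(path) != "added":  # a new file that is edited is still new
                status[path] = "modified"
    added = {p for p, s in status.items() if s == "added"}
    removed = {p for p, s in status.items() if s == "removed"}
    modified = {p for p, s in status.items() if s == "modified"}
    return added, removed, modified


commits = [
    {"added": ["a.md"], "removed": ["b.md"], "modified": ["c.md"]},
    {"added": ["b.md"], "removed": ["a.md"], "modified": []},
]
print(collect_changed_files(commits))
```

In this example `a.md` drops out entirely, while `b.md` and `c.md` end up in the modified set.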
What about the tree stored in the content store that has been used up until now in manual syncing?
- the webhook event handler should update the tree as well for now, until the auto-syncing feature is fully reliable and we get rid of the manual syncing option.
Refreshing GitHub token to include admin:repo_hook
Check the `X-OAuth-Scopes` response header.
- Option 1: Check it at site creation if we want to enable auto-sync by default.
- Option 2: Check it when the user tries to toggle auto-sync on if we don't enable it by default.
If the header doesn't include `admin:repo_hook`, re-authenticate the user.
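The scope check itself could look roughly like this. It is a sketch that assumes the header value is a comma-separated list of scopes (for example `repo, admin:repo_hook`) and that an exact match on the scope name is sufficient; the function name is hypothetical.

```python
def token_has_scope(x_oauth_scopes, required="admin:repo_hook"):
    """Check whether an X-OAuth-Scopes header value lists the required scope."""
    # A missing header (None) or an empty value means no scopes were granted.
    granted = {s.strip() for s in (x_oauth_scopes or "").split(",") if s.strip()}
    return required in granted


print(token_has_scope("repo, admin:repo_hook"))  # True
print(token_has_scope("repo, read:user"))        # False
```

When the function returns False, the app would send the user through the OAuth flow again, this time requesting the missing scope.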
Useful links:
- Testing GitHub webhooks: https://docs.github.com/en/webhooks/testing-and-troubleshooting-webhooks/testing-webhooks
- Creating GitHub webhooks: https://docs.github.com/en/webhooks/using-webhooks/creating-webhooks#creating-a-repository-webhook
- REST endpoint for creating webhooks: https://docs.github.com/en/rest/repos/webhooks?apiVersion=2022-11-28#create-a-repository-webhook
Rabbit holes
- Reliability: If the webhook fails or there's a delay in the notification, the site might serve stale content until the issue is resolved. Ignore for now until we notice it being an issue. Also, we're not getting rid of the manual sync option, so if auto-sync fails, users can force a sync manually.
- High volume of webhook events: We could implement a queuing system to process webhook events asynchronously, scale the processing infrastructure if needed, and throttle webhook processing to prevent overloading the application. Start simple and leave this for the next iteration. As above, the manual sync option remains as a fallback.
No-goes
…
Appendix A: Example webhook event payload
Headers:
**Request URL:** https://datahub.io/sync
**Request method:** POST
**Accept:** */*
**Content-Type:** application/json
**User-Agent:** GitHub-Hookshot/bd63eb1
**X-GitHub-Delivery:** bfb6588c-2417-11ef-9f5b-7c337224d4bd
**X-GitHub-Event:** push
**X-GitHub-Hook-ID:** 482690272
**X-GitHub-Hook-Installation-Target-ID:** 704146143
**X-GitHub-Hook-Installation-Target-Type:** repository
Payload:
{
"ref": "refs/heads/main",
"before": "b5c970ff98b2c65cd10f65ef2db0e69dbc32e982",
"after": "29703105f28e4e2032a9aa14ff1f5c138574ecea",
"repository": {
"id": 704146143,
"node_id": "R_kgDOKfhq3w",
"name": "digital-garden",
"full_name": "olayway/digital-garden",
"private": false,
"owner": {
"name": "olayway",
"email": "[email protected]",
"login": "olayway",
"id": 52197250,
"node_id": "MDQ6VXNlcjUyMTk3MjUw",
"avatar_url": "https://avatars.githubusercontent.com/u/52197250?v=4",
"gravatar_id": "",
"url": "https://api.github.com/users/olayway",
"html_url": "https://github.com/olayway",
"followers_url": "https://api.github.com/users/olayway/followers",
"following_url": "https://api.github.com/users/olayway/following{/other_user}",
"gists_url": "https://api.github.com/users/olayway/gists{/gist_id}",
"starred_url": "https://api.github.com/users/olayway/starred{/owner}{/repo}",
"subscriptions_url": "https://api.github.com/users/olayway/subscriptions",
"organizations_url": "https://api.github.com/users/olayway/orgs",
"repos_url": "https://api.github.com/users/olayway/repos",
"events_url": "https://api.github.com/users/olayway/events{/privacy}",
"received_events_url": "https://api.github.com/users/olayway/received_events",
"type": "User",
"site_admin": false
},
"html_url": "https://github.com/olayway/digital-garden",
"description": "Testing Flowershow Cloud",
"fork": false,
"url": "https://github.com/olayway/digital-garden",
"forks_url": "https://api.github.com/repos/olayway/digital-garden/forks",
"keys_url": "https://api.github.com/repos/olayway/digital-garden/keys{/key_id}",
"collaborators_url": "https://api.github.com/repos/olayway/digital-garden/collaborators{/collaborator}",
"teams_url": "https://api.github.com/repos/olayway/digital-garden/teams",
"hooks_url": "https://api.github.com/repos/olayway/digital-garden/hooks",
"issue_events_url": "https://api.github.com/repos/olayway/digital-garden/issues/events{/number}",
"events_url": "https://api.github.com/repos/olayway/digital-garden/events",
"assignees_url": "https://api.github.com/repos/olayway/digital-garden/assignees{/user}",
"branches_url": "https://api.github.com/repos/olayway/digital-garden/branches{/branch}",
"tags_url": "https://api.github.com/repos/olayway/digital-garden/tags",
"blobs_url": "https://api.github.com/repos/olayway/digital-garden/git/blobs{/sha}",
"git_tags_url": "https://api.github.com/repos/olayway/digital-garden/git/tags{/sha}",
"git_refs_url": "https://api.github.com/repos/olayway/digital-garden/git/refs{/sha}",
"trees_url": "https://api.github.com/repos/olayway/digital-garden/git/trees{/sha}",
"statuses_url": "https://api.github.com/repos/olayway/digital-garden/statuses/{sha}",
"languages_url": "https://api.github.com/repos/olayway/digital-garden/languages",
"stargazers_url": "https://api.github.com/repos/olayway/digital-garden/stargazers",
"contributors_url": "https://api.github.com/repos/olayway/digital-garden/contributors",
"subscribers_url": "https://api.github.com/repos/olayway/digital-garden/subscribers",
"subscription_url": "https://api.github.com/repos/olayway/digital-garden/subscription",
"commits_url": "https://api.github.com/repos/olayway/digital-garden/commits{/sha}",
"git_commits_url": "https://api.github.com/repos/olayway/digital-garden/git/commits{/sha}",
"comments_url": "https://api.github.com/repos/olayway/digital-garden/comments{/number}",
"issue_comment_url": "https://api.github.com/repos/olayway/digital-garden/issues/comments{/number}",
"contents_url": "https://api.github.com/repos/olayway/digital-garden/contents/{+path}",
"compare_url": "https://api.github.com/repos/olayway/digital-garden/compare/{base}...{head}",
"merges_url": "https://api.github.com/repos/olayway/digital-garden/merges",
"archive_url": "https://api.github.com/repos/olayway/digital-garden/{archive_format}{/ref}",
"downloads_url": "https://api.github.com/repos/olayway/digital-garden/downloads",
"issues_url": "https://api.github.com/repos/olayway/digital-garden/issues{/number}",
"pulls_url": "https://api.github.com/repos/olayway/digital-garden/pulls{/number}",
"milestones_url": "https://api.github.com/repos/olayway/digital-garden/milestones{/number}",
"notifications_url": "https://api.github.com/repos/olayway/digital-garden/notifications{?since,all,participating}",
"labels_url": "https://api.github.com/repos/olayway/digital-garden/labels{/name}",
"releases_url": "https://api.github.com/repos/olayway/digital-garden/releases{/id}",
"deployments_url": "https://api.github.com/repos/olayway/digital-garden/deployments",
"created_at": 1697127692,
"updated_at": "2024-05-20T13:54:43Z",
"pushed_at": 1717686988,
"git_url": "git://github.com/olayway/digital-garden.git",
"ssh_url": "[email protected]:olayway/digital-garden.git",
"clone_url": "https://github.com/olayway/digital-garden.git",
"svn_url": "https://github.com/olayway/digital-garden",
"homepage": null,
"size": 3883,
"stargazers_count": 0,
"watchers_count": 0,
"language": "CSS",
"has_issues": true,
"has_projects": true,
"has_downloads": true,
"has_wiki": false,
"has_pages": false,
"has_discussions": false,
"forks_count": 0,
"mirror_url": null,
"archived": false,
"disabled": false,
"open_issues_count": 0,
"license": null,
"allow_forking": true,
"is_template": false,
"web_commit_signoff_required": false,
"topics": [
],
"visibility": "public",
"forks": 0,
"open_issues": 0,
"watchers": 0,
"default_branch": "main",
"stargazers": 0,
"master_branch": "main"
},
"pusher": {
"name": "olayway",
"email": "[email protected]"
},
"sender": {
"login": "olayway",
"id": 52197250,
"node_id": "MDQ6VXNlcjUyMTk3MjUw",
"avatar_url": "https://avatars.githubusercontent.com/u/52197250?v=4",
"gravatar_id": "",
"url": "https://api.github.com/users/olayway",
"html_url": "https://github.com/olayway",
"followers_url": "https://api.github.com/users/olayway/followers",
"following_url": "https://api.github.com/users/olayway/following{/other_user}",
"gists_url": "https://api.github.com/users/olayway/gists{/gist_id}",
"starred_url": "https://api.github.com/users/olayway/starred{/owner}{/repo}",
"subscriptions_url": "https://api.github.com/users/olayway/subscriptions",
"organizations_url": "https://api.github.com/users/olayway/orgs",
"repos_url": "https://api.github.com/users/olayway/repos",
"events_url": "https://api.github.com/users/olayway/events{/privacy}",
"received_events_url": "https://api.github.com/users/olayway/received_events",
"type": "User",
"site_admin": false
},
"created": false,
"deleted": false,
"forced": false,
"base_ref": null,
"compare": "https://github.com/olayway/digital-garden/compare/b5c970ff98b2...29703105f28e",
"commits": [
{
"id": "29703105f28e4e2032a9aa14ff1f5c138574ecea",
"tree_id": "d203f80a690c54083c351556a31eaa90544e1317",
"distinct": true,
"message": "Update README.md",
"timestamp": "2024-06-06T17:16:28+02:00",
"url": "https://github.com/olayway/digital-garden/commit/29703105f28e4e2032a9aa14ff1f5c138574ecea",
"author": {
"name": "Ola Rubaj",
"email": "[email protected]",
"username": "olayway"
},
"committer": {
"name": "GitHub",
"email": "[email protected]",
"username": "web-flow"
},
"added": [
],
"removed": [
],
"modified": [
"README.md"
]
}
],
"head_commit": {
"id": "29703105f28e4e2032a9aa14ff1f5c138574ecea",
"tree_id": "d203f80a690c54083c351556a31eaa90544e1317",
"distinct": true,
"message": "Update README.md",
"timestamp": "2024-06-06T17:16:28+02:00",
"url": "https://github.com/olayway/digital-garden/commit/29703105f28e4e2032a9aa14ff1f5c138574ecea",
"author": {
"name": "Ola Rubaj",
"email": "[email protected]",
"username": "olayway"
},
"committer": {
"name": "GitHub",
"email": "[email protected]",
"username": "web-flow"
},
"added": [
],
"removed": [
],
"modified": [
"README.md"
]
}
}