GitHub Auto-Sync

Summary

  • Situation: Currently, each time I make a change to my DataHub Cloud site's content repository, I need to go to my DataHub Cloud dashboard and manually "sync" my site.
  • Problem:
    • This is inconvenient if I frequently make changes to my site's content.
    • If I'm collaborating with others on a single site's content, the changes to it will only go live when the original site creator logs into their DataHub Cloud account and syncs the site manually.
  • Solution: An optional auto-sync feature, switched on and off in site settings, that uses GitHub webhooks registered in the repositories behind user sites.
  • Appetite: 1-2d

Situation

At the moment, each time a user makes a change to the GitHub repository that backs their DataHub Cloud site, they need to go to the DataHub Cloud dashboard and manually pull the changes by clicking the "Sync" button on the site settings page.

Problem

This is problematic because:

  • I need to remember to sync my site each time I make a change to the content repository, which is cumbersome when the content changes frequently.
  • It only allows partial collaboration on my site. Other people and I can collaborate on the repository content, but since the site was created on my account, the changes go live only after I "sync" them, and nobody else can do this but me.

Appetite

1-2d

Solution

An optional "auto-sync" feature that leverages GitHub webhooks.

How It Works:

  1. When a user toggles on the "auto-sync" feature in the site settings, the app uses the admin:repo_hook scope granted through OAuth to set up a webhook on the user's underlying repository via the GitHub REST API.
  2. The webhook sends a POST request to a specified app endpoint whenever changes are pushed to the repository. The app's webhook handler processes the notification payload and, if needed, updates the content and metadata of any changed files.
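
As a rough illustration of step 1, the sketch below builds (but does not send) the request for GitHub's "create a repository webhook" REST endpoint, subscribing to push events only. The function name is hypothetical, the token is a placeholder, and the owner, repository, and callback URL are taken from the example in Appendix A; how the app actually stores tokens and sends the request is not specified in this pitch.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_create_hook_request(owner, repo, token, callback_url):
    """Build the POST /repos/{owner}/{repo}/hooks request for a push webhook.

    Hypothetical helper: it only constructs the request so it can be
    inspected or sent later with urllib.request.urlopen().
    """
    body = {
        "name": "web",            # GitHub requires "web" for webhooks
        "active": True,
        "events": ["push"],       # only push events are needed for auto-sync
        "config": {"url": callback_url, "content_type": "json"},
    }
    return urllib.request.Request(
        url=f"{GITHUB_API}/repos/{owner}/{repo}/hooks",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # token needs admin:repo_hook
            "Content-Type": "application/json",
        },
    )


# Example values from Appendix A; "<oauth-token>" is a placeholder.
req = build_create_hook_request(
    "olayway", "digital-garden", "<oauth-token>", "https://datahub.io/sync"
)
```

Toggling auto-sync off would presumably delete the same webhook, which means the app has to remember the hook ID returned when it was created.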

Processing event payload

See Appendix A with a full payload example.

How to tell if the push event is related to the branch configured with the site?

  • use the ref payload field, e.g. "ref": "refs/heads/main", and ignore pushes to all other branches
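
A minimal sketch of that branch check, assuming the site stores its configured branch as a plain name such as "main" (the function and parameter names are hypothetical):

```python
def is_push_to_site_branch(payload: dict, site_branch: str) -> bool:
    """Return True only if the push event targets the site's configured branch."""
    # Branch pushes carry refs like "refs/heads/main"; tag pushes use "refs/tags/...".
    return payload.get("ref") == f"refs/heads/{site_branch}"


example = {"ref": "refs/heads/main"}
on_main = is_push_to_site_branch(example, "main")
on_dev = is_push_to_site_branch(example, "dev")
```

Comparing against the full ref, rather than checking how the string ends, avoids accidentally matching a tag or a branch whose name merely ends with the configured one.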

How to tell which files have been changed?

  • use the head_commit payload field and its added, removed, and modified subfields, which hold lists of file paths. Tested: this doesn't work for rebased PRs, because there is no collective merge commit or "squashed" commit that would include all the changed files in a single head commit.
  • use the commits field instead: traverse it and build sets of added, removed, and modified files
    • note: a file can be removed and then added back (or the other way round) in a later commit of the same merged PR, so the algorithm needs to take this into account
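
One possible way to fold the commits list into a net change set is sketched below. It assumes the commits array is ordered oldest to newest, and it treats a file that was removed and later added back as modified, since its content may differ. The function name is hypothetical.

```python
def collect_changed_files(commits):
    """Reduce per-commit added/removed/modified lists to the net change of the push."""
    first_op = {}  # first operation seen for each path
    last_op = {}   # most recent operation seen for each path
    for commit in commits:  # assumed to be in chronological order
        for op in ("added", "removed", "modified"):
            for path in commit.get(op, []):
                first_op.setdefault(path, op)
                last_op[path] = op

    changes = {"added": set(), "removed": set(), "modified": set()}
    for path, first in first_op.items():
        existed_before = first != "added"           # file was there before the push
        exists_after = last_op[path] != "removed"   # file is there after the push
        if existed_before and exists_after:
            changes["modified"].add(path)
        elif exists_after:
            changes["added"].add(path)
        elif existed_before:
            changes["removed"].add(path)
        # else: added and removed within the same push, so nothing to sync
    return changes


commits = [
    {"added": ["a.md"], "removed": ["b.md"], "modified": ["c.md"]},
    {"added": ["b.md"], "removed": ["a.md"], "modified": []},
    {"added": ["d.md"], "removed": ["c.md"], "modified": []},
]
result = collect_changed_files(commits)
```

Here a.md is added and then removed, so it drops out; b.md is removed and re-added, so it counts as modified; c.md ends up removed and d.md added.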

What about the tree stored in the content store that has been used up until now in manual syncing?

  • for now, the webhook event handler should update the tree as well, until the auto-sync feature is fully reliable and we can get rid of the manual sync option entirely.

Refreshing GitHub token to include admin:repo_hook

Check the X-OAuth-Scopes response header, which lists the scopes granted to the token.

  • Option 1: Check it at site creation if we want to enable auto-sync by default.
  • Option 2: Check it when the user tries to toggle auto-sync on if we don't enable it by default.

If the header doesn't include admin:repo_hook, re-authenticate the user.
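
The scope check could look roughly like the sketch below. It assumes the app already has the value of the X-OAuth-Scopes header from any authenticated GitHub API response, where scopes are listed as a comma-separated string; the function name is hypothetical.

```python
def token_has_scope(x_oauth_scopes, required="admin:repo_hook"):
    """Return True if the comma-separated X-OAuth-Scopes value contains `required`."""
    if not x_oauth_scopes:
        return False  # header missing or empty: treat as no scopes granted
    granted = {scope.strip() for scope in x_oauth_scopes.split(",") if scope.strip()}
    return required in granted


needs_reauth = not token_has_scope("repo, read:user")
already_ok = token_has_scope("repo, admin:repo_hook")
```

When the check fails, the user would be sent through the OAuth flow again with admin:repo_hook added to the requested scopes.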


Useful links:

Rabbit holes

  • Reliability: If the webhook fails or the notification is delayed, the site might serve stale content until the issue is resolved. Ignore this for now, until we notice it becoming an issue. We're also keeping the manual sync option, so if auto-sync fails, users can force a sync manually.
  • High volume of webhook events: This could be handled by a queuing system that processes webhook events asynchronously, by scaling the processing infrastructure if needed, and by throttling webhook processing so the application isn't overloaded. Start simple and leave this for the next iteration. Again, the manual sync option stays available as a fallback if auto-sync fails.

No-goes

Appendix A: Example webhook event payload

Headers:

**Request URL:** https://datahub.io/sync
**Request method:** POST
**Accept:** */*
**Content-Type:** application/json
**User-Agent:** GitHub-Hookshot/bd63eb1
**X-GitHub-Delivery:** bfb6588c-2417-11ef-9f5b-7c337224d4bd
**X-GitHub-Event:** push
**X-GitHub-Hook-ID:** 482690272
**X-GitHub-Hook-Installation-Target-ID:** 704146143
**X-GitHub-Hook-Installation-Target-Type:** repository

Payload:

{
  "ref": "refs/heads/main",
  "before": "b5c970ff98b2c65cd10f65ef2db0e69dbc32e982",
  "after": "29703105f28e4e2032a9aa14ff1f5c138574ecea",
  "repository": {
    "id": 704146143,
    "node_id": "R_kgDOKfhq3w",
    "name": "digital-garden",
    "full_name": "olayway/digital-garden",
    "private": false,
    "owner": {
      "name": "olayway",
      "email": "[email protected]",
      "login": "olayway",
      "id": 52197250,
      "node_id": "MDQ6VXNlcjUyMTk3MjUw",
      "avatar_url": "https://avatars.githubusercontent.com/u/52197250?v=4",
      "gravatar_id": "",
      "url": "https://api.github.com/users/olayway",
      "html_url": "https://github.com/olayway",
      "followers_url": "https://api.github.com/users/olayway/followers",
      "following_url": "https://api.github.com/users/olayway/following{/other_user}",
      "gists_url": "https://api.github.com/users/olayway/gists{/gist_id}",
      "starred_url": "https://api.github.com/users/olayway/starred{/owner}{/repo}",
      "subscriptions_url": "https://api.github.com/users/olayway/subscriptions",
      "organizations_url": "https://api.github.com/users/olayway/orgs",
      "repos_url": "https://api.github.com/users/olayway/repos",
      "events_url": "https://api.github.com/users/olayway/events{/privacy}",
      "received_events_url": "https://api.github.com/users/olayway/received_events",
      "type": "User",
      "site_admin": false
    },
    "html_url": "https://github.com/olayway/digital-garden",
    "description": "Testing Flowershow Cloud",
    "fork": false,
    "url": "https://github.com/olayway/digital-garden",
    "forks_url": "https://api.github.com/repos/olayway/digital-garden/forks",
    "keys_url": "https://api.github.com/repos/olayway/digital-garden/keys{/key_id}",
    "collaborators_url": "https://api.github.com/repos/olayway/digital-garden/collaborators{/collaborator}",
    "teams_url": "https://api.github.com/repos/olayway/digital-garden/teams",
    "hooks_url": "https://api.github.com/repos/olayway/digital-garden/hooks",
    "issue_events_url": "https://api.github.com/repos/olayway/digital-garden/issues/events{/number}",
    "events_url": "https://api.github.com/repos/olayway/digital-garden/events",
    "assignees_url": "https://api.github.com/repos/olayway/digital-garden/assignees{/user}",
    "branches_url": "https://api.github.com/repos/olayway/digital-garden/branches{/branch}",
    "tags_url": "https://api.github.com/repos/olayway/digital-garden/tags",
    "blobs_url": "https://api.github.com/repos/olayway/digital-garden/git/blobs{/sha}",
    "git_tags_url": "https://api.github.com/repos/olayway/digital-garden/git/tags{/sha}",
    "git_refs_url": "https://api.github.com/repos/olayway/digital-garden/git/refs{/sha}",
    "trees_url": "https://api.github.com/repos/olayway/digital-garden/git/trees{/sha}",
    "statuses_url": "https://api.github.com/repos/olayway/digital-garden/statuses/{sha}",
    "languages_url": "https://api.github.com/repos/olayway/digital-garden/languages",
    "stargazers_url": "https://api.github.com/repos/olayway/digital-garden/stargazers",
    "contributors_url": "https://api.github.com/repos/olayway/digital-garden/contributors",
    "subscribers_url": "https://api.github.com/repos/olayway/digital-garden/subscribers",
    "subscription_url": "https://api.github.com/repos/olayway/digital-garden/subscription",
    "commits_url": "https://api.github.com/repos/olayway/digital-garden/commits{/sha}",
    "git_commits_url": "https://api.github.com/repos/olayway/digital-garden/git/commits{/sha}",
    "comments_url": "https://api.github.com/repos/olayway/digital-garden/comments{/number}",
    "issue_comment_url": "https://api.github.com/repos/olayway/digital-garden/issues/comments{/number}",
    "contents_url": "https://api.github.com/repos/olayway/digital-garden/contents/{+path}",
    "compare_url": "https://api.github.com/repos/olayway/digital-garden/compare/{base}...{head}",
    "merges_url": "https://api.github.com/repos/olayway/digital-garden/merges",
    "archive_url": "https://api.github.com/repos/olayway/digital-garden/{archive_format}{/ref}",
    "downloads_url": "https://api.github.com/repos/olayway/digital-garden/downloads",
    "issues_url": "https://api.github.com/repos/olayway/digital-garden/issues{/number}",
    "pulls_url": "https://api.github.com/repos/olayway/digital-garden/pulls{/number}",
    "milestones_url": "https://api.github.com/repos/olayway/digital-garden/milestones{/number}",
    "notifications_url": "https://api.github.com/repos/olayway/digital-garden/notifications{?since,all,participating}",
    "labels_url": "https://api.github.com/repos/olayway/digital-garden/labels{/name}",
    "releases_url": "https://api.github.com/repos/olayway/digital-garden/releases{/id}",
    "deployments_url": "https://api.github.com/repos/olayway/digital-garden/deployments",
    "created_at": 1697127692,
    "updated_at": "2024-05-20T13:54:43Z",
    "pushed_at": 1717686988,
    "git_url": "git://github.com/olayway/digital-garden.git",
    "ssh_url": "[email protected]:olayway/digital-garden.git",
    "clone_url": "https://github.com/olayway/digital-garden.git",
    "svn_url": "https://github.com/olayway/digital-garden",
    "homepage": null,
    "size": 3883,
    "stargazers_count": 0,
    "watchers_count": 0,
    "language": "CSS",
    "has_issues": true,
    "has_projects": true,
    "has_downloads": true,
    "has_wiki": false,
    "has_pages": false,
    "has_discussions": false,
    "forks_count": 0,
    "mirror_url": null,
    "archived": false,
    "disabled": false,
    "open_issues_count": 0,
    "license": null,
    "allow_forking": true,
    "is_template": false,
    "web_commit_signoff_required": false,
    "topics": [

    ],
    "visibility": "public",
    "forks": 0,
    "open_issues": 0,
    "watchers": 0,
    "default_branch": "main",
    "stargazers": 0,
    "master_branch": "main"
  },
  "pusher": {
    "name": "olayway",
    "email": "[email protected]"
  },
  "sender": {
    "login": "olayway",
    "id": 52197250,
    "node_id": "MDQ6VXNlcjUyMTk3MjUw",
    "avatar_url": "https://avatars.githubusercontent.com/u/52197250?v=4",
    "gravatar_id": "",
    "url": "https://api.github.com/users/olayway",
    "html_url": "https://github.com/olayway",
    "followers_url": "https://api.github.com/users/olayway/followers",
    "following_url": "https://api.github.com/users/olayway/following{/other_user}",
    "gists_url": "https://api.github.com/users/olayway/gists{/gist_id}",
    "starred_url": "https://api.github.com/users/olayway/starred{/owner}{/repo}",
    "subscriptions_url": "https://api.github.com/users/olayway/subscriptions",
    "organizations_url": "https://api.github.com/users/olayway/orgs",
    "repos_url": "https://api.github.com/users/olayway/repos",
    "events_url": "https://api.github.com/users/olayway/events{/privacy}",
    "received_events_url": "https://api.github.com/users/olayway/received_events",
    "type": "User",
    "site_admin": false
  },
  "created": false,
  "deleted": false,
  "forced": false,
  "base_ref": null,
  "compare": "https://github.com/olayway/digital-garden/compare/b5c970ff98b2...29703105f28e",
  "commits": [
    {
      "id": "29703105f28e4e2032a9aa14ff1f5c138574ecea",
      "tree_id": "d203f80a690c54083c351556a31eaa90544e1317",
      "distinct": true,
      "message": "Update README.md",
      "timestamp": "2024-06-06T17:16:28+02:00",
      "url": "https://github.com/olayway/digital-garden/commit/29703105f28e4e2032a9aa14ff1f5c138574ecea",
      "author": {
        "name": "Ola Rubaj",
        "email": "[email protected]",
        "username": "olayway"
      },
      "committer": {
        "name": "GitHub",
        "email": "[email protected]",
        "username": "web-flow"
      },
      "added": [

      ],
      "removed": [

      ],
      "modified": [
        "README.md"
      ]
    }
  ],
  "head_commit": {
    "id": "29703105f28e4e2032a9aa14ff1f5c138574ecea",
    "tree_id": "d203f80a690c54083c351556a31eaa90544e1317",
    "distinct": true,
    "message": "Update README.md",
    "timestamp": "2024-06-06T17:16:28+02:00",
    "url": "https://github.com/olayway/digital-garden/commit/29703105f28e4e2032a9aa14ff1f5c138574ecea",
    "author": {
      "name": "Ola Rubaj",
      "email": "[email protected]",
      "username": "olayway"
    },
    "committer": {
      "name": "GitHub",
      "email": "[email protected]",
      "username": "web-flow"
    },
    "added": [

    ],
    "removed": [

    ],
    "modified": [
      "README.md"
    ]
  }
}

© 2024 All rights reserved
