Published March 3, 2026

Dataset Freshness Check Runbook and Results

This workspace was used to run agentic freshness checks across dataset repositories under datasets/.

What Was Done

An agentic workflow was run once for each repository.
For each repo, the checker:
- reads repo metadata (README, datapackage, scripts),
- extracts candidate upstream source URLs,
- probes/fetches upstream endpoints,
- computes latest local date from local data files,
- compares local vs inferred upstream recency,
- appends a row to freshness-report.csv.
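
The last two steps (comparing dates and appending a row) can be sketched roughly as follows. This is a minimal illustration, not the checker's actual code: the names `assess` and `append_row` are hypothetical, and the decision rule (stale only when the upstream date is strictly newer than the local date) is an assumption. In particular, it does not model the case where a repo is flagged stale because its upstream endpoint is inaccessible.

```python
import csv
from datetime import date
from pathlib import Path
from typing import Optional, Tuple

# Column order copied from the CSV schema documented in this runbook.
FIELDS = [
    "repo_name", "readme_location", "datapackage_location", "scripts_location",
    "description", "latest_local_date", "latest_upstream_date",
    "is_stale", "staleness_reason", "status",
]


def assess(local: Optional[date], upstream: Optional[date]) -> Tuple[bool, str]:
    """Assumed rule: stale only when upstream is strictly newer than local."""
    if local is None:
        return False, "no local date found"
    if upstream is None:
        return False, "upstream recency could not be inferred"
    if upstream > local:
        return True, f"upstream is {(upstream - local).days} days newer"
    return False, ""


def append_row(report: Path, row: dict) -> None:
    """Append one row, writing the header first if the file is new or empty."""
    is_new = not report.exists() or report.stat().st_size == 0
    with report.open("a", newline="", encoding="utf-8") as fh:
        # restval="" leaves columns blank when the row dict omits them.
        writer = csv.DictWriter(fh, fieldnames=FIELDS, restval="")
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

Opening the report in append mode means a partial run still leaves earlier rows in place, which suits a long per-repo loop.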

Output Files

Consolidated report: freshness-report.csv

CSV Schema

freshness-report.csv follows this structure:

repo_name,readme_location,datapackage_location,scripts_location,description,latest_local_date,latest_upstream_date,is_stale,staleness_reason,status
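
Before consuming the report, it can help to confirm that the header matches this schema. The sketch below uses only the standard library. The function name `read_report` is hypothetical, and the strict header comparison is a design assumption rather than something the workflow is known to enforce.

```python
import csv
import io
from typing import Dict, List

# Expected header, copied from the schema line above.
EXPECTED = [
    "repo_name", "readme_location", "datapackage_location", "scripts_location",
    "description", "latest_local_date", "latest_upstream_date",
    "is_stale", "staleness_reason", "status",
]


def read_report(text: str) -> List[Dict[str, str]]:
    """Parse report text into row dicts, rejecting an unexpected header."""
    reader = csv.DictReader(io.StringIO(text))
    # fieldnames is None for empty input, so that case is rejected too.
    if reader.fieldnames != EXPECTED:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return list(reader)
```

For example, `read_report(Path("freshness-report.csv").read_text(encoding="utf-8"))` would return one dict per checked repo, keyed by column name.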

Notes and Caveats

Checks are source-driven: staleness is judged against upstream sources, not against the repo's last git update date.
Some datasets are flagged stale because upstream endpoints are inaccessible (401/403/404, redirects, or network timeouts).
Some rows have empty latest_upstream_date when upstream recency could not be inferred from reachable payloads.
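
Given these caveats, it may be worth separating stale rows that carry an upstream date from stale rows where that date is empty, since the latter may reflect an unreachable endpoint rather than newer upstream data. The sketch below is one way to do that. The function name `triage` is hypothetical, and the encoding of `is_stale` (treated here as true/1/yes, case-insensitive) is an assumption, since the report's actual value format is not documented here.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed spellings of a "stale" flag; adjust to the report's real encoding.
TRUTHY = {"true", "1", "yes"}


def triage(rows: Iterable[Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Split stale repos into (has upstream date, upstream date empty)."""
    confirmed: List[str] = []
    unverified: List[str] = []
    for row in rows:
        if (row.get("is_stale") or "").strip().lower() not in TRUTHY:
            continue  # not flagged stale, nothing to triage
        if (row.get("latest_upstream_date") or "").strip():
            confirmed.append(row["repo_name"])
        else:
            unverified.append(row["repo_name"])
    return confirmed, unverified
```

Rows in the second list are candidates for a manual look at `staleness_reason` before any refresh work is scheduled.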