luccasmmg/freshness-report

Dataset Freshness Check Runbook and Results

This workspace was used to run agentic freshness checks across dataset repositories under datasets/.

<FlatUiTable
data={{
url: 'https://raw.githubusercontent.com/luccasmmg/freshness-report/master/freshness-report.csv'
}}
/>

What Was Done

  • Ran an agentic workflow against each repository under datasets/.
  • For each repo, the checker:
    • reads repo metadata (README, datapackage, scripts),
    • extracts candidate upstream source URLs,
    • probes/fetches upstream endpoints,
    • computes the latest local date from local data files,
    • compares local vs. inferred upstream recency,
    • appends a row to freshness-report.csv.
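
The non-network steps above (URL extraction, local date detection, recency comparison) could be sketched roughly as follows. This is a minimal illustration, not the checker's actual code: the function names, the regex heuristics, and the assumption that local dates appear in YYYY-MM-DD form are all hypothetical.

```javascript
// Hypothetical helpers: names and heuristics are illustrative assumptions,
// not taken from the freshness-report repository.

// Pull http(s) URLs out of README or script text as candidate upstream sources.
function extractCandidateUrls(text) {
  const matches = text.match(/https?:\/\/[^\s"'<>)\]]+/g) || [];
  // Trim trailing punctuation and de-duplicate while preserving order.
  const cleaned = matches.map((u) => u.replace(/[.,;:]+$/, ""));
  return [...new Set(cleaned)];
}

// Find the most recent ISO-style date (YYYY-MM-DD) in CSV-like text.
// Assumes dates are stored in that format; other formats would need parsing.
function latestLocalDate(csvText) {
  const dates = csvText.match(/\b\d{4}-\d{2}-\d{2}\b/g) || [];
  if (dates.length === 0) return null;
  // ISO dates sort lexicographically in chronological order.
  dates.sort();
  return dates[dates.length - 1];
}

// Compare local vs. upstream recency. When either date is unknown the
// result is left undecided (null) so the caller can apply its own policy.
function compareRecency(localDate, upstreamDate) {
  if (!localDate || !upstreamDate) return { isStale: null, lagDays: null };
  const lagMs = Date.parse(upstreamDate) - Date.parse(localDate);
  const lagDays = Math.round(lagMs / 86400000);
  return { isStale: lagDays > 0, lagDays };
}
```

Keeping these steps as pure functions separates them from the probing step, so they can be exercised against saved README text and data files without network access.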

Output Files

  • Consolidated report: freshness-report.csv

CSV Schema

freshness-report.csv follows this structure:

repo_name,readme_location,datapackage_location,scripts_location,description,latest_local_date,latest_upstream_date,is_stale,staleness_reason,status
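
As an illustration of how a row in this schema might be serialized, the sketch below writes fields in the column order shown above and quotes them in the usual CSV way (fields containing commas, quotes, or line breaks are wrapped in double quotes, with embedded quotes doubled). The helper names and all example values, including the `status` value, are made up for the example.

```javascript
// Column order copied from the schema line above.
const COLUMNS = [
  "repo_name", "readme_location", "datapackage_location", "scripts_location",
  "description", "latest_local_date", "latest_upstream_date", "is_stale",
  "staleness_reason", "status",
];

// Quote a single field only when it contains a comma, quote, or line break.
function csvField(value) {
  const s = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Serialize one record; missing columns become empty fields.
function toCsvRow(record) {
  return COLUMNS.map((c) => csvField(record[c])).join(",");
}

// Hypothetical record: every value here is invented for illustration.
const exampleRow = toCsvRow({
  repo_name: "example-dataset",
  readme_location: "README.md",
  datapackage_location: "datapackage.json",
  scripts_location: "scripts/",
  description: 'GDP, "real" terms',
  latest_local_date: "2024-01-31",
  latest_upstream_date: "",
  is_stale: true,
  staleness_reason: "upstream returned 404",
  status: "checked",
});
console.log(exampleRow);
```

An empty `latest_upstream_date` serializes as an empty field between two commas, which keeps the column count stable for every row.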

Notes and Caveats

  • Checks are source-driven: freshness is judged against upstream sources, not the date of the last git update.
  • Some datasets are flagged stale because upstream endpoints are inaccessible (401/403/404, redirects, or network timeouts).
  • Some rows have empty latest_upstream_date when upstream recency could not be inferred from reachable payloads.
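
One way to turn a probe outcome into a `staleness_reason` is sketched below. It is an assumption-laden illustration: the `probe` object shape (`status`, `timedOut`) and the reason strings are hypothetical, and treating an unfollowed redirect as unreachable is a policy choice for this sketch, not something the report confirms.

```javascript
// Classify a probe result into reachable/unreachable plus a short reason.
// `probe` is a hypothetical shape: { status?: number, timedOut?: boolean }.
function classifyProbe(probe) {
  if (probe.timedOut) {
    return { reachable: false, reason: "network timeout" };
  }
  const s = probe.status;
  if (s === 401 || s === 403) {
    return { reachable: false, reason: `access denied (HTTP ${s})` };
  }
  if (s === 404) {
    return { reachable: false, reason: "not found (HTTP 404)" };
  }
  if (s >= 300 && s < 400) {
    return { reachable: false, reason: `unresolved redirect (HTTP ${s})` };
  }
  if (s >= 200 && s < 300) {
    return { reachable: true, reason: "" };
  }
  // Anything else (5xx, missing status) is reported without guessing further.
  return { reachable: false, reason: `unexpected response (HTTP ${s})` };
}
```

Recording the reason separately from the stale flag makes it possible to tell a dataset that is behind its upstream apart from one whose upstream simply could not be read.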

Created March 3, 2026
Updated March 3, 2026