Epoch Data on AI Models

Files: 4
Size: 8.85 MB
Formats: csv
License: CC-BY-4.0

Comprehensive database of over 2800 AI/ML models tracking key factors driving machine learning progress, including parameters, training compute, training dataset size, publication date, organization, and more. Sourced from Epoch AI.

API Access

Access dataset files directly from scripts, code, or AI agents.

Dataset Files

Each file has a stable URL (r-link) that you can use directly in scripts, apps, or AI agents. These URLs are permanent and safe to hardcode.

/ai/epoch-data-on-ai-models/
https://datahub.io/ai/epoch-data-on-ai-models/_r/-/AGENTS.md
https://datahub.io/ai/epoch-data-on-ai-models/_r/-/README.md
https://datahub.io/ai/epoch-data-on-ai-models/_r/-/data/all_ai_models.csv
https://datahub.io/ai/epoch-data-on-ai-models/_r/-/data/frontier_ai_models.csv
https://datahub.io/ai/epoch-data-on-ai-models/_r/-/data/large_scale_ai_models.csv
https://datahub.io/ai/epoch-data-on-ai-models/_r/-/data/notable_ai_models.csv
https://datahub.io/ai/epoch-data-on-ai-models/_r/-/datapackage.json
Key Files

Start with these files — they give you everything you need to understand and access the dataset.

datapackage.json (metadata & schema)
https://datahub.io/ai/epoch-data-on-ai-models/_r/-/datapackage.json
README.md (documentation)
https://datahub.io/ai/epoch-data-on-ai-models/_r/-/README.md
Typical Usage
  1. Fetch datapackage.json to inspect schema and resources
  2. Download data resources listed in datapackage.json
  3. Read README.md for full context
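
The first two steps above can be sketched in Python using only the standard library. This is a minimal sketch, not DataHub's own client: the helper names `resource_urls` and `fetch_descriptor` are hypothetical, and it assumes each resource in datapackage.json carries a `path` that is either relative to the dataset root or already a full URL.

```python
import json
import urllib.request

# Base r-link prefix, copied from the URLs listed on this page.
BASE = "https://datahub.io/ai/epoch-data-on-ai-models/_r/-/"


def resource_urls(descriptor: dict, base: str = BASE) -> dict:
    """Map each resource name to a download URL.

    Assumption: 'path' is a string, either relative to the dataset root
    or an absolute http(s) URL. Resources without a string path are skipped.
    """
    urls = {}
    for res in descriptor.get("resources", []):
        path = res.get("path")
        if not isinstance(path, str):
            continue
        is_absolute = path.startswith(("http://", "https://"))
        urls[res.get("name", path)] = path if is_absolute else base + path
    return urls


def fetch_descriptor(base: str = BASE) -> dict:
    """Step 1: download and parse datapackage.json (needs network access)."""
    with urllib.request.urlopen(base + "datapackage.json", timeout=30) as resp:
        return json.load(resp)


# Usage (requires network access):
# for name, url in resource_urls(fetch_descriptor()).items():
#     print(name, url)
```

Keeping the URL construction separate from the network call makes it possible to check the mapping against a saved copy of datapackage.json without fetching anything.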

Data Files

All AI Models

About

All AI models in the Epoch database (~21,000 entries).
Last updated: 19 March 2026
Total rows: ...
Format: CSV
File size: 5.72 MB

Notable AI Models

About

Subset of notable AI models with richer metadata (~7,400 entries).
Last updated: 19 March 2026
Total rows: ...
Format: CSV
File size: 1.85 MB

Large-Scale AI Models

About

Large-scale AI models subset (~3,600 entries).
Last updated: 19 March 2026
Total rows: ...
Format: CSV
File size: 902 kB

Frontier AI Models

About

Frontier AI models subset — the most capable models at each point in time (~1,600 entries).
Last updated: 19 March 2026
Total rows: ...
Format: CSV
File size: 371 kB
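
Any of the four CSV files can be read with Python's standard `csv` module. The sketch below makes no assumption about column names, since they are not listed on this page; it simply returns whatever header the file carries. The helpers `parse_csv` and `download_text` are hypothetical names, and UTF-8 encoding is an assumption.

```python
import csv
import io
import urllib.request

# r-link for the smallest file, copied from the list of dataset files.
FRONTIER_CSV = (
    "https://datahub.io/ai/epoch-data-on-ai-models/_r/-/data/frontier_ai_models.csv"
)


def parse_csv(text: str) -> tuple[list[str], list[dict]]:
    """Return (column names, rows as dicts) for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    # fieldnames is None when the text is empty, so fall back to [].
    return list(reader.fieldnames or []), rows


def download_text(url: str) -> str:
    """Fetch a file as text (needs network access; UTF-8 is assumed)."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read().decode("utf-8")


# Usage (requires network access):
# columns, rows = parse_csv(download_text(FRONTIER_CSV))
# print(len(rows), "rows;", "columns:", columns)
```

Inspecting the header first, or the schema in datapackage.json, is safer than hardcoding column names.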

About this dataset

Dataset: epoch-data-on-ai-models

This is a Frictionless Data Package.

Concepts

Data hierarchy (from broad to specific):

  • Catalog = a collection of datasets (maps to a DataHub publication, one GitHub repo)
  • Dataset = a coherent data concept with a defined schema and coverage — this directory
  • Data file = a concrete file artifact (csv, json, parquet…) listed as a resource in datapackage.json

Dataset lifecycle — a dataset doesn't need to be complete on day one:

  • capture — just a URL or note, intent to explore
  • stub — minimal entry: title, description, source link, no files yet
  • archived — raw files downloaded locally
  • structured — cleaned, normalised, schema documented
  • enriched — analysis, visualisations, derived data added
  • monitored — living source, versioned and updated over time

Catalog-as-repo pattern: if the source is a portal or collection containing many datasets (e.g. a data.gov agency, an institutional archive), give it its own repo and DataHub publication — not a subfolder here.


Structure

epoch-data-on-ai-models/
  datapackage.json   # dataset metadata and resource list
  data/              # data files (csv, json, parquet, etc.)
  .datahubignore     # files to exclude when pushing (gitignore syntax)

datapackage.json

Keep resources in sync with what's in data/:

{
  "name": "epoch-data-on-ai-models",
  "title": "Human readable title",
  "description": "What this dataset is about",
  "resources": [
    {
      "path": "data/my-file.csv",
      "name": "my-file",
      "mediatype": "text/csv"
    }
  ]
}
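
One way to check that resources and data/ stay in sync is to compare the two sets of paths. This is a sketch under stated assumptions, not part of the DataHub tooling: `compare_resources` is a hypothetical helper, and it assumes resource paths are written relative to the dataset directory with forward slashes, as in the example above.

```python
import json
from pathlib import Path


def compare_resources(package_dir) -> tuple[list[str], list[str]]:
    """Return (files in data/ not listed, listed paths not found in data/)."""
    root = Path(package_dir)
    descriptor = json.loads((root / "datapackage.json").read_text(encoding="utf-8"))

    # Paths declared in the descriptor, e.g. "data/my-file.csv".
    listed = {
        res["path"]
        for res in descriptor.get("resources", [])
        if isinstance(res.get("path"), str)
    }
    # Files actually present under data/, as POSIX-style relative paths.
    on_disk = {
        p.relative_to(root).as_posix()
        for p in (root / "data").rglob("*")
        if p.is_file()
    }
    return sorted(on_disk - listed), sorted(listed - on_disk)


# Usage:
# unlisted, missing = compare_resources(".")
# print("not listed in resources:", unlisted)
# print("listed but missing:", missing)
```

Files that are intentionally excluded through .datahubignore will also show up as unlisted here, because the sketch does not read that file.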

Workflow

# Add data files to data/
# Edit datapackage.json — update resources to list them
data pack .   # validate
dh push .     # publish to DataHub

Key rules

  • Every file in data/ that you want published must be listed in resources
  • name in datapackage.json must be URL-safe (lowercase, hyphens)
  • Use .datahubignore to exclude scratch files, large intermediaries, etc.
  • It is fine to push a stub — set lifecycle stage in datapackage.json as "status": "stub" if incomplete
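
The naming and status rules above could be checked before pushing with a small script. This is a hedged sketch: `check_descriptor` is a hypothetical helper, the pattern also accepts digits (the rule only mentions lowercase and hyphens), and it assumes that valid "status" values are the lifecycle stage names from the Concepts section. Only "stub" is confirmed by the text.

```python
import re

# Assumption: lowercase letters and digits, separated by single hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

# Assumption: "status" uses the lifecycle stage names; only "stub" is confirmed.
LIFECYCLE_STAGES = (
    "capture", "stub", "archived", "structured", "enriched", "monitored",
)


def check_descriptor(descriptor: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    name = descriptor.get("name")
    if not isinstance(name, str) or not NAME_PATTERN.fullmatch(name):
        problems.append(f"name {name!r} is not URL-safe (lowercase, hyphens)")

    status = descriptor.get("status")
    if status is not None and status not in LIFECYCLE_STAGES:
        problems.append(f"status {status!r} is not a known lifecycle stage")

    return problems


# Usage:
# import json
# with open("datapackage.json", encoding="utf-8") as fh:
#     for problem in check_descriptor(json.load(fh)):
#         print("warning:", problem)
```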