API Access

Access dataset files directly from scripts, code, or AI agents.
Each file has a stable URL (r-link) that you can use directly in scripts, apps, or AI agents. These URLs are permanent and safe to hardcode.
Start with these files — they give you everything you need to understand and access the dataset.
1. Fetch datapackage.json to inspect schema and resources
2. Download the data resources listed in datapackage.json
3. Read README.md for full context (steps 1 and 2 are sketched in Python below)
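A minimal Python sketch of steps 1 and 2, using only the standard library. The base URL below is a placeholder, not the dataset's real r-link; substitute the actual URL from DataHub:

```python
import json
import urllib.request
from pathlib import Path

# Placeholder r-link base; substitute the dataset's actual URL from DataHub.
BASE_URL = "https://example.datahub.io/epoch-data-on-ai-models"

# 1. Fetch datapackage.json to inspect schema and resources.
with urllib.request.urlopen(f"{BASE_URL}/datapackage.json") as resp:
    package = json.load(resp)

# 2. Download each data resource listed in datapackage.json.
for resource in package["resources"]:
    path = resource["path"]  # e.g. "data/all-ai-models.csv"
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(f"{BASE_URL}/{path}", path)
    print(f"downloaded {path}")
```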
Data Views
All AI Models
Schema
| name | type | format | description |
|---|---|---|---|
| Model | string | | Name of the AI model |
| Domain | string | | Domain(s) the model operates in (e.g. Language, Vision) |
| Task | string | | Task(s) the model is designed for |
| Organization | string | | Organization(s) that developed the model |
| Authors | string | | Authors of the model or associated paper |
| Publication date | date | %Y-%m-%d | Date the model was published or released |
| Reference | string | | Citation reference for the model |
| Link | string | | URL to model paper or announcement |
| Citations | number | | Number of citations |
| Notability criteria | string | | Criteria that make this model notable |
| Notability criteria notes | string | | Additional notes on notability criteria |
| Parameters | number | | Number of model parameters |
| Parameters notes | string | | Notes on parameter count |
| Training compute (FLOP) | number | | Total training compute in floating point operations |
| Training compute notes | string | | Notes on training compute estimate |
| Training dataset | string | | Name or description of the training dataset |
| Training dataset notes | string | | Notes on the training dataset |
| Training dataset size (datapoints) | number | | Number of datapoints in the training dataset |
| Dataset size notes | string | | Notes on dataset size |
| Training time (hours) | number | | Total training time in hours |
| Training time notes | string | | Notes on training time estimate |
| Training hardware | string | | Hardware used for training (e.g. A100, H100) |
| Approach | string | | Modeling approach or architecture type |
| Confidence | string | | Confidence level of the data entries |
| Abstract | string | | Abstract of the associated paper |
| Epochs | number | | Number of training epochs |
| Benchmark data | string | | Benchmark evaluation data |
| Model accessibility | string | | Accessibility of the model weights (e.g. open, closed) |
| Country (of organization) | string | | Country where the developing organization is based |
| Base model | string | | Base model this model was fine-tuned from, if any |
| Finetune compute (FLOP) | number | | Compute used for fine-tuning in FLOP |
| Finetune compute notes | string | | Notes on fine-tune compute estimate |
| Hardware quantity | number | | Number of hardware units used for training |
| Hardware utilization (MFU) | number | | Model FLOP utilization (MFU) of training hardware |
| Last modified | string | | Timestamp when the record was last modified |
| Training cloud compute vendor | string | | Cloud vendor used for training compute |
| Training data center | string | | Data center used for training |
| Archived links | string | | Archived URLs for the model or paper |
| Batch size | number | | Training batch size |
| Batch size notes | string | | Notes on batch size |
| Organization categorization | string | | Category of the developing organization (e.g. Industry, Academia) |
| Foundation model | boolean | | Whether this is a foundation model |
| Training compute lower bound | number | | Lower bound estimate of training compute in FLOP |
| Training compute upper bound | number | | Upper bound estimate of training compute in FLOP |
| Training chip-hours | number | | Total chip-hours used for training |
| Training code accessibility | string | | Accessibility of training code |
| Accessibility notes | string | | Notes on accessibility of model or code |
| Organization categorization (from Organization) | string | | Organization category derived from organization field |
| Possibly over 1e23 FLOP | boolean | | Whether training compute may exceed 1e23 FLOP |
| Training compute cost (2023 USD) | number | | Estimated training compute cost in 2023 US dollars |
| Utilization notes | string | | Notes on hardware utilization |
| Numerical format | string | | Numerical precision format used in training (e.g. FP16, BF16) |
| Frontier model | boolean | | Whether this model was a frontier model at the time of release |
| Training power draw (W) | number | | Power consumption during training in watts |
| Training compute estimation method | string | | Method used to estimate training compute |
| Hugging Face developer id | string | | Hugging Face developer or organization identifier |
| Post-training compute (FLOP) | number | | Compute used for post-training (RLHF, fine-tuning, etc.) in FLOP |
| Post-training compute notes | string | | Notes on post-training compute estimate |
| Hardware utilization (HFU) | number | | Hardware FLOP utilization (HFU) during training |
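As a quick illustration, here is a hedged pandas sketch that loads this resource and applies the %Y-%m-%d date format declared above. The path data/all-ai-models.csv is an assumption based on the download sketch earlier; check datapackage.json for the actual resource paths:

```python
import pandas as pd

# Assumes the resource was downloaded to data/ as in the earlier sketch.
models = pd.read_csv("data/all-ai-models.csv")

# "Publication date" is declared as a date with format %Y-%m-%d in the schema.
models["Publication date"] = pd.to_datetime(
    models["Publication date"], format="%Y-%m-%d", errors="coerce"
)

print(models[["Model", "Organization", "Publication date",
              "Training compute (FLOP)"]].head())
```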
Notable AI Models
Schema
| name | type | format | description |
|---|---|---|---|
| Model | string | | Name of the AI model |
| Organization | string | | Organization(s) that developed the model |
| Publication date | date | %Y-%m-%d | Date the model was published or released |
| Domain | string | | Domain(s) the model operates in (e.g. Language, Vision) |
| Task | string | | Task(s) the model is designed for |
| Parameters | number | | Number of model parameters |
| Parameters notes | string | | Notes on parameter count |
| Training compute (FLOP) | number | | Total training compute in floating point operations |
| Training compute notes | string | | Notes on training compute estimate |
| Training dataset | string | | Name or description of the training dataset |
| Training dataset size (datapoints) | number | | Number of datapoints in the training dataset |
| Dataset size notes | string | | Notes on dataset size |
| Confidence | string | | Confidence level of the data entries |
| Link | string | | URL to model paper or announcement |
| Reference | string | | Citation reference for the model |
| Citations | number | | Number of citations |
| Authors | string | | Authors of the model or associated paper |
| Abstract | string | | Abstract of the associated paper |
| Organization categorization | string | | Category of the developing organization (e.g. Industry, Academia) |
| Country (of organization) | string | | Country where the developing organization is based |
| Notability criteria | string | | Criteria that make this model notable |
| Notability criteria notes | string | | Additional notes on notability criteria |
| Epochs | number | | Number of training epochs |
| Training time (hours) | number | | Total training time in hours |
| Training time notes | string | | Notes on training time estimate |
| Training hardware | string | | Hardware used for training (e.g. A100, H100) |
| Hardware quantity | number | | Number of hardware units used for training |
| Hardware utilization (MFU) | number | | Model FLOP utilization (MFU) of training hardware |
| Training compute cost (2023 USD) | number | | Estimated training compute cost in 2023 US dollars |
| Compute cost notes | string | | Notes on compute cost estimate |
| Training power draw (W) | number | | Power consumption during training in watts |
| Base model | string | | Base model this model was fine-tuned from, if any |
| Finetune compute (FLOP) | number | | Compute used for fine-tuning in FLOP |
| Finetune compute notes | string | | Notes on fine-tune compute estimate |
| Batch size | number | | Training batch size |
| Batch size notes | string | | Notes on batch size |
| Model accessibility | string | | Accessibility of the model weights (e.g. open, closed) |
| Training code accessibility | string | | Accessibility of training code |
| Inference code accessibility | string | | Accessibility of inference code |
| Accessibility notes | string | | Notes on accessibility of model or code |
| Numerical format | string | | Numerical precision format used in training (e.g. FP16, BF16) |
| Frontier model | boolean | | Whether this model was a frontier model at the time of release |
| Hardware acquisition cost | number | | Cost of acquiring the training hardware in USD |
| Hardware utilization (HFU) | number | | Hardware FLOP utilization (HFU) during training |
| Training compute cost (cloud) | number | | Estimated training compute cost using cloud pricing in USD |
| Training compute cost (upfront) | number | | Estimated training compute cost using upfront hardware pricing in USD |
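The numeric columns make simple filtering straightforward. A hedged sketch (the CSV path is again an assumed download location) that pulls out notable models whose estimated training compute exceeds 1e25 FLOP:

```python
import pandas as pd

notable = pd.read_csv("data/notable-ai-models.csv")  # assumed download path

# Models whose estimated training compute exceeds 1e25 FLOP, newest first.
# Rows with a missing estimate (NaN) are dropped by the comparison.
big = notable[notable["Training compute (FLOP)"] > 1e25]
big = big.sort_values("Publication date", ascending=False)

print(big[["Model", "Organization", "Publication date",
           "Training compute (FLOP)"]].to_string(index=False))
```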
Large-Scale AI Models
Schema
| name | type | format | description |
|---|---|---|---|
| Model | string | | Name of the AI model |
| Domain | string | | Domain(s) the model operates in (e.g. Language, Vision) |
| Task | string | | Task(s) the model is designed for |
| Authors | string | | Authors of the model or associated paper |
| Model accessibility | string | | Accessibility of the model weights (e.g. open, closed) |
| Link | string | | URL to model paper or announcement |
| Citations | number | | Number of citations |
| Reference | string | | Citation reference for the model |
| Publication date | date | %Y-%m-%d | Date the model was published or released |
| Organization | string | | Organization(s) that developed the model |
| Parameters | number | | Number of model parameters |
| Parameters notes | string | | Notes on parameter count |
| Training compute (FLOP) | number | | Total training compute in floating point operations |
| Training compute notes | string | | Notes on training compute estimate |
| Training dataset | string | | Name or description of the training dataset |
| Training dataset notes | string | | Notes on the training dataset |
| Training dataset size (datapoints) | number | | Number of datapoints in the training dataset |
| Dataset size notes | string | | Notes on dataset size |
| Training time (hours) | number | | Total training time in hours |
| Training time notes | string | | Notes on training time estimate |
| Training hardware | string | | Hardware used for training (e.g. A100, H100) |
| Confidence | string | | Confidence level of the data entries |
| Abstract | string | | Abstract of the associated paper |
| Country (of organization) | string | | Country where the developing organization is based |
| Base model | string | | Base model this model was fine-tuned from, if any |
| Finetune compute (FLOP) | number | | Compute used for fine-tuning in FLOP |
| Finetune compute notes | string | | Notes on fine-tune compute estimate |
| Hardware quantity | number | | Number of hardware units used for training |
| Hardware utilization (MFU) | number | | Model FLOP utilization (MFU) of training hardware |
| Training code accessibility | string | | Accessibility of training code |
| Accessibility notes | string | | Notes on accessibility of model or code |
| Organization categorization (from Organization) | string | | Organization category derived from organization field |
| Hardware utilization (HFU) | number | | Hardware FLOP utilization (HFU) during training |
| Training compute cost (cloud) | number | | Estimated training compute cost using cloud pricing in USD |
| Training compute cost (upfront) | number | | Estimated training compute cost using upfront hardware pricing in USD |
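Since training compute spans many orders of magnitude, a log-scale scatter of compute against publication date is a natural first look at this subset. A hedged matplotlib sketch, again assuming the file was downloaded to data/:

```python
import pandas as pd
import matplotlib.pyplot as plt

large = pd.read_csv("data/large-scale-ai-models.csv")  # assumed download path
large["Publication date"] = pd.to_datetime(large["Publication date"], errors="coerce")

# Training compute spans many orders of magnitude, so plot on a log scale.
plt.scatter(large["Publication date"], large["Training compute (FLOP)"], s=10)
plt.yscale("log")
plt.xlabel("Publication date")
plt.ylabel("Training compute (FLOP)")
plt.title("Large-scale AI models: training compute over time")
plt.tight_layout()
plt.show()
```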
Frontier AI Models
Schema
| name | type | format | description |
|---|---|---|---|
| Model | string | | Name of the AI model |
| Domain | string | | Domain(s) the model operates in (e.g. Language, Vision) |
| Task | string | | Task(s) the model is designed for |
| Authors | string | | Authors of the model or associated paper |
| Notability criteria | string | | Criteria that make this model notable |
| Notability criteria notes | string | | Additional notes on notability criteria |
| Model accessibility | string | | Accessibility of the model weights (e.g. open, closed) |
| Link | string | | URL to model paper or announcement |
| Citations | number | | Number of citations |
| Reference | string | | Citation reference for the model |
| Publication date | date | %Y-%m-%d | Date the model was published or released |
| Organization | string | | Organization(s) that developed the model |
| Parameters | number | | Number of model parameters |
| Parameters notes | string | | Notes on parameter count |
| Training compute (FLOP) | number | | Total training compute in floating point operations |
| Training compute notes | string | | Notes on training compute estimate |
| Training dataset | string | | Name or description of the training dataset |
| Training dataset notes | string | | Notes on the training dataset |
| Training dataset size (datapoints) | number | | Number of datapoints in the training dataset |
| Dataset size notes | string | | Notes on dataset size |
| Epochs | number | | Number of training epochs |
| Inference compute (FLOP) | number | | Compute per inference pass in FLOP |
| Inference compute notes | string | | Notes on inference compute estimate |
| Training time (hours) | number | | Total training time in hours |
| Training time notes | string | | Notes on training time estimate |
| Training hardware | string | | Hardware used for training (e.g. A100, H100) |
| Approach | string | | Modeling approach or architecture type |
| Compute cost notes | string | | Notes on compute cost estimate |
| Compute sponsor categorization | string | | Category of the compute sponsor |
| Confidence | string | | Confidence level of the data entries |
| Abstract | string | | Abstract of the associated paper |
| Last modified | string | | Timestamp when the record was last modified |
| Created By | string | | Person who created this record |
| Benchmark data | string | | Benchmark evaluation data |
| Exclude | boolean | | Whether this model is excluded from certain analyses |
| Country (of organization) | string | | Country where the developing organization is based |
| Base model | string | | Base model this model was fine-tuned from, if any |
| Finetune compute (FLOP) | number | | Compute used for fine-tuning in FLOP |
| Finetune compute notes | string | | Notes on fine-tune compute estimate |
| Hardware quantity | number | | Number of hardware units used for training |
| Hardware utilization (MFU) | number | | Model FLOP utilization (MFU) of training hardware |
| Training cost trends | string | | Trend information for training costs |
| Training cloud compute vendor | string | | Cloud vendor used for training compute |
| Training data center | string | | Data center used for training |
| Archived links | string | | Archived URLs for the model or paper |
| Batch size | number | | Training batch size |
| Batch size notes | string | | Notes on batch size |
| Organization categorization | string | | Category of the developing organization (e.g. Industry, Academia) |
| Foundation model | boolean | | Whether this is a foundation model |
| Training compute lower bound | number | | Lower bound estimate of training compute in FLOP |
| Training compute upper bound | number | | Upper bound estimate of training compute in FLOP |
| Training chip-hours | number | | Total chip-hours used for training |
| Training code accessibility | string | | Accessibility of training code |
| Accessibility notes | string | | Notes on accessibility of model or code |
| Organization categorization (from Organization) | string | | Organization category derived from organization field |
| Possibly over 1e23 FLOP | boolean | | Whether training compute may exceed 1e23 FLOP |
| Training compute cost (2023 USD) | number | | Estimated training compute cost in 2023 US dollars |
| Training dataset size | number | | Size of the training dataset (alternative field) |
| Sparsity | number | | Model sparsity ratio |
| Utilization notes | string | | Notes on hardware utilization |
| Estimated over 1e25 FLOP | boolean | | Whether training compute is estimated to exceed 1e25 FLOP |
| Power per GPU | number | | Power draw per GPU unit in watts |
| Cluster total TDP | number | | Total thermal design power of the training cluster in watts |
| Base model compute | number | | Training compute of the base model in FLOP |
| Total compute - (base + finetune) | number | | Total compute including base model and fine-tuning in FLOP |
| API prices | string | | API pricing information for the model |
| Created | string | | Timestamp when the record was created |
| Inference code accessibility | string | | Accessibility of inference code |
| Numerical format | string | | Numerical precision format used in training (e.g. FP16, BF16) |
| Model versions | string | | Available versions of the model |
| Frontier model | boolean | | Whether this model was a frontier model at the time of release |
| Training power draw (W) | number | | Power consumption during training in watts |
| Benchmark evals | string | | Benchmark evaluation results |
| FLOP/$ | number | | Training compute efficiency in FLOP per dollar |
| Hardware release date | date | any | Release date of the training hardware |
| Hardware age | number | | Age of the training hardware in years at time of training |
| Hardware FP32 | number | | Hardware FP32 FLOP/s throughput |
| Hardware TF32 | number | | Hardware TF32 FLOP/s throughput |
| Hardware count | number | | Number of hardware units in the training cluster |
| Hardware TF16 | number | | Hardware TF16 FLOP/s throughput |
| Hardware FP16 | number | | Hardware FP16 FLOP/s throughput |
| Assumed precision | string | | Assumed numerical precision for compute estimates |
| Assumed hardware FLOP/s | number | | Assumed hardware throughput in FLOP/s used for compute estimates |
| Hardware type | string | | Type of hardware used (e.g. GPU, TPU) |
| Training compute estimation method | string | | Method used to estimate training compute |
| Biological model safeguards | string | | Safeguards related to biological model risks |
| BenchmarkHub-v1 | string | | BenchmarkHub v1 evaluation results |
| Hugging Face developer id | string | | Hugging Face developer or organization identifier |
| Post-training compute (FLOP) | number | | Compute used for post-training (RLHF, fine-tuning, etc.) in FLOP |
| Post-training compute notes | string | | Notes on post-training compute estimate |
| Hardware maker | string | | Manufacturer of the training hardware |
| benchmarks/models | string | | Benchmark to model mapping data |
| Maybe over 1e25 FLOP | boolean | | Whether training compute may exceed 1e25 FLOP |
| Updated dataset size | number | | Updated or revised training dataset size |
| WT103 ppl | number | | WikiText-103 perplexity score |
| WT2 ppl | number | | WikiText-2 perplexity score |
| PTB ppl | number | | Penn Treebank perplexity score |
| Distillation or synthetic data | string | | Whether model was trained on distillation or synthetic data |
| Distillation or synthetic data compute | number | | Compute used to generate distillation or synthetic training data in FLOP |
| Distillation or synthetic data compute notes | string | | Notes on distillation or synthetic data compute |
| Knowledge cutoff | string | | Training data knowledge cutoff date |
| Context window | number | | Maximum context window size in tokens |
| Hardware utilization (HFU) | number | | Hardware FLOP utilization (HFU) during training |
| Training compute cost (cloud) | number | | Estimated training compute cost using cloud pricing in USD |
| Training compute cost (upfront) | number | | Estimated training compute cost using upfront hardware pricing in USD |
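The Base model column links fine-tuned models to the model they were derived from, which allows a self-join to trace lineage. A hedged sketch (the CSV path is an assumed download location, and the suffixed column names are produced by pandas, not by the dataset):

```python
import pandas as pd

frontier = pd.read_csv("data/frontier-ai-models.csv")  # assumed download path

# Self-join on "Base model" to attach each fine-tune to its base model's row.
# Pandas renames the overlapping right-hand columns with the " (base)" suffix.
lineage = frontier.merge(
    frontier[["Model", "Training compute (FLOP)"]],
    left_on="Base model",
    right_on="Model",
    suffixes=("", " (base)"),
)

print(lineage[["Model", "Base model",
               "Training compute (FLOP)", "Training compute (FLOP) (base)"]].head())
```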
Data Files
| File | Description | Size | Last modified | Download |
|---|---|---|---|---|
| all-ai-models | All AI models in the Epoch database (~21,000 entries). | 5.72 MB | about 1 month ago | all-ai-models |
| notable-ai-models | Subset of notable AI models with richer metadata (~7,400 entries). | 1.85 MB | about 1 month ago | notable-ai-models |
| large-scale-ai-models | Large-scale AI models subset (~3,600 entries). | 902 kB | about 1 month ago | large-scale-ai-models |
| frontier-ai-models | Frontier AI models subset — the most capable models at each point in time (~1,600 entries). | 371 kB | about 1 month ago | frontier-ai-models |
| Files | Size | Format | Created | Updated | License | Source |
|---|---|---|---|---|---|---|
| 4 | 8.85 MB | | about 2 months ago | | Creative Commons Attribution 4.0 | Epoch AI — Notable AI Models |
Dataset: epoch-data-on-ai-models
This is a Frictionless Data Package.
Concepts
Data hierarchy (from broad to specific):
- Catalog = a collection of datasets (maps to a DataHub publication, one GitHub repo)
- Dataset = a coherent data concept with a defined schema and coverage — this directory
- Data file = a concrete file artifact (csv, json, parquet…) listed as a resource in datapackage.json
Dataset lifecycle — a dataset doesn't need to be complete on day one:
- capture — just a URL or note, intent to explore
- stub — minimal entry: title, description, source link, no files yet
- archived — raw files downloaded locally
- structured — cleaned, normalised, schema documented
- enriched — analysis, visualisations, derived data added
- monitored — living source, versioned and updated over time
Catalog-as-repo pattern: if the source is a portal or collection containing many datasets (e.g. a data.gov agency, an institutional archive), give it its own repo and DataHub publication — not a subfolder here.
Structure
```
epoch-data-on-ai-models/
  datapackage.json    # dataset metadata and resource list
  data/               # data files (csv, json, parquet, etc.)
  .datahubignore      # files to exclude when pushing (gitignore syntax)
```
datapackage.json
Keep resources in sync with what's in data/:
```json
{
  "name": "epoch-data-on-ai-models",
  "title": "Human readable title",
  "description": "What this dataset is about",
  "resources": [
    {
      "path": "data/my-file.csv",
      "name": "my-file",
      "mediatype": "text/csv"
    }
  ]
}
```
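If you want to check the descriptor from Python rather than the CLI, the frictionless-py library can validate it. A minimal sketch, assuming a recent (v5) frictionless installation:

```python
# Requires: pip install frictionless
from frictionless import validate

report = validate("datapackage.json")
print("valid" if report.valid else report.flatten(["type", "message"]))
```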
Workflow
```bash
# Add data files to data/
# Edit datapackage.json — update resources to list them
data pack .   # validate
dh push .     # publish to DataHub
```
Key rules
- Every file in `data/` that you want published must be listed in `resources`
- `name` in datapackage.json must be URL-safe (lowercase, hyphens)
- Use `.datahubignore` to exclude scratch files, large intermediaries, etc.
- It is fine to push a stub — set lifecycle stage in datapackage.json as `"status": "stub"` if incomplete
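Since `.datahubignore` uses gitignore syntax, a hypothetical example might look like the following; every pattern here is purely illustrative:

```
# .datahubignore: hypothetical example, patterns are illustrative
scratch/
notebooks/
*.tmp
data/*-raw.csv
```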