Artificial intelligence has become one of the defining technological forces of our era. This collection brings together datasets tracking the progress of AI — from historical adoption curves and compute scaling to model capability benchmarks and economic impact indicators.

Datasets on DataHub

AI Models & Capabilities

Epoch AI — Notable ML Models: https://datahub.io/ai/epoch-data-on-ai-models. A curated dataset of notable machine learning models from 1950 to the present, tracking publication year, parameters, training compute (FLOPs), hardware, and organization.
Historical Adoption of Technology: https://datahub.io/ai/historical-adoption-of-technology. Long-run adoption curves for transformative technologies in the United States, including the internet, smartphones, and AI-related tools. Useful for contextualizing how quickly AI is being adopted relative to prior waves of technology.

Key Themes

Compute scaling: Training compute for frontier models (measured in total FLOPs) has roughly doubled every 6 months since 2010, far outpacing the roughly two-year doubling associated with Moore's Law. Epoch AI's dataset is a widely used source for tracking this trend.
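
To make the comparison concrete, a doubling time can be converted into a yearly growth factor, and a doubling time can be estimated from (year, compute) pairs with a log-linear fit. The following is a minimal Python sketch under stated assumptions, not Epoch AI's methodology: both function names are hypothetical, and the input lists stand in for whatever year and training-compute columns the dataset actually exposes.

```python
import math


def annual_growth_factor(doubling_months: float) -> float:
    """Multiplicative growth per year implied by a doubling time in months."""
    return 2 ** (12 / doubling_months)


def doubling_time_months(years, flops):
    """Estimate a doubling time (in months) from paired observations.

    Fits log2(compute) against year by ordinary least squares; the slope
    is the number of doublings per year. `years` and `flops` are assumed
    to be equal-length sequences of numbers (hypothetical inputs).
    """
    logs = [math.log2(f) for f in flops]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("need observations from at least two distinct years")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx  # doublings per year
    if slope <= 0:
        raise ValueError("compute is not growing; doubling time is undefined")
    return 12 / slope


if __name__ == "__main__":
    print(annual_growth_factor(6))   # 6-month doubling
    print(annual_growth_factor(24))  # 2-year doubling
```

Under these definitions, a 6-month doubling time corresponds to 4x growth per year, or about a million-fold over a decade (4^10 = 1,048,576), while a two-year doubling yields about 32x over the same period.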

Adoption curves: ChatGPT reached an estimated 100 million monthly users about two months after its November 2022 launch, at the time the fastest growth reported for a consumer application. Historical adoption data provides the benchmark for comparison.
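
One way to compare adoption speeds is to measure how long each technology took to reach a fixed threshold. The sketch below is illustrative only: `years_to_threshold` is a hypothetical helper, and it assumes the adoption data has already been reshaped into (year, value) pairs sorted by year, which may not match the dataset's actual layout.

```python
def years_to_threshold(series, threshold):
    """Years from the first observation until `threshold` is first reached.

    `series` is a list of (year, value) pairs sorted by year (an assumed
    shape, not the dataset's documented schema). The crossing point is
    linearly interpolated between the two surrounding observations.
    Returns None if the threshold is never reached.
    """
    start_year, prev_val = series[0]
    prev_year = start_year
    if prev_val >= threshold:
        return 0.0
    for year, val in series[1:]:
        if val >= threshold:
            # val >= threshold > prev_val, so the denominator is positive.
            frac = (threshold - prev_val) / (val - prev_val)
            return prev_year + frac * (year - prev_year) - start_year
        prev_year, prev_val = year, val
    return None
```

Applying the same threshold to every technology keeps the comparison like for like; the choice of threshold (for example, a share of households or a user count) is left to the analyst.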

Concentration: A small number of organizations (Google, OpenAI, Meta, DeepMind, and Anthropic) account for the majority of frontier model development. This is visible in the Epoch dataset's organization field.
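
A rough way to inspect this concentration is to tally how often each organization appears. The sketch below makes several assumptions that should be checked against the real file: the column name `Organization`, the use of commas to separate multiple organizations in one cell, and the helper names are all hypothetical. The sample rows are placeholders, not real data.

```python
import csv
import io
from collections import Counter


def organization_shares(rows, column="Organization"):
    """Share of organization credits per organization, largest first.

    `rows` is an iterable of dicts (e.g. from csv.DictReader). A cell
    listing several organizations credits each of them once, so shares
    are fractions of credits, not of models.
    """
    counts = Counter()
    for row in rows:
        for org in (row.get(column) or "").split(","):
            org = org.strip()
            if org:
                counts[org] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {org: n / total for org, n in counts.most_common()}


def top_k_share(shares, k):
    """Combined share held by the k largest organizations."""
    return sum(sorted(shares.values(), reverse=True)[:k])


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    sample = io.StringIO(
        'Model,Organization\nm1,Org A\nm2,"Org A, Org B"\nm3,Org C\n'
    )
    shares = organization_shares(csv.DictReader(sample))
    print(shares)
    print(top_k_share(shares, 2))
```

Restricting the rows to recent years or to models above a compute threshold before counting would give a view closer to "frontier" development than the full historical list.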

External Resources

Epoch AI Research — Quantitative research on AI timelines, compute trends, and model capabilities.
AI Index Report (Stanford HAI) — Annual report covering AI progress across research, economy, education, and policy.
Our World in Data — AI — Accessible charts on AI adoption, capabilities, and societal impact.
Papers With Code — State-of-the-art benchmarks across ML tasks, updated continuously.
Hugging Face Datasets — Large repository of ML training datasets, models, and benchmarks.