Artificial intelligence has become one of the defining technological forces of our era. This collection brings together datasets tracking the progress of AI — from historical adoption curves and compute scaling to model capability benchmarks and economic impact indicators.
Datasets on DataHub
AI Models & Capabilities
- Epoch AI, Notable ML Models: https://datahub.io/ai/epoch-data-on-ai-models. A curated dataset of notable machine learning models from 1950 to the present, tracking publication year, parameters, training compute (FLOPs), hardware, and organization.
- Historical Adoption of Technology: https://datahub.io/ai/historical-adoption-of-technology. Long-run adoption curves for transformative technologies in the United States, including the internet, smartphones, and AI-related tools. Useful for contextualizing how quickly AI is being adopted relative to prior waves of technology.
Key Themes
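Compute scaling: Training compute for frontier models, measured in total floating-point operations (FLOP), has roughly doubled every 6 months since 2010. That is about 4x per year, far outpacing the roughly two-year doubling usually associated with Moore's Law. Epoch AI's dataset is a widely used source for tracking this trend.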
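To make the comparison with Moore's Law concrete, a doubling time can be converted into an annual growth factor. The sketch below is only illustrative: the function name is made up for this example, and the 24-month figure for Moore's Law is an assumed conventional value (it is often quoted as anywhere from 18 to 24 months).

```python
def annual_growth_factor(doubling_time_months: float) -> float:
    """Return the per-year multiplier implied by a doubling time in months."""
    if doubling_time_months <= 0:
        raise ValueError("doubling time must be positive")
    return 2 ** (12 / doubling_time_months)


# Doubling every 6 months means two doublings per year, i.e. 4x per year.
frontier = annual_growth_factor(6)

# Assumption: Moore's Law taken as a 24-month doubling, i.e. about 1.41x per year.
moores_law = annual_growth_factor(24)

print(f"Frontier training compute: {frontier:.2f}x per year")
print(f"Moore's Law (assumed 24 months): {moores_law:.2f}x per year")
print(f"Over 10 years: {frontier ** 10:,.0f}x vs {moores_law ** 10:,.0f}x")
```

Under these assumptions, a decade of 6-month doublings compounds to roughly a million-fold increase, against about 32-fold for a 24-month doubling.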
Adoption curves: ChatGPT reached an estimated 100 million monthly users about two months after its launch in November 2022, which at the time was reported as the fastest ramp of any consumer application. Historical adoption data provides the benchmark for comparison.
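One simple way to compare adoption speed across technologies is to measure how long each took to cross a fixed adoption share. The sketch below assumes each curve is a mapping from year to percent adopted and measures from the first recorded year; the technology names and numbers are made-up placeholders, not values from the dataset, and the function names are hypothetical.

```python
from typing import Optional


def first_year_at_or_above(series: dict[int, float], threshold: float) -> Optional[int]:
    """Return the earliest year whose adoption share meets the threshold, or None."""
    for year in sorted(series):
        if series[year] >= threshold:
            return year
    return None


def years_to_threshold(series: dict[int, float], threshold: float) -> Optional[int]:
    """Years elapsed from the first recorded year to the first threshold crossing."""
    if not series:
        return None
    hit = first_year_at_or_above(series, threshold)
    return None if hit is None else hit - min(series)


# Made-up illustrative shares (percent adopted), NOT real data.
adoption = {
    "technology_a": {1990: 1.0, 1995: 10.0, 2000: 40.0, 2005: 65.0},
    "technology_b": {2007: 2.0, 2010: 20.0, 2013: 55.0},
}

for name, series in adoption.items():
    print(name, years_to_threshold(series, 50.0))
```

Measuring from the first recorded year is a simplification. A fairer comparison might start the clock when each technology first passes a small share, such as 1%.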
Concentration: A small number of organizations (Google, OpenAI, Meta, DeepMind, Anthropic) account for the majority of frontier model development. This shows up when models are grouped by the Epoch dataset's organization field.
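A rough way to check concentration is to count models per organization and express each count as a share of the total. The sketch below uses only the Python standard library and a small inline sample. The column names `Model` and `Organization` are assumptions about the CSV layout, and the rows are placeholders, not real entries.

```python
import csv
import io
from collections import Counter

# Placeholder rows standing in for the real file. The column names
# "Model" and "Organization" are assumptions, and the values are made up.
SAMPLE_CSV = """Model,Organization
model-1,Org A
model-2,Org A
model-3,Org B
model-4,Org C
model-5,Org A
model-6,Org B
"""


def organization_shares(csv_text: str, column: str = "Organization") -> list[tuple[str, float]]:
    """Return (organization, share of models) pairs, largest share first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the organization is missing or empty.
    counts = Counter(row[column].strip() for row in reader if row.get(column))
    total = sum(counts.values())
    return [(org, n / total) for org, n in counts.most_common()]


shares = organization_shares(SAMPLE_CSV)
top_two = sum(share for _, share in shares[:2])

for org, share in shares:
    print(f"{org}: {share:.0%}")
print(f"Top 2 organizations: {top_two:.0%}")
```

If a row in the real data lists several organizations in one field, the field would need to be split before counting. Check this against the actual file rather than assuming it.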
External Resources
- Epoch AI Research — Quantitative research on AI timelines, compute trends, and model capabilities.
- AI Index Report (Stanford HAI) — Annual report covering AI progress across research, economy, education, and policy.
- Our World in Data — AI — Accessible charts on AI adoption, capabilities, and societal impact.
- Papers With Code — State-of-the-art benchmarks across ML tasks, updated continuously.
- Hugging Face Datasets — Large repository of ML training datasets, models, and benchmarks.