
Artificial Intelligence

The awesome section presents collections of high-quality datasets organized by topic. The home page for awesome collections is located in the awesome-data repository on GitHub and should be modified from...

Artificial intelligence has become one of the defining technological forces of our era. This collection brings together datasets tracking the progress of AI — from historical adoption curves and compute scaling to model capability benchmarks and economic impact indicators.

Datasets on DataHub

AI Models & Capabilities

  • Epoch AI — Notable ML Models: https://datahub.io/ai/epoch-data-on-ai-models. A curated dataset of notable machine learning models from 1950 to the present, tracking publication year, parameters, training compute (FLOPs), hardware, and organization.

  • Historical Adoption of Technology: https://datahub.io/ai/historical-adoption-of-technology. Long-run adoption curves for transformative technologies in the United States, including the internet, smartphones, and AI-related tools. Useful for contextualizing how quickly AI is being adopted relative to prior waves of technology.

Key Themes

Compute scaling: Training compute for frontier models, measured in total floating-point operations (FLOP), has roughly doubled every 6 months since 2010, far outpacing the roughly two-year doubling period usually associated with Moore's Law. Epoch AI's dataset is a widely used reference for tracking this trend.
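To make the gap between the two doubling rates concrete, the following minimal sketch compares cumulative growth over a decade and shows one way a doubling time could be estimated from (year, compute) pairs. The two-year Moore's Law period, the function names, and the synthetic data points are assumptions for illustration; they are not taken from the Epoch AI dataset.

```python
import math


def growth_factor(years: float, doubling_time_years: float) -> float:
    """Total multiplicative growth after `years` at a fixed doubling time."""
    return 2.0 ** (years / doubling_time_years)


def estimate_doubling_time(points):
    """Least-squares doubling time (in years) from (year, compute_flop) pairs.

    Fits log2(compute) = a + b * year; the doubling time is 1 / b.
    """
    xs = [year for year, _ in points]
    ys = [math.log2(flop) for _, flop in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return 1.0 / slope


# Ten years of growth under each assumed doubling period.
frontier_growth = growth_factor(10, 0.5)  # doubling every 6 months
moore_growth = growth_factor(10, 2.0)     # assumed two-year doubling

# Synthetic placeholder points that double every half year (not real data).
synthetic_points = [(2010 + 0.5 * i, 1e18 * 2**i) for i in range(8)]
fitted_doubling_time = estimate_doubling_time(synthetic_points)
```

Under these assumptions, ten years of six-month doublings is 20 doublings (a factor of about one million), against 5 doublings (a factor of 32) at a two-year period. Applying `estimate_doubling_time` to real data would require choosing which models count as "frontier", which this sketch does not attempt.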

Adoption curves: ChatGPT was widely reported to have reached an estimated 100 million monthly users about two months after its November 2022 launch, faster than any earlier consumer application. Historical adoption data provides the baseline for that comparison.
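One way to compare adoption speeds is to measure how long each technology took to cross a fixed threshold. The sketch below assumes a series of (year, adoption share) observations sorted by year and uses linear interpolation between them. The function name and the sample values are hypothetical placeholders, not figures from the Historical Adoption of Technology dataset.

```python
def years_to_threshold(series, threshold):
    """Years from the first observation until `threshold` is first reached.

    `series` is a list of (year, share) tuples sorted by year. The crossing
    point is linearly interpolated between the two surrounding observations.
    Returns None if the threshold is never reached.
    """
    start_year, first_share = series[0]
    if first_share >= threshold:
        return 0.0
    prev_year, prev_share = start_year, first_share
    for year, share in series[1:]:
        if share >= threshold:
            # prev_share < threshold <= share, so the denominator is positive.
            fraction = (threshold - prev_share) / (share - prev_share)
            crossing_year = prev_year + fraction * (year - prev_year)
            return crossing_year - start_year
        prev_year, prev_share = year, share
    return None


# Placeholder adoption curve in percent of households (illustrative only).
example_series = [(2000, 0.0), (2002, 20.0), (2004, 60.0)]
years_to_half = years_to_threshold(example_series, 50.0)
```

Measuring from the first observation is a design choice; measuring from a common starting share (for example 1%) may be fairer when technologies enter the data at different stages.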

Concentration: A small number of organizations (Google and its DeepMind unit, OpenAI, Meta, Anthropic) account for the majority of frontier model development. This is visible in the Epoch dataset's organization field.
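The concentration claim can be checked by counting how many models each organization contributes and summing the share held by the largest few. This is a minimal sketch that assumes the organization field has already been extracted as a flat list with one entry per model. The function name and the sample labels are hypothetical, and real entries may list several organizations per model, which would need splitting first.

```python
from collections import Counter


def top_k_share(organizations, k=5):
    """Fraction of all entries attributed to the k most frequent organizations."""
    counts = Counter(organizations)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top_total = sum(count for _, count in counts.most_common(k))
    return top_total / total


# Placeholder labels standing in for the dataset's organization field.
sample_orgs = ["A", "A", "A", "B", "B", "C", "D"]
share_of_top_two = top_k_share(sample_orgs, k=2)
```

Here the two most frequent placeholder organizations account for 5 of 7 entries. Restricting the input to frontier models, however that subset is defined, would be needed before reading the result as evidence for the claim above.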

External Resources