
US Primary Energy Consumption 1635–2000

Estimated primary energy consumption in the United States from 1635 to 2000. Covers 365 years of US energy history: from wood and water power in the colonial era, through the coal-dominated industrial revolution, to the rise of petroleum and natural gas in the 20th century. Pre-1949 data covers selected years only; annual data begins in 1949. All values in quadrillion Btu (quad Btu).

API Access

Access dataset files directly from scripts, code, or AI agents.

Dataset Files

Each file has a stable URL (r-link) that you can use directly in scripts, apps, or AI agents. These URLs are permanent and safe to hardcode.

/energy-and-commodities/us-primary-energy-consumption-historical/
https://datahub.io/energy-and-commodities/us-primary-energy-consumption-historical/_r/-/AGENTS.md
https://datahub.io/energy-and-commodities/us-primary-energy-consumption-historical/_r/-/README.md
https://datahub.io/energy-and-commodities/us-primary-energy-consumption-historical/_r/-/data/primary-energy-by-source.csv
https://datahub.io/energy-and-commodities/us-primary-energy-consumption-historical/_r/-/data/primary-energy-consumption.csv
https://datahub.io/energy-and-commodities/us-primary-energy-consumption-historical/_r/-/datapackage.json
https://datahub.io/energy-and-commodities/us-primary-energy-consumption-historical/_r/-/process.py
Key Files

Start with these files — they give you everything you need to understand and access the dataset.

datapackage.json (metadata & schema)
https://datahub.io/energy-and-commodities/us-primary-energy-consumption-historical/_r/-/datapackage.json
README.md (documentation)
https://datahub.io/energy-and-commodities/us-primary-energy-consumption-historical/_r/-/README.md
Typical Usage
  1. Fetch datapackage.json to inspect schema and resources
  2. Download data resources listed in datapackage.json
  3. Read README.md for full context
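Those steps can be sketched in Python with only the standard library; the base URL comes from the file list above:

```python
import json
import urllib.request

BASE = ("https://datahub.io/energy-and-commodities/"
        "us-primary-energy-consumption-historical/_r/-/")

def fetch_datapackage(base=BASE):
    """Step 1: fetch datapackage.json to inspect schema and resources."""
    with urllib.request.urlopen(base + "datapackage.json") as resp:
        return json.load(resp)

def resource_urls(datapackage, base=BASE):
    """Step 2: resolve each resource's relative path to a stable r-link URL."""
    return [base + r["path"] for r in datapackage.get("resources", [])]

# Step 3: README.md lives at BASE + "README.md" for full context.
```

Calling `resource_urls(fetch_datapackage())` yields the download URLs for the CSV resources.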

Data Previews

US Primary Energy Consumption 1635–2000


Schema

year (integer): Year of observation
total_quad_btu (number): Total primary energy consumption in quadrillion British thermal units (quad Btu). Includes all sources: coal, petroleum, natural gas, nuclear, hydroelectric, biomass, and other renewables.
notes (string): Data notes. 'selected year' indicates the pre-1949 series where only specific benchmark years are available.
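To split the annual series from the pre-1949 selected years, a minimal sketch using the field names above (the sample rows are illustrative, not actual dataset values):

```python
import csv
import io

def annual_rows(csv_text):
    """Keep only rows from the annual series (1949 onward).

    Field names (year, total_quad_btu, notes) follow the schema above.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"year": int(r["year"]), "total_quad_btu": float(r["total_quad_btu"])}
        for r in reader
        if int(r["year"]) >= 1949
    ]

# Illustrative rows only -- not real dataset values.
sample = "year,total_quad_btu,notes\n1945,31.5,selected year\n1950,34.6,\n"
print(annual_rows(sample))  # [{'year': 1950, 'total_quad_btu': 34.6}]
```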

US Primary Energy Consumption by Source 1850–1945


Schema

year (integer): Year of observation (selected years, at 5-year intervals)
coal_quad_btu (number): Coal consumption in quadrillion Btu
natural_gas_quad_btu (number): Natural gas consumption in quadrillion Btu
petroleum_quad_btu (number): Petroleum consumption in quadrillion Btu
hydro_quad_btu (number): Hydroelectric power in quadrillion Btu
biomass_quad_btu (number): Wood and biomass consumption in quadrillion Btu (pre-1949 series covers fuelwood only)
total_quad_btu (number): Total primary energy consumption in quadrillion Btu
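Given one row of the by-source file, each source's share of the total falls out directly from these fields. A sketch; the example row is illustrative only, roughly the shape of a mid-19th-century entry when fuelwood dominated:

```python
def source_shares(row):
    """Fraction of total primary energy by source for one row of
    primary-energy-by-source.csv (field names from the schema above)."""
    sources = ["coal_quad_btu", "natural_gas_quad_btu",
               "petroleum_quad_btu", "hydro_quad_btu", "biomass_quad_btu"]
    total = row["total_quad_btu"]
    return {s: round(row[s] / total, 3) for s in sources}

# Illustrative values only -- not taken from the dataset.
row = {"coal_quad_btu": 0.22, "natural_gas_quad_btu": 0.0,
       "petroleum_quad_btu": 0.0, "hydro_quad_btu": 0.0,
       "biomass_quad_btu": 2.14, "total_quad_btu": 2.36}
print(source_shares(row))
```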

Data Files

primary-energy-consumption (1.79 kB, updated about 1 month ago)
Total US primary energy consumption by year, 1635–2000. Pre-1949 data covers selected years at irregular intervals (decennial 1635–1845, quinquennial 1850–1945). Annual data from 1949. Gap exists for 1946–1948 (bridging period between historical and modern EIA series). Values in quadrillion Btu (1 quad = 10^15 Btu = 1.055 exajoules).

primary-energy-by-source (900 B, updated about 1 month ago)
Breakdown of US primary energy consumption by fuel source for selected years 1850–1945. Shows the transition from wood and water power to coal, petroleum, and natural gas. Values in quadrillion Btu.
Size: 22.69 kB
Format: csv
Created/Updated: about 1 month ago
License: Open Data Commons Public Domain Dedication and License (PDDL)
Source: US Energy Information Administration: Estimated Primary Energy Consumption in the United States, Selected Years, 1635–1945 (Quadrillion Btu)

DataPressr — AI Agent Instructions

You are helping wrangle raw data finds into clean, publishable datasets on DataHub.

Concepts

Data hierarchy

  • Catalog — a collection of datasets. Maps to one GitHub repo + one DataHub publication. Example: "World Bank Open Data", "Our World in Data".
  • Dataset — a coherent data concept with defined schema and coverage. One directory, one datapackage.json. Example: "World GDP 1960–2024".
  • Data file — a concrete file artifact (csv, json, parquet…). Listed as a resource in datapackage.json.

Catalog-as-repo rule: if the source is a portal or collection containing many datasets, give it its own repo and DataHub publication — not a subfolder inside another dataset.

Dataset lifecycle

A dataset doesn't need to be complete to be published. Lifecycle stages:

capture: Just a URL or note — intent to explore
stub: Title, description, source link. No files yet. Publishable.
archived: Raw files downloaded locally
structured: Cleaned, normalised, schema documented
enriched: Analysis, visualisations, derived data added
monitored: Living source, versioned and updated over time

Set "status": "<stage>" in datapackage.json to track this.


Dataset structure

Every dataset is a directory:

<name>/
  datapackage.json   # metadata and resource list (required)
  data/              # data files go here
  .datahubignore     # gitignore-style exclusions for dh push
  AGENTS.md          # this file (copy into new datasets)

datapackage.json

Minimal valid example:

{
  "name": "world-gdp",
  "title": "World GDP",
  "description": "GDP by country from World Bank, 1960–2024",
  "status": "structured",
  "resources": [
    {
      "path": "data/gdp.csv",
      "name": "gdp",
      "title": "GDP by Country",
      "mediatype": "text/csv"
    }
  ]
}

Rules:

  • name must be URL-safe: lowercase, hyphens only
  • Every file in data/ that should be published must be in resources
  • status should reflect the lifecycle stage above
  • Use .datahubignore to exclude scratch files, large intermediaries, raw downloads
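The first two rules lend themselves to a quick automated check. A sketch only; check_dataset is a hypothetical helper, not part of the dh CLI:

```python
import json
import re
from pathlib import Path

def check_dataset(dataset_dir):
    """Flag violations of the rules above for one dataset directory."""
    problems = []
    root = Path(dataset_dir)
    dp = json.loads((root / "datapackage.json").read_text())
    # Rule: name must be URL-safe (lowercase, hyphens only).
    if not re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", dp.get("name", "")):
        problems.append("name is not URL-safe (lowercase, hyphens only)")
    # Rule: every file in data/ must be listed in resources.
    listed = {r["path"] for r in dp.get("resources", [])}
    data_dir = root / "data"
    if data_dir.is_dir():
        for f in data_dir.iterdir():
            if f.is_file() and f"data/{f.name}" not in listed:
                problems.append(f"data/{f.name} missing from resources")
    return problems
```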

Adding charts (views)

Add a views array to datapackage.json to render charts on the dataset page:

{
  "views": [
    {
      "name": "gdp-over-time",
      "title": "GDP Over Time",
      "specType": "simple",
      "resources": ["gdp"],
      "spec": {
        "type": "line",
        "group": "year",
        "series": ["gdp_usd"]
      }
    }
  ]
}

Supported chart types: line, bar, lines-and-points. Only CSV and GeoJSON resources can be visualised. group is the x-axis field, series is the list of y-axis fields.


Workflow

Start a new dataset

Create the directory structure:

mkdir -p <name>/data
cd <name>

Create datapackage.json with at minimum name, title, description. Add "status": "stub" if no data files yet.
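For the stub stage, that minimum might look like the following (name, title, and description are placeholders):

```json
{
  "name": "my-new-dataset",
  "title": "My New Dataset",
  "description": "What it covers and where it came from",
  "status": "stub",
  "resources": []
}
```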

Copy this AGENTS.md into the new directory so future AI sessions have context.

Push to DataHub

dh push .

Requires env vars:

export DATAHUB_API_URL=https://datahub.io
export DATAHUB_API_TOKEN=<your-token>
export DATAHUB_PUBLICATION=<your-publication-slug>

dh is the DataHub CLI — install from datopian/datahub-next.

Delete a dataset

dh delete <name>

Claude Code skills

If using Claude Code, the following slash commands are available in this repo:

/init <name>: Scaffold a new dataset directory
/push: Push current directory to DataHub
/validate: Check datapackage.json for common issues