
Registry of Core Datasets


Registry of published datasets in the Core Datasets Project

API Access

Access dataset files directly from scripts, code, or AI agents.

Dataset Files

Each file has a stable URL (r-link) that you can use directly in scripts, apps, or AI agents. These URLs are permanent and safe to hardcode.

/core/core-datasets/
https://datahub.io/core/core-datasets/_r/-/.env.template
https://datahub.io/core/core-datasets/_r/-/.gitignore
https://datahub.io/core/core-datasets/_r/-/README.md
https://datahub.io/core/core-datasets/_r/-/core-list-testing.csv
https://datahub.io/core/core-datasets/_r/-/core-list.csv
https://datahub.io/core/core-datasets/_r/-/datapackage.json
https://datahub.io/core/core-datasets/_r/-/index.js
https://datahub.io/core/core-datasets/_r/-/package.json
https://datahub.io/core/core-datasets/_r/-/test/fixtures/finance-vix/README.md
https://datahub.io/core/core-datasets/_r/-/test/fixtures/finance-vix/data/vix-daily.csv
https://datahub.io/core/core-datasets/_r/-/test/fixtures/finance-vix/datapackage.json
https://datahub.io/core/core-datasets/_r/-/test/fixtures/invalid-dp/datapackage.json
https://datahub.io/core/core-datasets/_r/-/test/index.test.js
https://datahub.io/core/core-datasets/_r/-/test/status-test.csv
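Since every r-link above follows the same pattern, the URLs can be assembled programmatically. A minimal sketch (the `rLink` helper is our own name, not a DataHub API):

```javascript
// Build a permanent r-link URL for a file in a DataHub dataset.
// Pattern shown in the list above: https://datahub.io/<owner>/<dataset>/_r/-/<path>
function rLink(owner, dataset, filePath) {
  return `https://datahub.io/${owner}/${dataset}/_r/-/${filePath}`;
}

console.log(rLink('core', 'core-datasets', 'datapackage.json'));
// → https://datahub.io/core/core-datasets/_r/-/datapackage.json
```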
Key Files

Start with these files — they give you everything you need to understand and access the dataset.

datapackage.json - metadata & schema
https://datahub.io/core/core-datasets/_r/-/datapackage.json
README.md - documentation
https://datahub.io/core/core-datasets/_r/-/README.md
Typical Usage
  1. Fetch datapackage.json to inspect schema and resources
  2. Download data resources listed in datapackage.json
  3. Read README.md for full context
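The first two steps above can be sketched in Node.js. This is a sketch, not official client code: it only assumes the Frictionless datapackage.json layout (a `resources` array whose entries carry a `path`), and the sample descriptor is illustrative; in practice you would fetch the real datapackage.json from its r-link.

```javascript
// Resolve the download URL of every resource in a datapackage.json
// descriptor: relative paths are joined onto the dataset's r-link base,
// absolute URLs pass through unchanged.
const BASE = 'https://datahub.io/core/core-datasets/_r/-/';

function resourceUrls(datapackage, base) {
  return datapackage.resources.map((r) =>
    /^https?:\/\//.test(r.path) ? r.path : base + r.path
  );
}

// Illustrative descriptor standing in for the fetched datapackage.json.
const sample = {
  name: 'core-datasets',
  resources: [{ name: 'core-list', path: 'core-list.csv' }],
};

console.log(resourceUrls(sample, BASE));
// → [ 'https://datahub.io/core/core-datasets/_r/-/core-list.csv' ]
```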

Data Previews

core-list


Schema

| name | type | description |
|---|---|---|
| name | string | Name of the dataset |
| github_url | string | The location in GitHub |
| run_date | string | Last run date |
| modified | string | Frequency information (year-A, quarter-Q, month-M, day-D, no-N) |
| validated_metadata | string | Metadata validation status |
| validated_data | string | Data validation status |
| published | string | Published location on DataHub |
| ok_on_datahub | string | Status on DataHub |
| validated_metadata_message | string | Error messages if metadata validation fails |
| validated_data_message | string | Error messages if data validation fails |
| auto_publish | string | Published by DataHub automatically |
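Rows of core-list.csv can be mapped onto this schema in a few lines of JavaScript. A sketch, assuming a simple row with no quoted commas (use a real CSV parser otherwise); the sample line is illustrative, not taken from the file:

```javascript
// Column order follows the schema table above.
const FIELDS = [
  'name', 'github_url', 'run_date', 'modified',
  'validated_metadata', 'validated_data', 'published',
  'ok_on_datahub', 'validated_metadata_message',
  'validated_data_message', 'auto_publish',
];

// Naive CSV split: adequate only for rows without quoted commas.
function parseRow(line) {
  const values = line.split(',');
  return Object.fromEntries(FIELDS.map((f, i) => [f, values[i] ?? '']));
}

const row = parseRow(
  'finance-vix,https://github.com/datasets/finance-vix,2024-01-01,D,OK,OK,,,,,'
);
console.log(row.name, row.modified); // → finance-vix D
```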

Data Files

| File | Size | Last modified |
|---|---|---|
| core-list | 13.8 kB | 26 days ago |

core-list

| Files | Size | Updated | License |
|---|---|---|---|
| 1 | 13.8 kB | over 1 year ago | Open Data Commons Public Domain Dedication and License v1.0 |


Core data registry and tooling.

Registry

The registry is maintained as a Tabular Data Package, with the list of datasets in core-list.csv.

To add a dataset, append it to core-list.csv; we recommend the fork-and-pull workflow.

Discussion of proposals for new datasets and for incorporation of prepared datasets takes place in the issues.

To propose a new dataset for inclusion, please create a new issue.

Core Dataset Tools

Installation

$ npm install

Usage

  • Environmental variables

DOMAIN - testing or production environment. For example: https://datahub.io
TYPE - type of dataset. For example: examples or core
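Assuming the tool reads these from a .env file, as the .env.template in the file list suggests, a configuration might look like this (values are illustrative):

```
DOMAIN=https://datahub.io
TYPE=core
```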

node index.js [COMMAND] [PATH]

# PATH - path to csv file

Clone datasets

To clone all core datasets run the following command:

node index.js clone [PATH]

It will clone all core datasets into the following directory: data/${pkg_name}

Check datasets

To check all core datasets run the following command:

node index.js check [PATH]

It will validate metadata and data according to the latest spec.

Normalize datasets

To normalize all core datasets run the following command:

node index.js norm [PATH]

It will normalize all core datasets into the following directory: data/${pkg_name}

Push datasets

To publish all core data packages run the following command:

node index.js push [PATH]

Running tests

We use Ava for our tests. To run them:

$ [sudo] npm test

To run tests in watch mode:

$ [sudo] npm run watch:test