Registry of Core Datasets

Files: 1
Size: 13.8 kB
Format: csv
Updated: over 6 years ago
License: Open Data Commons Public Domain Dedication and License v1.0

Registry of published datasets in the Core Datasets Project

Data Files

File       Description  Size     Last modified     Download
core-list               13.8 kB  over 6 years ago  core-list

Schema

name                        type    description
name                        string  Name of the dataset
github_url                  string  The location in GitHub
run_date                    string  Last run date
modified                    string  Frequency information (year-A, quarter-Q, month-M, day-D, no-N)
validated_metadata          string  Metadata validation status
validated_data              string  Data validation status
published                   string  Published location on DataHub
ok_on_datahub               string  Status on DataHub
validated_metadata_message  string  Error messages if metadata validation fails
validated_data_message      string  Error messages if data validation fails
auto_publish                string  Published by DataHub automatically
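
As a rough illustration of how one row of core-list.csv maps onto the schema above, the sketch below splits a line into the eleven fields and decodes the `modified` frequency code. It assumes the CSV columns appear in the order listed in the schema and that values contain no quoted commas. `parseRow`, `FREQUENCY`, and the sample row are hypothetical and are not part of the project's tooling.

```javascript
// Field names in the order given by the schema
// (assumed to match the CSV column order).
const FIELDS = [
  'name', 'github_url', 'run_date', 'modified',
  'validated_metadata', 'validated_data', 'published', 'ok_on_datahub',
  'validated_metadata_message', 'validated_data_message', 'auto_publish',
];

// Frequency codes taken from the description of the `modified` field.
const FREQUENCY = { A: 'year', Q: 'quarter', M: 'month', D: 'day', N: 'no' };

// Naive parser: splits on commas, so it does not handle quoted fields.
function parseRow(line) {
  const values = line.split(',');
  const row = {};
  FIELDS.forEach((field, i) => {
    row[field] = values[i] !== undefined ? values[i] : '';
  });
  return row;
}

// Hypothetical row, for illustration only.
const sample = 'example-dataset,https://github.com/example/example-dataset,2019-01-01,A,,,,,,,';
const row = parseRow(sample);
console.log(row.name, FREQUENCY[row.modified] || 'unknown');
```

For real registry files, a proper CSV parser is the safer choice, since error-message columns may well contain commas.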

Core data registry and tooling.

Registry

The registry is maintained as a Tabular Data Package, with the list of datasets in core-list.csv.

To add a dataset, add it to core-list.csv. We recommend the fork-and-pull workflow.

Discussion of proposals for new datasets and for incorporation of prepared datasets takes place in the issues.

To propose a new dataset for inclusion, please create a new issue.

Core Dataset Tools

Installation

$ npm install

Usage

Environment variables:

  • DOMAIN - testing or production environment. For example: https://datahub.io
  • TYPE - type of dataset. For example: examples or core
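
A minimal sketch of how a script might read these two variables is shown below. `readConfig` is a hypothetical helper, and the fallback values are assumptions taken from the examples in the text, not defaults documented by the tool.

```javascript
// Hypothetical helper: read DOMAIN and TYPE from an environment object.
// The fallback values are assumptions based on the examples above,
// not documented defaults of index.js.
function readConfig(env = process.env) {
  return {
    domain: env.DOMAIN || 'https://datahub.io',
    type: env.TYPE || 'core',
  };
}

console.log(readConfig({ DOMAIN: 'https://datahub.io', TYPE: 'examples' }));
```

Passing the environment in as an argument keeps the helper easy to test without touching the real process environment.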

node index.js [COMMAND] [PATH]

# PATH - path to csv file
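
The command shape above can be sketched as a small argument parser. This is only an illustration of the `[COMMAND] [PATH]` convention, assuming the four commands described in this README; `parseArgs` is a hypothetical function and does not reflect how index.js is actually implemented.

```javascript
// Commands documented in this README.
const COMMANDS = ['clone', 'check', 'norm', 'push'];

// Hypothetical argument parser; argv has the shape of process.argv,
// i.e. [node, script, COMMAND, PATH].
function parseArgs(argv) {
  const [command, csvPath] = argv.slice(2);
  if (!COMMANDS.includes(command)) {
    throw new Error(`Unknown command: ${command}`);
  }
  if (!csvPath) {
    throw new Error('Missing PATH to the csv file');
  }
  return { command, csvPath };
}

console.log(parseArgs(['node', 'index.js', 'check', 'core-list.csv']));
```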

Clone datasets

To clone all core datasets run the following command:

node index.js clone [PATH]

It will clone each core dataset into the following directory: data/${pkg_name}
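
To make the directory convention concrete, the sketch below builds the target path for a given package name. `targetDir` is a hypothetical helper, and it assumes that `pkg_name` is a plain dataset name with no path separators.

```javascript
const path = require('path');

// Hypothetical helper: build the data/${pkg_name} directory for a dataset.
// POSIX joining is used so the result has forward slashes on every platform.
function targetDir(pkgName) {
  return path.posix.join('data', pkgName);
}

console.log(targetDir('example-dataset')); // prints "data/example-dataset"
```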

Check datasets

To check all core datasets run the following command:

node index.js check [PATH]

It will validate metadata and data according to the latest spec.

Normalize datasets

To normalize all core datasets run the following command:

node index.js norm [PATH]

It will normalize each core dataset in the following directory: data/${pkg_name}

Push datasets

To publish all core data packages run the following command:

node index.js push [PATH]

Running tests

We use AVA for our tests. To run them:

$ [sudo] npm test

To run tests in watch mode:

$ [sudo] npm run watch:test
