
Registry of Core Datasets

Files: 1
Size: 13.8 kB
Format: csv
License: Open Data Commons Public Domain Dedication and License v1.0

Registry of published datasets in the Core Datasets Project


Data Files

File: core-list
Size: 13.8 kB


Schema

name                        type    description
name                        string  Name of the dataset
github_url                  string  The location in GitHub
run_date                    string  Last run date
modified                    string  Frequency information (year-A, quarter-Q, month-M, day-D, no-N)
validated_metadata          string  Metadata validation status
validated_data              string  Data validation status
published                   string  Published location on DataHub
ok_on_datahub               string  Status on DataHub
validated_metadata_message  string  Error messages if validation fails
validated_data_message      string  Error messages if validation fails
auto_publish                string  Published by DataHub automatically
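
Since every field in the schema is a string, the registry can be read into plain objects keyed by column name. The following is a minimal sketch, not the project's own loader: `splitCsvLine` and `parseRegistry` are hypothetical helper names, the column names are taken from the file's header row rather than hard-coded, and quoted fields that span several lines are not handled.

```javascript
// Split one CSV line into fields, honouring double-quoted fields.
// Inside a quoted field, a doubled quote ("") stands for a literal quote.
function splitCsvLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"';
        i++; // skip the second quote of the escaped pair
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Turn the text of a registry CSV into an array of records.
// The first non-empty line is treated as the header row.
function parseRegistry(csvText) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (lines.length === 0) return [];
  const header = splitCsvLine(lines[0]);
  return lines.slice(1).map((line) => {
    const values = splitCsvLine(line);
    const record = {};
    header.forEach((column, i) => {
      record[column] = values[i] ?? ''; // missing trailing fields become ''
    });
    return record;
  });
}
```

With the file contents read into a string (for example via `fs.readFileSync(path, 'utf8')`), `parseRegistry(text)` would return one object per dataset, with properties such as `name` and `github_url`.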

Core data registry and tooling.

Registry

The registry is maintained as a Tabular Data Package, with the list of datasets in core-list.csv.

To add a dataset, add a row for it to core-list.csv. We recommend the fork-and-pull workflow.
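
When preparing a new row by hand, it is easy to break the CSV with an unquoted comma. A small helper along these lines could format the row instead. It is only a sketch: `csvField` and `toCsvLine` are hypothetical names, and which columns a contributor is expected to fill in is not specified here, so any column absent from `entry` is simply left empty.

```javascript
// Quote a field only when it contains a comma, a double quote, or a line break.
function csvField(value) {
  const text = String(value ?? '');
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Build one CSV line for `entry`, ordering the values by the `header`
// column names. Columns missing from `entry` are emitted as empty fields.
function toCsvLine(header, entry) {
  return header.map((column) => csvField(entry[column])).join(',');
}
```

Passing the header row of core-list.csv as `header` keeps the new line aligned with the existing columns.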

Discussion of proposals for new datasets and for incorporation of prepared datasets takes place in the issues.

To propose a new dataset for inclusion, please create a new issue.

Core Dataset Tools

Installation

$ npm install

Usage

  • Environment variables

DOMAIN - testing or production environment. For example: https://datahub.io
TYPE - type of dataset. For example: examples or core
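
As an illustration of how a script might pick up these two variables, here is a minimal sketch. `readConfig` is a hypothetical helper, not part of the tool; whether the tool requires both variables or falls back to defaults is not stated here, so the sketch only reports which ones are unset.

```javascript
// Names of the environment variables described above.
const VARIABLES = ['DOMAIN', 'TYPE'];

// Read the settings from an environment object (process.env by default)
// and list any variables that are unset or empty.
function readConfig(env = process.env) {
  const missing = VARIABLES.filter((name) => !env[name]);
  return { domain: env.DOMAIN, type: env.TYPE, missing };
}
```

In a POSIX shell the variables can be set inline for a single run, for example `DOMAIN=https://datahub.io TYPE=core node index.js check [PATH]`.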

node index.js [COMMAND] [PATH]

# PATH - path to csv file
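
The invocation pattern can be sketched as a small argument parser. This is an assumption about how such a command line could be handled, not the actual contents of index.js: `parseArgs` is a hypothetical name, and the command list is taken from the four commands documented in this README (clone, check, norm, push).

```javascript
// Commands documented in this README.
const COMMANDS = ['clone', 'check', 'norm', 'push'];

// `argv` has the shape of process.argv: [node, script, COMMAND, PATH].
// Returns the command and the csv path (undefined when no PATH is given).
function parseArgs(argv) {
  const [command, csvPath] = argv.slice(2);
  if (!COMMANDS.includes(command)) {
    throw new Error(
      `Unknown command "${command}". Expected one of: ${COMMANDS.join(', ')}`
    );
  }
  return { command, csvPath };
}
```

Called as `parseArgs(process.argv)`, it rejects a misspelled command before any work starts.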

Clone datasets

To clone all core datasets run the following command:

node index.js clone [PATH]

It will clone all core datasets into the following directory: data/${pkg_name}
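
To make the directory layout concrete, the sketch below maps registry rows to clone targets. It rests on two assumptions that are not confirmed here: that `${pkg_name}` corresponds to the `name` column of the registry, and that the repository to clone is the one in `github_url`. `clonePlan` is a hypothetical helper and performs no cloning itself.

```javascript
const path = require('node:path');

// For each registry row, compute where its repository would be cloned.
// Assumption: ${pkg_name} is the row's `name` column.
// Rows without a name or a github_url are skipped.
function clonePlan(rows, baseDir = 'data') {
  return rows
    .filter((row) => row.name && row.github_url)
    .map((row) => ({
      url: row.github_url,
      dir: path.posix.join(baseDir, row.name),
    }));
}
```

Each resulting `{ url, dir }` pair could then be handed to whatever performs the clone, such as a `git clone` subprocess.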

Check datasets

To check all core datasets run the following command:

node index.js check [PATH]

It will validate metadata and data according to the latest spec.

Normalize datasets

To normalize all core datasets run the following command:

node index.js norm [PATH]

It will normalize all core datasets into the following directory: data/${pkg_name}

Push datasets

To publish all core data packages run the following command:

node index.js push [PATH]

Running tests

We use AVA for our tests. To run the tests:

$ [sudo] npm test

To run tests in watch mode:

$ [sudo] npm run watch:test

© 2025 All rights reserved. Built with DataHub Cloud
