
Registry of Core Datasets


Registry of published datasets in the Core Datasets Project

API Access

Access dataset files directly from scripts, code, or AI agents.

Dataset Files

Each file has a stable URL (r-link) that you can use directly in scripts, apps, or AI agents. These URLs are permanent and safe to hardcode.

/core/core-datasets/
https://datahub.io/core/core-datasets/_r/-/.env.template
https://datahub.io/core/core-datasets/_r/-/.gitignore
https://datahub.io/core/core-datasets/_r/-/README.md
https://datahub.io/core/core-datasets/_r/-/core-list-testing.csv
https://datahub.io/core/core-datasets/_r/-/core-list.csv
https://datahub.io/core/core-datasets/_r/-/datapackage.json
https://datahub.io/core/core-datasets/_r/-/index.js
https://datahub.io/core/core-datasets/_r/-/package.json
https://datahub.io/core/core-datasets/_r/-/test/fixtures/finance-vix/README.md
https://datahub.io/core/core-datasets/_r/-/test/fixtures/finance-vix/data/vix-daily.csv
https://datahub.io/core/core-datasets/_r/-/test/fixtures/finance-vix/datapackage.json
https://datahub.io/core/core-datasets/_r/-/test/fixtures/invalid-dp/datapackage.json
https://datahub.io/core/core-datasets/_r/-/test/index.test.js
https://datahub.io/core/core-datasets/_r/-/test/status-test.csv
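Since every r-link above follows the same pattern, the URLs can be assembled programmatically. A minimal sketch (the `rLink` helper is our own name, not a DataHub API):

```javascript
// Build a permanent r-link URL for a file in a DataHub dataset.
// Pattern shown in the list above: https://datahub.io/<owner>/<dataset>/_r/-/<path>
function rLink(owner, dataset, filePath) {
  return `https://datahub.io/${owner}/${dataset}/_r/-/${filePath}`;
}

console.log(rLink('core', 'core-datasets', 'datapackage.json'));
// → https://datahub.io/core/core-datasets/_r/-/datapackage.json
```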
Key Files

Start with these files — they give you everything you need to understand and access the dataset.

datapackage.json - metadata & schema
https://datahub.io/core/core-datasets/_r/-/datapackage.json
README.md - documentation
https://datahub.io/core/core-datasets/_r/-/README.md
Typical Usage
  1. Fetch datapackage.json to inspect schema and resources
  2. Download data resources listed in datapackage.json
  3. Read README.md for full context
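The first two steps above can be sketched in Node.js. This is a sketch, not official client code: it only assumes the Frictionless datapackage.json layout (a `resources` array whose entries carry a `path`), and the sample descriptor is illustrative; in practice you would fetch the real datapackage.json from its r-link.

```javascript
// Resolve the download URL of every resource in a datapackage.json
// descriptor: relative paths are joined onto the dataset's r-link base,
// absolute URLs pass through unchanged.
const BASE = 'https://datahub.io/core/core-datasets/_r/-/';

function resourceUrls(datapackage, base) {
  return datapackage.resources.map((r) =>
    /^https?:\/\//.test(r.path) ? r.path : base + r.path
  );
}

// Illustrative descriptor standing in for the fetched datapackage.json.
const sample = {
  name: 'core-datasets',
  resources: [{ name: 'core-list', path: 'core-list.csv' }],
};

console.log(resourceUrls(sample, BASE));
// → [ 'https://datahub.io/core/core-datasets/_r/-/core-list.csv' ]
```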

Data Previews

core-list


Schema

| name | type | description |
|---|---|---|
| name | string | Name of the dataset |
| github_url | string | The location in GitHub |
| run_date | string | Last run date |
| modified | string | Frequency information (year-A, quarter-Q, month-M, day-D, no-N) |
| validated_metadata | string | Metadata validation status |
| validated_data | string | Data validation status |
| published | string | Published location on DataHub |
| ok_on_datahub | string | Status on DataHub |
| validated_metadata_message | string | Error messages if metadata validation fails |
| validated_data_message | string | Error messages if data validation fails |
| auto_publish | string | Published by DataHub automatically |
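Rows of core-list.csv can be mapped onto this schema in a few lines of JavaScript. A sketch, assuming a simple row with no quoted commas (use a real CSV parser otherwise); the sample line is illustrative, not taken from the file:

```javascript
// Column order follows the schema table above.
const FIELDS = [
  'name', 'github_url', 'run_date', 'modified',
  'validated_metadata', 'validated_data', 'published',
  'ok_on_datahub', 'validated_metadata_message',
  'validated_data_message', 'auto_publish',
];

// Naive CSV split: adequate only for rows without quoted commas.
function parseRow(line) {
  const values = line.split(',');
  return Object.fromEntries(FIELDS.map((f, i) => [f, values[i] ?? '']));
}

const row = parseRow(
  'finance-vix,https://github.com/datasets/finance-vix,2024-01-01,D,OK,OK,,,,,'
);
console.log(row.name, row.modified); // → finance-vix D
```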

Data Files

| File | Size | Last modified |
|---|---|---|
| core-list | 13.8 kB | 26 days ago |

core-list

| Files | Size | Updated | License |
|---|---|---|---|
| 1 | 13.8 kB | over 1 year ago | Open Data Commons Public Domain Dedication and License v1.0 |


Core data registry and tooling.

Registry

The registry is maintained as a Tabular Data Package, with the list of datasets in core-list.csv.

To add a dataset, append it to core-list.csv; we recommend the fork-and-pull workflow.

Discussion of proposals for new datasets and for incorporation of prepared datasets takes place in the issues.

To propose a new dataset for inclusion, please create a new issue.

Core Dataset Tools

Installation

$ npm install

Usage

  • Environmental variables

DOMAIN - testing or production environment. For example: https://datahub.io
TYPE - type of dataset. For example: examples or core
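Assuming the tool reads these from a .env file, as the .env.template in the file list suggests, a configuration might look like this (values are illustrative):

```
DOMAIN=https://datahub.io
TYPE=core
```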

node index.js [COMMAND] [PATH]

# PATH - path to csv file

Clone datasets

To clone all core datasets run the following command:

node index.js clone [PATH]

It will clone all core datasets into the following directory: data/${pkg_name}

Check datasets

To check all core datasets run the following command:

node index.js check [PATH]

It will validate metadata and data according to the latest spec.

Normalize datasets

To normalize all core datasets run the following command:

node index.js norm [PATH]

It will normalize all core datasets into the following directory: data/${pkg_name}

Push datasets

To publish all core data packages run the following command:

node index.js push [PATH]

Running tests

We use Ava for our tests. To run them:

$ [sudo] npm test

To run tests in watch mode:

$ [sudo] npm run watch:test