Now you can request additional data and/or customized columns!

Try It Now!

Registry of Core Datasets

core

Files Size Format Created Updated License Source
2 94kB csv zip 1 year ago 1 year ago Open Data Commons Public Domain Dedication and License v1.0
Core data registry and tooling. Registry Registry is maintained as Tabular Data Package with list of datasets in core-list.csv. [tdp]: http://frictionlessdata.io/guides/tabular-data-package/ To add a dataset add it to the core-list.csv - we recommend fork and pull. Discussion of proposals for read more
Download Developers

Data Files

Download files in this dataset

File Description Size Last changed Download
core-list 12kB csv (12kB) , json (27kB)
registry_zip Compressed versions of dataset. Includes normalized CSV and JSON data with original data and datapackage.json. 8kB zip (8kB)

core-list  

Signup to Premium Service for additional or customised data - Get Started

This is a preview version. There might be more data in the original version.

Field information

Field Name Order Type (Format) Description
name 1 string Name of the dataset
github_url 2 string The location in GitHub
run_date 3 string Last run date
modified 4 string Frequency information (year-A, quarter-Q, month-M, day-D, no-N)
validated_metadata 5 string Metadata validation status
validated_data 6 string Data validation status
published 7 string Published location on DataHub
ok_on_datahub 8 string Status on DataHub
validated_metadata_message 9 string Error messages if validation fails
validated_data_message 10 string Error messages if validation fails
auto_publish 11 string Published by DataHub automatically

Integrate this dataset into your favourite tool

Use our data-cli tool designed for data wranglers:

data get https://datahub.io/core/registry
data info core/registry
tree core/registry
# Get a list of dataset's resources
curl -L -s https://datahub.io/core/registry/datapackage.json | grep path

# Get resources

curl -L https://datahub.io/core/registry/r/0.csv

curl -L https://datahub.io/core/registry/r/1.zip

If you are using R here's how to get the data you want quickly loaded:

install.packages("jsonlite", repos="https://cran.rstudio.com/")
library("jsonlite")

json_file <- 'https://datahub.io/core/registry/datapackage.json'
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

# get list of all resources:
print(json_data$resources$name)

# print all tabular data(if exists any)
for(i in 1:length(json_data$resources$datahub$type)){
  if(json_data$resources$datahub$type[i]=='derived/csv'){
    path_to_file = json_data$resources$path[i]
    data <- read.csv(url(path_to_file))
    print(data)
  }
}

Note: You might need to run the script with root permissions if you are running on Linux machine

Install the Frictionless Data data package library and the pandas itself:

pip install datapackage
pip install pandas

Now you can use the datapackage in the Pandas:

import datapackage
import pandas as pd

data_url = 'https://datahub.io/core/registry/datapackage.json'

# to load Data Package into storage
package = datapackage.Package(data_url)

# to load only tabular data
resources = package.resources
for resource in resources:
    if resource.tabular:
        data = pd.read_csv(resource.descriptor['path'])
        print (data)

For Python, first install the `datapackage` library (all the datasets on DataHub are Data Packages):

pip install datapackage

To get Data Package into your Python environment, run following code:

from datapackage import Package

package = Package('https://datahub.io/core/registry/datapackage.json')

# print list of all resources:
print(package.resource_names)

# print processed tabular data (if exists any)
for resource in package.resources:
    if resource.descriptor['datahub']['type'] == 'derived/csv':
        print(resource.read())

If you are using JavaScript, please, follow instructions below:

Install data.js module using npm:

  $ npm install data.js

Once the package is installed, use the following code snippet:

const {Dataset} = require('data.js')

const path = 'https://datahub.io/core/registry/datapackage.json'

// We're using self-invoking function here as we want to use async-await syntax:
;(async () => {
  const dataset = await Dataset.load(path)
  // get list of all resources:
  for (const id in dataset.resources) {
    console.log(dataset.resources[id]._descriptor.name)
  }
  // get all tabular data(if exists any)
  for (const id in dataset.resources) {
    if (dataset.resources[id]._descriptor.format === "csv") {
      const file = dataset.resources[id]
      // Get a raw stream
      const stream = await file.stream()
      // entire file as a buffer (be careful with large files!)
      const buffer = await file.buffer
      // print data
      stream.pipe(process.stdout)
    }
  }
})()

Read me

Core data registry and tooling.

Registry

Registry is maintained as Tabular Data Package with list of datasets in core-list.csv.

To add a dataset add it to the core-list.csv - we recommend fork and pull.

Discussion of proposals for new datasets and for incorporation of prepared datasets takes place in the issues.

To propose a new dataset for inclusion, please create a new issue.

Core Dataset Tools

Installation

$ npm install

Usage

  • Environmental variables

DOMAIN - testing or production environment. For example: https://datahub.io TYPE - type of dataset. For example: examples or core

node index.js [COMMAND] [PATH]

# PATH - path to csv file

Clone datasets

To clone all core datasets run the following command:

node index.js clone [PATH]

It will clone all core datasets into following directory: data/${pkg_name}

Check datasets

To check all core datasets run the following command:

node index.js check [PATH]

It will validate metadata and data according to the latest spec.

Normalize datasets

To normalize all core datasets run the following command:

node index.js norm [PATH]

It will normalize all core datasets into following directory: data/${pkg_name}

Push datasets

To publish all core data packages run the following command:

node index.js push [PATH]

Running tests

We use Ava for our tests. For running tests use:

$ [sudo] npm test

To run tests in watch mode:

$ [sudo] npm run watch:test
Datapackage.json

Request Customized Data


Notifications of data updates and schema changes

Warranty / guaranteed updates

Workflow integration (e.g. Python packages, NPM packages)

Customized data (e.g. you need different or additional data)

Or suggest your own feature from the link below