The datahub stores datapackages (datasets). A datapackage consists of:
- the Data: file(s) with the data. Usually it is tabular (csv, xls) but could be any file type.
- the Metadata: a descriptor file, named datapackage.json, that contains all the information you need to find and use the data itself
A detailed description of the datapackage format is here: http://datahub.io/docs/data-packages
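To make the two parts concrete, here is a minimal sketch of what a descriptor might look like, built as a Python dictionary and serialized to JSON. The dataset name, resource name, and file path are hypothetical; the sketch only assumes that a descriptor has a name and a list of resources, each with a name and a path.

```python
import json

# Hypothetical descriptor for a dataset with a single CSV file.
descriptor = {
    "name": "example-dataset",  # hypothetical dataset name
    "resources": [
        {
            "name": "measurements",           # hypothetical resource name
            "path": "data/measurements.csv",  # where the data file lives
            "format": "csv",
        }
    ],
}

# Serialize to the text that would be stored in datapackage.json.
descriptor_text = json.dumps(descriptor, indent=2)
print(descriptor_text)
```

Real descriptors usually carry more metadata (title, license, schema), so treat this as the smallest useful shape rather than a complete one.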
Get files via
/r/ endpoint, if you know the name of the file you need in the dataset, e.g. for
If you don't know the filename, or there are a lot of files in the dataset, use our enumeration logic:
    GET https://datahub.io/<owner>/<dataset_name>/r/0.csv
    GET https://datahub.io/<owner>/<dataset_name>/r/1.csv
    GET https://datahub.io/<owner>/<dataset_name>/r/2.csv
    ...
The datahub.io path logic is described here: getting-data#perma-urls-for-data
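As an illustration of the enumeration logic, the following sketch builds the indexed /r/ URLs for a dataset. The function names, the owner and dataset values, and the resource count are hypothetical; the sketch assumes you already know (or will discover by trial) how many files the dataset contains.

```python
BASE_URL = "https://datahub.io"


def resource_url(owner, dataset_name, index, extension="csv"):
    """Build the /r/ URL for the resource at position `index` (0-based)."""
    return f"{BASE_URL}/{owner}/{dataset_name}/r/{index}.{extension}"


def enumerate_resource_urls(owner, dataset_name, count, extension="csv"):
    """Return the /r/ URLs for the first `count` resources of a dataset."""
    return [resource_url(owner, dataset_name, i, extension) for i in range(count)]


# Hypothetical owner and dataset name, used only for illustration.
urls = enumerate_resource_urls("some-owner", "some-dataset", 3)
for url in urls:
    print(url)
```

Each URL can then be fetched with any HTTP client, for example urllib.request.urlopen from the standard library.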
Get the descriptor
If you need to see the list of the files in the dataset, or other metadata, then get the descriptor (
Now you can parse the descriptor to get each resource (data) path:
    import json

    # json.load expects an open file object, not a filename
    with open('datapackage.json') as f:
        descriptor = json.load(f)
    for resource in descriptor['resources']:
        print(resource['name'], resource['path'])
Then you can easily get all the files.
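A resource path may be either a full URL or a path relative to the descriptor. The sketch below shows one way to turn every path into a downloadable URL. It assumes relative paths are resolved against the location of the descriptor itself; the function name, the example descriptor, and the example URL are hypothetical.

```python
from urllib.parse import urljoin


def resource_urls(descriptor, descriptor_url):
    """Map each resource name to an absolute URL.

    Relative paths are resolved against the descriptor's own URL;
    paths that are already absolute URLs are returned unchanged.
    """
    return {
        resource["name"]: urljoin(descriptor_url, resource["path"])
        for resource in descriptor.get("resources", [])
    }


# Hypothetical descriptor and descriptor URL, used only for illustration.
example_descriptor = {
    "resources": [
        {"name": "first", "path": "data/first.csv"},
        {"name": "second", "path": "https://example.com/second.csv"},
    ]
}
example_url = "https://example.com/some-owner/some-dataset/datapackage.json"

resolved = resource_urls(example_descriptor, example_url)
for name, url in resolved.items():
    print(name, url)
```

With the URLs resolved, each file can be saved locally, for instance with urllib.request.urlretrieve(url, filename).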
The easiest way to post data on the datahub is to use our CLI tool; instructions are here: publishing_data.
Here is the way to POST data from your application:
- Read the Documentation: http://docs.datahub.io/developers/api/
- Use our JS lib, which interacts with the datahub: https://github.com/datahq/datahub-client
- You'll need the following module to work with the DataHub API: https://github.com/datahq/datahub-client/blob/master/lib/utils/datahub.js
- Example of use: https://github.com/datahq/data-cli/blob/master/bin/data-push.js