If you’re not familiar with datapackage.json, please read this article: https://datahub.io/docs/data-packages.
Below is what our project looks like initially:
README.md sample.csv sample.json
We will use the data init command to create a datapackage.json file for this project.
The data init command runs in non-interactive mode by default. No arguments or options are required; it scans the current working directory and all nested directories for available files:
$ data init
> This process initializes a new datapackage.json file.
> Once there is a datapackage.json file, you can still run ‘data init’ to update/extend it.
> Press ^C at any time to quit.
> Detected special file: README.md
> sample.csv is just added to resources
> sample.json is just added to resources
> Default “ODC-PDDL” license is added. If you would like to add a different license, run ‘data init -i’ or edit ‘datapackage.json’ manually.
> 💾 Descriptor is saved in “datapackage.json”
The project now contains:
README.md datapackage.json sample.csv sample.json
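The directory scan that data init performs can be pictured with a short Python sketch. This is an illustration only, not the data CLI's actual implementation: the function name scan_resources, the set of skipped files, and the choice of the path and format keys are all assumptions made for this example.

```python
import os

# Assumption for this sketch: these files are not listed as data resources.
SPECIAL_FILES = {"README.md", "datapackage.json"}


def scan_resources(root="."):
    """Hypothetical helper: walk root and its nested directories and
    return one resource-like entry per data file found."""
    resources = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a deterministic order
        for filename in sorted(filenames):
            if filename in SPECIAL_FILES:
                continue
            full_path = os.path.join(dirpath, filename)
            # Store paths relative to the project root, with forward slashes.
            rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
            extension = os.path.splitext(filename)[1]
            resources.append({
                "path": rel_path,
                "format": extension.lstrip(".").lower(),
            })
    return resources
```

Calling scan_resources(".") in the sample project would return one entry for sample.csv and one for sample.json, while README.md is skipped.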
If you take a look at datapackage.json, you’ll notice that:
it sets the name property and generates a resources list, with a schema for tabular data
it detects README.md and uses its content in the descriptor
the description property is the first 100 characters of the readme
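To check these properties yourself, you could load the generated descriptor with a few lines of Python. This is a minimal sketch, assuming the descriptor follows the usual Data Package layout, where each resource has a path and tabular resources carry a schema with a fields list. The helper names load_descriptor, summarize_descriptor and short_description are hypothetical and are not part of the data CLI.

```python
import json


def load_descriptor(path="datapackage.json"):
    """Read a descriptor file from disk and return it as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_descriptor(descriptor):
    """Return a few human-readable lines describing the descriptor."""
    lines = [
        f"name: {descriptor.get('name')}",
        f"description: {descriptor.get('description')}",
    ]
    for resource in descriptor.get("resources", []):
        # Only tabular resources are expected to carry a schema.
        fields = resource.get("schema", {}).get("fields", [])
        field_names = ", ".join(field["name"] for field in fields) or "no schema"
        lines.append(f"resource: {resource.get('path')} ({field_names})")
    return lines


def short_description(readme_text, limit=100):
    """Mimic the rule described above: keep the first 100 characters."""
    return readme_text[:limit]
```

Printing each line of summarize_descriptor(load_descriptor()) from the project directory gives a quick overview of what data init generated.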
If you need more control, e.g., you want to add only certain files, scan only certain directories, or add a different license, you can use the init command in interactive mode:
$ data init -i
You can now deploy your dataset to DataHub:
$ data push
Want to learn more? Visit our docs page: https://datahub.io/docs