CKAN Client Guide
Guide to interacting with CKAN for power users such as data scientists, data engineers and data wranglers.
This guide is about adding and managing data in CKAN programmatically and it assumes:
- You are familiar with key concepts like metadata, data, etc.
- You are working programmatically with a programming language such as Python, JavaScript or R (coming soon).
Frictionless Formats
Clients use Frictionless formats by default for describing dataset and resource objects passed to client methods. Internally, the client uses a CKAN <=> Frictionless mapper (available in both JavaScript and Python) to convert these objects to CKAN formats before calling the API. This means you can work with Frictionless formats throughout when using the client.
As CKAN moves to Frictionless formats by default, this conversion will gradually become unnecessary.
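To make the idea of a mapper concrete, here is a deliberately simplified sketch of a Frictionless-to-CKAN dataset conversion. The function name frictionless_to_ckan_sketch and its rename table are hypothetical: they assume that CKAN's notes, url and license_id fields correspond to the Frictionless description, homepage and licenses fields. This is not the mapper's actual implementation, which handles many more fields.

```python
# Hypothetical, simplified Frictionless -> CKAN dataset mapping.
# Not the real frictionless_ckan_mapper logic; it only shows that
# the two formats name some fields differently.

# Assumed renames from Frictionless keys to CKAN package keys.
RENAMES = {
    "description": "notes",
    "homepage": "url",
}


def frictionless_to_ckan_sketch(package: dict) -> dict:
    """Build a CKAN-style dataset dict from a Frictionless descriptor."""
    ckan = {}
    for key, value in package.items():
        if key == "licenses":
            # CKAN keeps a single license id: use the first license name, if any.
            if value:
                ckan["license_id"] = value[0].get("name")
        else:
            ckan[RENAMES.get(key, key)] = value
    return ckan


package = {
    "name": "dailyprices",
    "title": "Daily Prices",
    "description": "All about daily prices.",
    "licenses": [{"name": "odc-pddl-1.0"}],
}
print(frictionless_to_ckan_sketch(package))
```

For real conversions, rely on the CKAN <=> Frictionless mapper rather than a hand-written table like this one.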
Quick start
Most of this guide has the Python programming language in mind, including its convention of using snake case for instance and method names.
If needed, you can adapt the instructions to JavaScript and R (coming soon) by using camel case instead. For example, where the Python code has client.push_blob(…), the JavaScript equivalent is client.pushBlob(…).
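The renaming rule is mechanical. As an illustration only, the hypothetical helper below (not part of either client) converts a Python method name to its camel-case JavaScript counterpart.

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case name (Python style) to camelCase (JavaScript style)."""
    first, *rest = name.split("_")
    return first + "".join(part.capitalize() for part in rest)


for method in ["push_blob", "push_resource", "update_metadata", "create"]:
    print(f"client.{method}(...) -> client.{snake_to_camel(method)}(...)")
```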
Prerequisites
Install the client for your language of choice:
- Python: https://github.com/datopian/ckan-client-py#install
- JavaScript: https://github.com/datopian/ckan-client-js#install
- R: coming soon
Create a client
Python
from ckanclient import Client
api_key = '771a05ad-af90-4a70-beea-cbb050059e14'
api_url = 'http://localhost:5000'
organization = 'datopian'
dataset = 'dailyprices'
lfs_url = 'http://localhost:9419'
client = Client(api_url, api_key, organization, dataset, lfs_url)
JavaScript
const { Client } = require('ckanClient')
const apiKey = '771a05ad-af90-4a70-beea-cbb050059e14'
const apiUrl = 'http://localhost:5000'
const organization = 'datopian'
const dataset = 'dailyprices'
const client = new Client(apiKey, organization, dataset, apiUrl)
Upload a resource
That is to say, upload a file, implicitly creating a new dataset.
Python
from frictionless import describe
resource = describe('my-data.csv')
client.push_blob(resource)
Create a new empty Dataset with metadata
Python
client.create('my-data')
client.push(resource)
Adding a resource to an existing Dataset
Not implemented yet.
client.create('my-data')
client.push_resource(resource)
Edit a Dataset's metadata
Not implemented yet.
dataset = client.retrieve('sample-dataset')
client.update_metadata(
dataset,
metadata={'maintainer_email': 'maintainer@example.com'}  # placeholder address
)
For details of metadata see the metadata reference below.
API - Porcelain
Client.create
Expects a single argument: either a string with a valid dataset name, or a dict (in Python) or object (in JavaScript) with the dataset metadata in Frictionless format.
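Because Client.create accepts either form, the argument has to be normalized before anything is sent to CKAN. The sketch below shows one way to do that; normalize_create_arg is a hypothetical helper, and rejecting a metadata dict without a name is an assumption based on CKAN needing a name to create a dataset.

```python
from typing import Union


def normalize_create_arg(dataset: Union[str, dict]) -> dict:
    """Return a Frictionless dataset descriptor for Client.create's argument.

    Hypothetical helper: a bare string is treated as the dataset name.
    """
    if isinstance(dataset, str):
        return {"name": dataset}
    if isinstance(dataset, dict):
        # Assumption: CKAN needs a name to create a dataset.
        if "name" not in dataset:
            raise ValueError("dataset metadata must include a 'name'")
        return dict(dataset)  # shallow copy so the caller's dict is untouched
    raise TypeError(f"expected str or dict, got {type(dataset).__name__}")


print(normalize_create_arg("my-data"))
print(normalize_create_arg({"name": "my-data", "title": "My Data"}))
```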
Client.push
Expects a single argument: a dict (in Python) or an object (in JavaScript) with dataset metadata in Frictionless format.
Client.retrieve
Expects a single argument: a string with a dataset name or unique ID. Returns a Frictionless resource as a dict (in Python) or as a Promise<object> (in JavaScript).
Client.push_blob
Expects a single argument: a dict (in Python) or an object (in JavaScript) with a Frictionless resource.
API - Plumbing
Client.action
This method bridges access to the CKAN API action endpoint.
In Python
Arguments:
Name | Type | Default | Description |
---|---|---|---|
name | str | (required) | The action name, for example, site_read, package_show, … |
payload | dict | (required) | The payload sent to CKAN. When a payload is provided to a GET request, it is converted to URL parameters and each key is converted to snake case. |
http_get | bool | False | Optional. If True, makes a GET request; otherwise, a POST request. |
transform_payload | function | None | Function to transform the payload before making the request (useful for converting between CKAN and Frictionless formats). |
transform_response | function | None | Function to transform the response data before returning it (useful for converting between CKAN and Frictionless formats). |
The CKAN API uses the CKAN dataset and resource formats (rather than Frictionless formats).
To keep working with Frictionless formats, you can pass a conversion function from frictionless_ckan_mapper.frictionless_to_ckan as transform_payload, and one from frictionless_ckan_mapper.ckan_to_frictionless as transform_response.
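The table above describes what Client.action does with its arguments. The following minimal, standard-library-only sketch shows how such a call could be prepared, assuming CKAN's usual action endpoint path /api/3/action/<name>. build_action_request and to_snake_case are hypothetical names, not the client's implementation; the sketch stops before sending anything and leaves out transform_response, which would be applied to the decoded response in the same way transform_payload is applied here.

```python
import re
from urllib.parse import urlencode


def to_snake_case(key: str) -> str:
    """Convert a camelCase key such as 'includeTracking' to 'include_tracking'."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()


def build_action_request(api_url, name, payload, http_get=False,
                         transform_payload=None):
    """Hypothetical sketch: return (method, url, body) for a CKAN action call."""
    if transform_payload is not None:
        # e.g. a Frictionless -> CKAN converter
        payload = transform_payload(payload)
    url = f"{api_url.rstrip('/')}/api/3/action/{name}"
    if http_get:
        # GET: the payload becomes URL parameters with snake_case keys.
        params = {to_snake_case(key): value for key, value in payload.items()}
        return "GET", f"{url}?{urlencode(params)}", None
    # POST: the payload is sent as the request body.
    return "POST", url, payload


print(build_action_request(
    "http://localhost:5000", "package_show",
    {"id": "dailyprices", "includeTracking": True}, http_get=True))
```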
In JavaScript
Arguments:
Name | Type | Default | Description |
---|---|---|---|
actionName | string | (required) | The action name, for example, site_read, package_show, … |
payload | object | (required) | The payload sent to CKAN. When a payload is provided to a GET request, it is converted to URL parameters and each key is converted to snake case. |
useHttpGet | boolean | false | Optional. If true, makes a GET request; otherwise, a POST request. |
The JavaScript implementation uses the CKAN dataset and resource formats (rather than Frictionless formats).
To keep working with Frictionless formats, you need to convert from Frictionless to CKAN before calling action, and from CKAN to Frictionless after calling it.
Metadata reference
Your site may have custom metadata that differs from the example set below.
Profile
(string) Defaults to data-resource.
The profile of this descriptor.
Every Package and Resource descriptor has a profile. The default profile, if none is declared, is data-package for Package and data-resource for Resource.
Examples
- {"profile":"tabular-data-package"}
- {"profile":"http://example.com/my-profiles-json-schema.json"}
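The default rule can be expressed as a small lookup. This is an illustrative sketch with a hypothetical function name, not code from the client.

```python
# Default profiles used when a descriptor does not declare one.
DEFAULT_PROFILES = {"package": "data-package", "resource": "data-resource"}


def effective_profile(descriptor: dict, kind: str) -> str:
    """Return the declared profile, or the default for a 'package' or 'resource'."""
    return descriptor.get("profile", DEFAULT_PROFILES[kind])


print(effective_profile({}, "resource"))
print(effective_profile({"profile": "tabular-data-package"}, "package"))
```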
Name
(string)
An identifier string. Lower case characters with ., _, - and / are allowed.
This is ideally a URL-usable and human-readable name. Name SHOULD be invariant, meaning it SHOULD NOT change when its parent descriptor is updated.
Example
{"name":"my-nice-name"}
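A quick client-side check can catch invalid names before they reach CKAN. The sketch below assumes the pattern used by the Frictionless Data Resource schema, which also allows digits; is_valid_name is a hypothetical helper.

```python
import re

# Assumed pattern (after the Frictionless Data Resource schema):
# lowercase ASCII letters, digits, '.', '_', '-' and '/'.
NAME_PATTERN = re.compile(r"[-a-z0-9._/]+")


def is_valid_name(name: str) -> bool:
    """Return True if the whole string consists of allowed name characters."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["my-nice-name", "prices/2020.v1", "My Nice Name"]:
    print(candidate, is_valid_name(candidate))
```

CKAN applies its own, stricter rules to dataset names (lowercase alphanumeric characters, - and _), so a dataset name that passes this check may still be rejected by the API.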
Path
A reference to the data for this resource, as either a single path string or an array of path strings.
The dereferenced value of each referenced data source in path MUST be commensurate with a native, dereferenced representation of the data the resource describes. For example, in a Tabular Data Resource, this means that the dereferenced value of path MUST be an array.
Validation
It must satisfy one of these conditions:
Path (string)
A fully qualified URL, or a POSIX file path.
Implementations need to negotiate the type of path provided, and dereference the data accordingly.
Examples
- {"path":"file.csv"}
- {"path":"http://example.com/file.csv"}
(array)
Examples
- ["file.csv"]
- ["http://example.com/file.csv"]
Examples
- {"path":["file.csv","file2.csv"]}
- {"path":["http://example.com/file.csv","http://example.com/file2.csv"]}
- {"path":"http://example.com/file.csv"}
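Since path may be a string or an array, code that consumes a resource usually normalizes it first and then decides how to dereference each entry. In this sketch, normalize_path and is_remote are hypothetical helpers, and treating only http and https URLs as remote is a simplifying assumption.

```python
from typing import List, Union
from urllib.parse import urlparse


def normalize_path(path: Union[str, List[str]]) -> List[str]:
    """Return the resource path(s) as a list, whether given as a string or a list."""
    if isinstance(path, str):
        return [path]
    if isinstance(path, list) and all(isinstance(p, str) for p in path):
        return list(path)
    raise TypeError("path must be a string or a list of strings")


def is_remote(path: str) -> bool:
    """Treat http(s) URLs as remote; anything else as a POSIX file path."""
    return urlparse(path).scheme in ("http", "https")


for entry in normalize_path(["file.csv", "http://example.com/file2.csv"]):
    print(entry, "remote" if is_remote(entry) else "local")
```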
Data
Inline data for this resource.
Schema
(object)
A schema for this resource.
Title
(string)
A human-readable title.
Example
{"title":"My Package Title"}
Description
(string)
A text description. Markdown is encouraged.
Example
{"description":"# My Package description\nAll about my package."}
Home Page
(string)
The home page on the web related to this data package.
Example
{"homepage":"http://example.com/"}
Sources
(array)
The raw sources for this resource.
Example
{"sources":[{"title":"World Bank and OECD","path":"http://data.worldbank.org/indicator/NY.GDP.MKTP.CD"}]}
Licenses
(array)
The license(s) under which the resource is published.
This property is not legally binding and does not guarantee that the package is licensed under the terms defined herein.
Example
{"licenses":[{"name":"odc-pddl-1.0","path":"http://opendatacommons.org/licenses/pddl/","title":"Open Data Commons Public Domain Dedication and License v1.0"}]}
Format
(string)
The file format of this resource.
csv, xls and json are examples of common formats.
Example
{"format":"xls"}
Media Type
(string)
The media type of this resource. Can be any valid media type listed with IANA.
Example
{"mediatype":"text/csv"}
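The format and mediatype fields can often be derived from a file's extension. The sketch below uses Python's standard mimetypes module; guess_format_and_mediatype is a hypothetical helper, and using the lowercased extension as the format is an assumption that works for common cases such as csv or json.

```python
import mimetypes
import os

# A fresh MimeTypes() is seeded from Python's built-in extension table
# rather than the module-level database.
_MIME = mimetypes.MimeTypes()


def guess_format_and_mediatype(path: str):
    """Guess the 'format' and 'mediatype' fields from a file name's extension."""
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    mediatype, _encoding = _MIME.guess_type(path)
    return extension or None, mediatype


print(guess_format_and_mediatype("data.csv"))
```

Deriving both fields from the same file name keeps them consistent, avoiding a pairing such as a format of xls with a media type of text/csv.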
Encoding
(string) Defaults to utf-8.
The file encoding of this resource.
Example
{"encoding":"utf-8"}
Bytes
(integer)
The size of this resource in bytes.
Example
{"bytes":2082}
Hash
(string)
The MD5 hash of this resource. Indicate other hashing algorithms with the {algorithm}:{hash} format.
Examples
- {"hash":"d25c9c77f588f5dc32059d2da1136c02"}
- {"hash":"SHA256:5262f12512590031bbcc9a430452bfd75c2791ad6771320bb4b5728bfb78c4d0"}
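The bytes and hash fields can be computed with Python's standard hashlib module. In this sketch, size_and_hash and parse_hash are hypothetical helpers. Writing MD5 digests bare and prefixing other algorithms in upper case (as in the SHA256 example above) follows the {algorithm}:{hash} convention described here, and reading the whole file into memory is a simplification.

```python
import hashlib


def size_and_hash(data: bytes, algorithm: str = "md5") -> dict:
    """Compute the 'bytes' and 'hash' fields for a resource's contents.

    MD5 digests are written bare; other algorithms use {ALGORITHM}:{hash}.
    """
    digest = hashlib.new(algorithm, data).hexdigest()
    value = digest if algorithm == "md5" else f"{algorithm.upper()}:{digest}"
    return {"bytes": len(data), "hash": value}


def parse_hash(value: str):
    """Split a hash field into (algorithm, digest); bare digests are MD5."""
    algorithm, sep, digest = value.partition(":")
    return (algorithm.lower(), digest) if sep else ("md5", value)


contents = b"a,b\n1,2\n"  # stand-in for a small CSV file's bytes
print(size_and_hash(contents))
print(size_and_hash(contents, "sha256"))
print(parse_hash("SHA256:5262f12512590031bbcc9a430452bfd75c2791ad6771320bb4b5728bfb78c4d0"))
```

For large files, feed the file to the hash object in chunks with update() instead of reading it all at once.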
Generating templates
You can use jsv to generate a template script in Python, JavaScript, and R.
To install it:
$ npm install -g git+https://github.com/datopian/jsv.git
Python
$ jsv data-resource.json --output py
Output
dataset_metadata = {
"profile": "data-resource", # The profile of this descriptor.
# [example] "profile": "tabular-data-package"
# [example] "profile": "http://example.com/my-profiles-json-schema.json"
"name": "my-nice-name", # An identifier string. Lower case characters with `.`, `_`, `-` and `/` are allowed.
"path": ["file.csv","file2.csv"], # A reference to the data for this resource, as either a path as a string, or an array of paths as strings. of valid URIs.
# [example] "path": ["http://example.com/file.csv","http://example.com/file2.csv"]
# [example] "path": "http://example.com/file.csv"
"data": None, # Inline data for this resource.
"schema": None, # A schema for this resource.
"title": "My Package Title", # A human-readable title.
"description": "# My Package description\nAll about my package.", # A text description. Markdown is encouraged.
"homepage": "http://example.com/", # The home on the web that is related to this data package.
"sources": [{"title":"World Bank and OECD","path":"http://data.worldbank.org/indicator/NY.GDP.MKTP.CD"}], # The raw sources for this resource.
"licenses": [{"name":"odc-pddl-1.0","path":"http://opendatacommons.org/licenses/pddl/","title":"Open Data Commons Public Domain Dedication and License v1.0"}], # The license(s) under which the resource is published.
"format": "xls", # The file format of this resource.
"mediatype": "text/csv", # The media type of this resource. Can be any valid media type listed with [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml).
"encoding": "utf-8", # The file encoding of this resource.
# [example] "encoding": "utf-8"
"bytes": 2082, # The size of this resource in bytes.
"hash": "d25c9c77f588f5dc32059d2da1136c02", # The MD5 hash of this resource. Indicate other hashing algorithms with the {algorithm}:{hash} format.
# [example] "hash": "SHA256:5262f12512590031bbcc9a430452bfd75c2791ad6771320bb4b5728bfb78c4d0"
}
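The generated template leaves fields such as data and schema set to None. If you prefer to omit unused fields rather than send nulls (an assumption about what your site expects), a minimal sketch with a hypothetical fill_template helper could look like this:

```python
def fill_template(template: dict, **values) -> dict:
    """Return a copy of a jsv-style template with values applied and
    any placeholders that are still None removed."""
    filled = {**template, **values}
    return {key: value for key, value in filled.items() if value is not None}


# A trimmed-down template in the shape of the generated output.
template = {
    "profile": "data-resource",
    "name": "my-nice-name",
    "path": ["file.csv", "file2.csv"],
    "data": None,
    "schema": None,
}
print(fill_template(template, name="dailyprices", path="prices.csv"))
```

The original template dict is left unchanged, so it can be reused for several resources.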
JavaScript
$ jsv data-resource.json --output js
Output
const datasetMetadata = {
// The profile of this descriptor.
profile: "data-resource",
// [example] profile: "tabular-data-package"
// [example] profile: "http://example.com/my-profiles-json-schema.json"
// An identifier string. Lower case characters with `.`, `_`, `-` and `/` are allowed.
name: "my-nice-name",
// A reference to the data for this resource, as either a path as a string, or an array of paths as strings. of valid URIs.
path: ["file.csv", "file2.csv"],
// [example] path: ["http://example.com/file.csv","http://example.com/file2.csv"]
// [example] path: "http://example.com/file.csv"
// Inline data for this resource.
data: null,
// A schema for this resource.
schema: null,
// A human-readable title.
title: "My Package Title",
// A text description. Markdown is encouraged.
description: "# My Package description\nAll about my package.",
// The home on the web that is related to this data package.
homepage: "http://example.com/",
// The raw sources for this resource.
sources: [
{
title: "World Bank and OECD",
path: "http://data.worldbank.org/indicator/NY.GDP.MKTP.CD",
},
],
// The license(s) under which the resource is published.
licenses: [
{
name: "odc-pddl-1.0",
path: "http://opendatacommons.org/licenses/pddl/",
title: "Open Data Commons Public Domain Dedication and License v1.0",
},
],
// The file format of this resource.
format: "xls",
// The media type of this resource. Can be any valid media type listed with [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml).
mediatype: "text/csv",
// The file encoding of this resource.
encoding: "utf-8",
// [example] encoding: "utf-8"
// The size of this resource in bytes.
bytes: 2082,
// The MD5 hash of this resource. Indicate other hashing algorithms with the {algorithm}:{hash} format.
hash: "d25c9c77f588f5dc32059d2da1136c02",
// [example] hash: "SHA256:5262f12512590031bbcc9a430452bfd75c2791ad6771320bb4b5728bfb78c4d0"
};
R
$ jsv data-resource.json --output r
Output
# The profile of this descriptor.
profile <- "data-resource"
# [example] profile <- "tabular-data-package"
# [example] profile <- "http://example.com/my-profiles-json-schema.json"
# An identifier string. Lower case characters with `.`, `_`, `-` and `/` are allowed.
name <- "my-nice-name"
# A reference to the data for this resource, as either a path as a string, or an array of paths as strings. of valid URIs.
path <- ["file.csv","file2.csv"]
# [example] path <- ["http://example.com/file.csv","http://example.com/file2.csv"]
# [example] path <- "http://example.com/file.csv"
# Inline data for this resource.
data <- NA
# A schema for this resource.
schema <- NA
# A human-readable title.
title <- "My Package Title"
# A text description. Markdown is encouraged.
description <- "# My Package description\nAll about my package."
# The home on the web that is related to this data package.
homepage <- "http://example.com/"
# The raw sources for this resource.
sources <- [{"title":"World Bank and OECD","path":"http://data.worldbank.org/indicator/NY.GDP.MKTP.CD"}]
# The license(s) under which the resource is published.
licenses <- [{"name":"odc-pddl-1.0","path":"http://opendatacommons.org/licenses/pddl/","title":"Open Data Commons Public Domain Dedication and License v1.0"}]
# The file format of this resource.
format <- "xls"
# The media type of this resource. Can be any valid media type listed with [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml).
mediatype <- "text/csv"
# The file encoding of this resource.
encoding <- "utf-8"
# [example] encoding <- "utf-8"
# The size of this resource in bytes.
bytes <- 2082L
# The MD5 hash of this resource. Indicate other hashing algorithms with the {algorithm}:{hash} format.
hash <- "d25c9c77f588f5dc32059d2da1136c02"
# [example] hash <- "SHA256:5262f12512590031bbcc9a430452bfd75c2791ad6771320bb4b5728bfb78c4d0"
Design Principles
The client should use Frictionless formats by default for describing dataset and resource objects passed to client methods.
In addition, where more than metadata is needed (e.g., to access the data stream or get the schema), we expect the Dataset and Resource objects to follow the Frictionless Data Lib pattern.