CKAN Client Guide

Guide to interacting with CKAN for power users such as data scientists, data engineers and data wranglers.

This guide is about adding and managing data in CKAN programmatically and it assumes:

  • You are familiar with key concepts like metadata, data, etc.
  • You are working programmatically with a programming language such as Python, JavaScript or R (coming soon).

Frictionless Formats

Clients use Frictionless formats by default to describe the dataset and resource objects passed to client methods. Internally, a CKAN <=> Frictionless Mapper (available in both JavaScript and Python) converts these objects to CKAN formats before the API is called.

As CKAN itself moves to Frictionless formats by default, this conversion will gradually become unnecessary.
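
To make the conversion step more concrete, here is a toy sketch of the kind of key renaming such a mapper performs on a resource. The function names and the rename table (path to url, bytes to size, mediatype to mimetype) are assumptions made for illustration. This is not the mapper's actual code, and the real mapper handles many more fields.

```python
# Hypothetical, simplified stand-in for the CKAN <=> Frictionless mapper.
# Assumed correspondences: path -> url, bytes -> size, mediatype -> mimetype.
FRICTIONLESS_TO_CKAN_KEYS = {
    "path": "url",
    "bytes": "size",
    "mediatype": "mimetype",
}


def resource_to_ckan(resource: dict) -> dict:
    """Return a copy of a Frictionless resource with CKAN-style key names."""
    return {FRICTIONLESS_TO_CKAN_KEYS.get(key, key): value
            for key, value in resource.items()}


def resource_to_frictionless(resource: dict) -> dict:
    """Inverse of resource_to_ckan for the keys in the rename table."""
    ckan_to_frictionless_keys = {v: k for k, v in FRICTIONLESS_TO_CKAN_KEYS.items()}
    return {ckan_to_frictionless_keys.get(key, key): value
            for key, value in resource.items()}


frictionless_resource = {
    "name": "daily-prices",
    "path": "http://example.com/file.csv",
    "bytes": 2082,
    "mediatype": "text/csv",
}
ckan_resource = resource_to_ckan(frictionless_resource)
```

Keys that are not in the rename table pass through unchanged, so converting to CKAN and back returns the original descriptor.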

Quick start

Most of this guide is written with Python in mind, including its convention of using snake case for instance and method names.

If needed, you can adapt the instructions to JavaScript and R (coming soon) by using camel case instead. For example, client.push_blob(…) in Python becomes client.pushBlob(…) in JavaScript.
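
The naming rule can be expressed as a small helper. This is only an illustration of the convention; snake_to_camel is a hypothetical name and is not part of either client.

```python
def snake_to_camel(name: str) -> str:
    """Convert a snake_case identifier to camelCase."""
    head, *tail = name.split("_")
    # The first word stays lower case; every later word is capitalized.
    return head + "".join(part.capitalize() for part in tail)


js_method_name = snake_to_camel("push_blob")
```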

Prerequisites

Install the client for your language of choice:

Create a client

Python

from ckanclient import Client


api_key = '771a05ad-af90-4a70-beea-cbb050059e14'
api_url = 'http://localhost:5000'
organization = 'datopian'
dataset = 'dailyprices'
lfs_url = 'http://localhost:9419'

client = Client(api_url, organization, dataset, lfs_url)

JavaScript

const { Client } = require('ckanClient')

const apiKey = '771a05ad-af90-4a70-beea-cbb050059e14'
const apiUrl = 'http://localhost:5000'
const organization = 'datopian'
const dataset = 'dailyprices'

const client = new Client(apiKey, organization, dataset, apiUrl)

Upload a resource

This uploads a file and implicitly creates a new dataset.

Python

from frictionless import describe


resource = describe('my-data.csv')
client.push_blob(resource)
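
For orientation, the sketch below builds by hand a descriptor of roughly the kind that gets passed to push_blob, using only the standard library. The helper describe_by_hand and its choice of fields are assumptions for illustration. It is not how frictionless describe works, and the real function infers more (for example, a schema).

```python
import tempfile
from pathlib import Path


def describe_by_hand(file_path: str) -> dict:
    """Build a minimal resource descriptor from a local file (illustrative only)."""
    path = Path(file_path)
    return {
        "profile": "data-resource",
        "name": path.stem.lower(),                  # identifier derived from the file name
        "path": path.name,
        "format": path.suffix.lstrip(".").lower(),  # file extension without the dot
        "bytes": path.stat().st_size,               # size on disk in bytes
    }


# Usage: write a small CSV to a temporary directory and describe it.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "my-data.csv"
    csv_path.write_bytes(b"date,price\n2020-01-01,10\n")
    sample_resource = describe_by_hand(str(csv_path))
```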

Create a new empty Dataset with metadata

Python

client.create('my-data')
client.push(resource)

Adding a resource to an existing Dataset

Not implemented yet.

client.create('my-data')
client.push_resource(resource)

Edit a Dataset's metadata

Not implemented yet.

dataset = client.retrieve('sample-dataset')
client.update_metadata(
    dataset,
    metadata={'maintainer_email': 'sample@datopian.com'}
)

For details of metadata see the metadata reference below.

API - Porcelain

Client.create

Expects a single argument: a string, a dict (in Python), or an object (in JavaScript). The argument is either a valid dataset name or a dictionary of dataset metadata in Frictionless format.

Client.push

Expects a single argument: a dict (in Python) or an object (in JavaScript) with dataset metadata in Frictionless format.

Client.retrieve

Expects a single argument: a string with a dataset name or unique ID. Returns a Frictionless resource as a dict (in Python) or as a Promise.<object> (in JavaScript).

Client.push_blob

Expects a single argument: a dict (in Python) or an object (in JavaScript) with a Frictionless resource.

API - Plumbing

Client.action

This method bridges access to the CKAN API action endpoint.

In Python

Arguments:

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | (required) | The action name, for example, site_read, package_show |
| payload | dict | (required) | The payload being sent to CKAN. When a payload is provided to a GET request, it will be converted to URL parameters and each key will be converted to snake case. |
| http_get | bool | False | Optional, if True will make GET request, otherwise POST. |
| transform_payload | function | None | Function to mutate the payload before making the request (useful to convert to and from CKAN and Frictionless formats). |
| transform_response | function | None | Function to mutate the response data before returning it (useful to convert to and from CKAN and Frictionless formats). |
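
The GET behavior described for payload can be sketched as follows. The helpers camel_to_snake and build_get_url are hypothetical names. The /api/3/action/ route is CKAN's standard Action API path, but whether the client assembles the URL exactly like this is an assumption.

```python
import re
from urllib.parse import urlencode


def camel_to_snake(key: str) -> str:
    """Convert a camelCase key to snake_case."""
    # Insert an underscore before every upper-case letter except a leading one.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", key).lower()


def build_get_url(api_url: str, name: str, payload: dict) -> str:
    """Turn a payload into URL parameters with snake_case keys."""
    params = {camel_to_snake(key): value for key, value in payload.items()}
    return f"{api_url}/api/3/action/{name}?{urlencode(params)}"


get_url = build_get_url(
    "http://localhost:5000",
    "package_show",
    {"id": "dailyprices", "includeTracking": "true"},
)
```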

The CKAN API uses the CKAN dataset and resource formats (rather than Frictionless formats).

To stick to Frictionless formats, pass frictionless_ckan_mapper.frictionless_to_ckan as transform_payload and frictionless_ckan_mapper.ckan_to_frictionless as transform_response.
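
The order in which the two hooks apply can be sketched without a running CKAN instance. Everything here is a stand-in: call_action, fake_send, to_ckan and to_frictionless are hypothetical names, and the transforms only rename path and url. This is not the client's implementation.

```python
def call_action(send, name, payload, transform_payload=None, transform_response=None):
    """Apply the payload hook, send the request, then apply the response hook."""
    if transform_payload is not None:
        payload = transform_payload(payload)
    response = send(name, payload)
    if transform_response is not None:
        response = transform_response(response)
    return response


def to_ckan(resource: dict) -> dict:
    """Toy Frictionless -> CKAN transform: rename path to url."""
    return {("url" if key == "path" else key): value for key, value in resource.items()}


def to_frictionless(resource: dict) -> dict:
    """Toy CKAN -> Frictionless transform: rename url to path."""
    return {("path" if key == "url" else key): value for key, value in resource.items()}


sent_requests = []


def fake_send(name: str, payload: dict) -> dict:
    """Record the request and echo the payload back with an id, like a create call might."""
    sent_requests.append((name, payload))
    return dict(payload, id="abc123")


result = call_action(
    fake_send,
    "resource_create",
    {"name": "prices", "path": "file.csv"},
    transform_payload=to_ckan,
    transform_response=to_frictionless,
)
```

The caller only ever sees Frictionless keys, while the transport only ever sees CKAN keys.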

In JavaScript

Arguments:

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| actionName | string | (required) | The action name, for example, site_read, package_show |
| payload | object | (required) | The payload being sent to CKAN. When a payload is provided to a GET request, it will be converted to URL parameters and each key will be converted to snake case. |
| useHttpGet | boolean | false | Optional, if true will make GET request, otherwise POST. |

The JavaScript implementation uses the CKAN dataset and resource formats (rather than Frictionless formats).

To stick to Frictionless formats, you need to convert from Frictionless to CKAN before calling action, and from CKAN to Frictionless after calling action.

Metadata reference

Your site may have custom metadata that differs from the example set below.

Profile

(string) Defaults to data-resource.

The profile of this descriptor.

Every Package and Resource descriptor has a profile. The default profile, if none is declared, is data-package for Package and data-resource for Resource.

Examples

  • {"profile":"tabular-data-package"}

  • {"profile":"http://example.com/my-profiles-json-schema.json"}
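
The defaulting rule can be sketched as a small helper. with_default_profile and DEFAULT_PROFILES are hypothetical names used only to illustrate the rule stated above.

```python
# Default profiles as stated in the text: data-package for Package,
# data-resource for Resource.
DEFAULT_PROFILES = {"package": "data-package", "resource": "data-resource"}


def with_default_profile(descriptor: dict, kind: str) -> dict:
    """Return a copy of the descriptor with the default profile filled in if absent."""
    # A profile already declared in the descriptor overrides the default.
    return {"profile": DEFAULT_PROFILES[kind], **descriptor}


resource_descriptor = with_default_profile({"name": "my-nice-name"}, "resource")
```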

Name

(string)

An identifier string. Lower case characters with ., _, - and / are allowed.

This is ideally a url-usable and human-readable name. Name SHOULD be invariant, meaning it SHOULD NOT change when its parent descriptor is updated.

Example

  • {"name":"my-nice-name"}
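
A name can be checked with a regular expression before it is sent. The pattern below is an assumption: it allows lower-case letters plus ., _, - and /, and it also allows digits, which the text above does not mention explicitly.

```python
import re

# Assumed pattern: lower-case letters, digits, '.', '_', '-' and '/'.
NAME_PATTERN = re.compile(r"[-a-z0-9._/]+")


def is_valid_name(name: str) -> bool:
    """Return True if the whole string consists of allowed characters."""
    return NAME_PATTERN.fullmatch(name) is not None


name_ok = is_valid_name("my-nice-name")
```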

Path

A reference to the data for this resource, as either a single path string or an array of path strings, each of which is a valid URI.

The dereferenced value of each referenced data source in path MUST be commensurate with a native, dereferenced representation of the data the resource describes. For example, in a Tabular Data Resource, this means that the dereferenced value of path MUST be an array.

Validation

It must satisfy one of these conditions:

Path

(string)

A fully qualified URL, or a POSIX file path.

Implementations need to negotiate the type of path provided, and dereference the data accordingly.

Examples

  • {"path":"file.csv"}

  • {"path":"http://example.com/file.csv"}

(array)

Examples

  • ["file.csv"]

  • ["http://example.com/file.csv"]

Examples

  • {"path":["file.csv","file2.csv"]}

  • {"path":["http://example.com/file.csv","http://example.com/file2.csv"]}

  • {"path":"http://example.com/file.csv"}
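
Because path may be a string or an array, code that consumes a descriptor usually normalizes it first and then decides how to dereference each entry. The sketch below shows one way to do that; normalize_paths and is_url are hypothetical helpers, and treating only http and https as URLs is an assumption.

```python
from urllib.parse import urlparse


def normalize_paths(path) -> list:
    """Return path as a non-empty list of strings, whether given as a string or an array."""
    paths = [path] if isinstance(path, str) else list(path)
    if not paths or not all(isinstance(item, str) for item in paths):
        raise ValueError("path must be a string or a non-empty array of strings")
    return paths


def is_url(path: str) -> bool:
    """Return True for fully qualified http(s) URLs, False for file paths."""
    parsed = urlparse(path)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


single = normalize_paths("file.csv")
several = normalize_paths(["file.csv", "http://example.com/file2.csv"])
kinds = ["url" if is_url(item) else "file" for item in several]
```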

Data

Inline data for this resource.

Schema

(object)

A schema for this resource.

Title

(string)

A human-readable title.

Example

  • {"title":"My Package Title"}

Description

(string)

A text description. Markdown is encouraged.

Example

  • {"description":"# My Package description\nAll about my package."}

Home Page

(string)

The home on the web that is related to this data package.

Example

  • {"homepage":"http://example.com/"}

Sources

(array)

The raw sources for this resource.

Example

  • {"sources":[{"title":"World Bank and OECD","path":"http://data.worldbank.org/indicator/NY.GDP.MKTP.CD"}]}

Licenses

(array)

The license(s) under which the resource is published.

This property is not legally binding and does not guarantee that the package is licensed under the terms defined herein.

Example

  • {"licenses":[{"name":"odc-pddl-1.0","path":"http://opendatacommons.org/licenses/pddl/","title":"Open Data Commons Public Domain Dedication and License v1.0"}]}

Format

(string)

The file format of this resource.

csv, xls, json are examples of common formats.

Example

  • {"format":"xls"}

Media Type

(string)

The media type of this resource. Can be any valid media type listed with IANA.

Example

  • {"mediatype":"text/csv"}

Encoding

(string) Defaults to utf-8.

The file encoding of this resource.

Example

  • {"encoding":"utf-8"}

Bytes

(integer)

The size of this resource in bytes.

Example

  • {"bytes":2082}

Hash

(string)

The MD5 hash of this resource. Indicate other hashing algorithms with the {algorithm}:{hash} format.

Examples

  • {"hash":"d25c9c77f588f5dc32059d2da1136c02"}

  • {"hash":"SHA256:5262f12512590031bbcc9a430452bfd75c2791ad6771320bb4b5728bfb78c4d0"}
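
A hash value in this format can be computed with Python's hashlib. resource_hash is a hypothetical helper, and keeping the algorithm prefix exactly as the caller spelled it is an assumption.

```python
import hashlib


def resource_hash(data: bytes, algorithm: str = "md5") -> str:
    """Return a bare MD5 digest, or '{algorithm}:{hash}' for any other algorithm."""
    digest = hashlib.new(algorithm.lower(), data).hexdigest()
    if algorithm.lower() == "md5":
        return digest
    return f"{algorithm}:{digest}"


md5_value = resource_hash(b"")
sha256_value = resource_hash(b"", "SHA256")
```

For large files, the same digest can be built incrementally by reading the file in chunks and calling update on the hash object.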

Generating templates

You can use jsv to generate a template script in Python, JavaScript, and R.

To install it:

$ npm install -g git+https://github.com/datopian/jsv.git

Python

$ jsv data-resource.json --output py

Output

dataset_metadata = {
    "profile": "data-resource",  # The profile of this descriptor.
    # [example] "profile": "tabular-data-package"
    # [example] "profile": "http://example.com/my-profiles-json-schema.json"
    "name": "my-nice-name",  # An identifier string. Lower case characters with `.`, `_`, `-` and `/` are allowed.
    "path": ["file.csv","file2.csv"],  # A reference to the data for this resource, as either a path as a string, or an array of paths as strings. of valid URIs.
    # [example] "path": ["http://example.com/file.csv","http://example.com/file2.csv"]
    # [example] "path": "http://example.com/file.csv"
    "data": None,  # Inline data for this resource.
    "schema": None,  # A schema for this resource.
    "title": "My Package Title",  # A human-readable title.
    "description": "# My Package description\nAll about my package.",  # A text description. Markdown is encouraged.
    "homepage": "http://example.com/",  # The home on the web that is related to this data package.
    "sources": [{"title":"World Bank and OECD","path":"http://data.worldbank.org/indicator/NY.GDP.MKTP.CD"}],  # The raw sources for this resource.
    "licenses": [{"name":"odc-pddl-1.0","path":"http://opendatacommons.org/licenses/pddl/","title":"Open Data Commons Public Domain Dedication and License v1.0"}],  # The license(s) under which the resource is published.
    "format": "xls",  # The file format of this resource.
    "mediatype": "text/csv",  # The media type of this resource. Can be any valid media type listed with [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml).
    "encoding": "utf-8",  # The file encoding of this resource.
    # [example] "encoding": "utf-8"
    "bytes": 2082,  # The size of this resource in bytes.
    "hash": "d25c9c77f588f5dc32059d2da1136c02",  # The MD5 hash of this resource. Indicate other hashing algorithms with the {algorithm}:{hash} format.
    # [example] "hash": "SHA256:5262f12512590031bbcc9a430452bfd75c2791ad6771320bb4b5728bfb78c4d0"
}
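
The shape of this output can be approximated with a few lines of Python. The sketch below is a toy and is not jsv's implementation. It assumes each property carries an optional default, description and list of example values, that the default (or else the first example) becomes the value, and that the remaining examples become comments.

```python
import json


def render_python_template(properties: dict, var_name: str = "dataset_metadata") -> str:
    """Render schema-like properties as a commented Python dict template."""
    lines = [f"{var_name} = {{"]
    for key, spec in properties.items():
        examples = list(spec.get("examples", []))
        if "default" in spec:
            value = spec["default"]
        elif examples:
            value = examples.pop(0)   # the first example doubles as the value
        else:
            value = None
        # json.dumps is close enough to a Python literal for strings and numbers.
        literal = "None" if value is None else json.dumps(value)
        lines.append(f'    "{key}": {literal},  # {spec.get("description", "")}')
        for example in examples:
            lines.append(f'    # [example] "{key}": {json.dumps(example)}')
    lines.append("}")
    return "\n".join(lines)


template = render_python_template({
    "profile": {
        "default": "data-resource",
        "description": "The profile of this descriptor.",
        "examples": ["tabular-data-package"],
    },
    "data": {"description": "Inline data for this resource."},
})
```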

JavaScript

$ jsv data-resource.json --output js

Output

const datasetMetadata = {
  // The profile of this descriptor.
  profile: "data-resource",
  // [example] profile: "tabular-data-package"
  // [example] profile: "http://example.com/my-profiles-json-schema.json"
  // An identifier string. Lower case characters with `.`, `_`, `-` and `/` are allowed.
  name: "my-nice-name",
  // A reference to the data for this resource, as either a path as a string, or an array of paths as strings. of valid URIs.
  path: ["file.csv", "file2.csv"],
  // [example] path: ["http://example.com/file.csv","http://example.com/file2.csv"]
  // [example] path: "http://example.com/file.csv"
  // Inline data for this resource.
  data: null,
  // A schema for this resource.
  schema: null,
  // A human-readable title.
  title: "My Package Title",
  // A text description. Markdown is encouraged.
  description: "# My Package description\nAll about my package.",
  // The home on the web that is related to this data package.
  homepage: "http://example.com/",
  // The raw sources for this resource.
  sources: [
    {
      title: "World Bank and OECD",
      path: "http://data.worldbank.org/indicator/NY.GDP.MKTP.CD",
    },
  ],
  // The license(s) under which the resource is published.
  licenses: [
    {
      name: "odc-pddl-1.0",
      path: "http://opendatacommons.org/licenses/pddl/",
      title: "Open Data Commons Public Domain Dedication and License v1.0",
    },
  ],
  // The file format of this resource.
  format: "xls",
  // The media type of this resource. Can be any valid media type listed with [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml).
  mediatype: "text/csv",
  // The file encoding of this resource.
  encoding: "utf-8",
  // [example] encoding: "utf-8"
  // The size of this resource in bytes.
  bytes: 2082,
  // The MD5 hash of this resource. Indicate other hashing algorithms with the {algorithm}:{hash} format.
  hash: "d25c9c77f588f5dc32059d2da1136c02",
  // [example] hash: "SHA256:5262f12512590031bbcc9a430452bfd75c2791ad6771320bb4b5728bfb78c4d0"
};

R

$ jsv data-resource.json --output r

Output

# The profile of this descriptor.
profile <- "data-resource"
# [example] profile <- "tabular-data-package"
# [example] profile <- "http://example.com/my-profiles-json-schema.json"
# An identifier string. Lower case characters with `.`, `_`, `-` and `/` are allowed.
name <- "my-nice-name"
# A reference to the data for this resource, as either a path as a string, or an array of paths as strings. of valid URIs.
path <- ["file.csv","file2.csv"]
# [example] path <- ["http://example.com/file.csv","http://example.com/file2.csv"]
# [example] path <- "http://example.com/file.csv"
# Inline data for this resource.
data <- NA
# A schema for this resource.
schema <- NA
# A human-readable title.
title <- "My Package Title"
# A text description. Markdown is encouraged.
description <- "# My Package description\nAll about my package."
# The home on the web that is related to this data package.
homepage <- "http://example.com/"
# The raw sources for this resource.
sources <- [{"title":"World Bank and OECD","path":"http://data.worldbank.org/indicator/NY.GDP.MKTP.CD"}]
# The license(s) under which the resource is published.
licenses <- [{"name":"odc-pddl-1.0","path":"http://opendatacommons.org/licenses/pddl/","title":"Open Data Commons Public Domain Dedication and License v1.0"}]
# The file format of this resource.
format <- "xls"
# The media type of this resource. Can be any valid media type listed with [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml).
mediatype <- "text/csv"
# The file encoding of this resource.
encoding <- "utf-8"
# [example] encoding <- "utf-8"
# The size of this resource in bytes.
bytes <- 2082L
# The MD5 hash of this resource. Indicate other hashing algorithms with the {algorithm}:{hash} format.
hash <- "d25c9c77f588f5dc32059d2da1136c02"
# [example] hash <- "SHA256:5262f12512590031bbcc9a430452bfd75c2791ad6771320bb4b5728bfb78c4d0"

Design Principles

The client should use Frictionless formats by default for describing dataset and resource objects passed to client methods.

In addition, where more than metadata is needed (e.g., we need to access the data stream, or get the schema) we expect the Dataset and Resource objects to follow the Frictionless Data Lib pattern.
