| Files | Size | Format | Created | Updated | License | Source |
|---|---|---|---|---|---|---|
| 3 | 117kB | arff, csv, zip | 1 year ago | 1 year ago | Open Data Commons Public Domain Dedication and License | |

Data Files

Download files in this dataset

| File | Description | Size | Last changed | Download |
|---|---|---|---|---|
| cmc_arff | | 33kB | | arff (33kB) |
| cmc | | 32kB | | csv (32kB), json (400kB) |
| cmc_zip | Compressed versions of dataset. Includes normalized CSV and JSON data with original data and datapackage.json. | 36kB | | zip (36kB) |


Field information

| Field Name | Order | Type (Format) | Description |
|---|---|---|---|
| Wifes_age | 1 | number (default) | |
| Wifes_education | 2 | number (default) | |
| Husbands_education | 3 | number (default) | |
| Number_of_children_ever_born | 4 | number (default) | |
| Wifes_religion | 5 | number (default) | |
| Wifes_now_working%3F | 6 | number (default) | |
| Husbands_occupation | 7 | number (default) | |
| Standard-of-living_index | 8 | number (default) | |
| Media_exposure | 9 | number (default) | |
| Contraceptive_method_used | 10 | number (default) | |
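
The field list above can double as a quick sanity check when reading the CSV. The following is a minimal sketch using only the Python standard library; it assumes the CSV header row matches the field names exactly as listed (not verified here), and `read_cmc` and the two sample rows are hypothetical, made up for illustration rather than taken from the dataset.

```python
import csv
import io

# Column names in the order given in the field table above.
FIELDS = [
    "Wifes_age",
    "Wifes_education",
    "Husbands_education",
    "Number_of_children_ever_born",
    "Wifes_religion",
    "Wifes_now_working%3F",
    "Husbands_occupation",
    "Standard-of-living_index",
    "Media_exposure",
    "Contraceptive_method_used",
]


def read_cmc(text):
    """Parse CSV text, check the header against FIELDS, return rows of floats."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if header != FIELDS:
        raise ValueError(f"unexpected header: {header}")
    # Every field is declared as type "number", so each cell is parsed as a float.
    return [[float(cell) for cell in row] for row in reader if row]


# Hypothetical two-row sample (illustrative values, NOT real records).
sample = ",".join(FIELDS) + "\n30,2,3,1,1,1,2,3,0,1\n41,4,4,2,0,0,1,4,0,2\n"
rows = read_cmc(sample)
print(len(rows), "rows parsed;", len(rows[0]), "columns each")
```

To use it on the downloaded file, pass the file contents as text, for example `read_cmc(open("cmc.csv").read())`, where the local file name is an assumption.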

Integrate this dataset into your favourite tool

Use our data-cli tool designed for data wranglers:

data get https://datahub.io/machine-learning/cmc
data info machine-learning/cmc
tree machine-learning/cmc
# Get a list of dataset's resources
curl -L -s https://datahub.io/machine-learning/cmc/datapackage.json | grep path

# Get resources

curl -L https://datahub.io/machine-learning/cmc/r/0.arff

curl -L https://datahub.io/machine-learning/cmc/r/1.csv

curl -L https://datahub.io/machine-learning/cmc/r/2.zip
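
For readers who prefer to script the download rather than call curl, here is a rough Python equivalent using only the standard library. The `/r/<index>.<ext>` URL pattern is copied from the curl commands above; the helper names `resource_url` and `download` are hypothetical, and the sketch assumes the index-to-format mapping stays as shown (0 = arff, 1 = csv, 2 = zip).

```python
import urllib.request

BASE = "https://datahub.io/machine-learning/cmc/r"

# Resource index -> file extension, as used in the curl commands above.
RESOURCES = {0: "arff", 1: "csv", 2: "zip"}


def resource_url(index):
    """Build the download URL for one resource index."""
    return f"{BASE}/{index}.{RESOURCES[index]}"


def download(index, dest):
    """Fetch one resource to a local path.

    urlopen follows HTTP redirects by default, similar to `curl -L`.
    """
    with urllib.request.urlopen(resource_url(index)) as response:
        with open(dest, "wb") as out:
            out.write(response.read())


if __name__ == "__main__":
    # Only prints the URLs; call download(1, "cmc.csv") to actually fetch a file.
    for index in RESOURCES:
        print(resource_url(index))
```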

If you are using R here's how to get the data you want quickly loaded:

install.packages("jsonlite", repos="https://cran.rstudio.com/")
library("jsonlite")

json_file <- 'https://datahub.io/machine-learning/cmc/datapackage.json'
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

# get list of all resources:
print(json_data$resources$name)

# print all tabular data (if any exists)
for(i in seq_along(json_data$resources$datahub$type)){
  if(json_data$resources$datahub$type[i]=='derived/csv'){
    path_to_file = json_data$resources$path[i]
    data <- read.csv(url(path_to_file))
    print(data)
  }
}

Note: You might need to run the script with root permissions if you are running it on a Linux machine.

Install the Frictionless Data `datapackage` library and pandas:

pip install datapackage
pip install pandas

Now you can load the Data Package with pandas:

import datapackage
import pandas as pd

data_url = 'https://datahub.io/machine-learning/cmc/datapackage.json'

# load the Data Package descriptor
package = datapackage.Package(data_url)

# load only the tabular resources
resources = package.resources
for resource in resources:
    if resource.tabular:
        data = pd.read_csv(resource.descriptor['path'])
        print(data)

For Python, first install the `datapackage` library (all the datasets on DataHub are Data Packages):

pip install datapackage

To load the Data Package into your Python environment, run the following code:

from datapackage import Package

package = Package('https://datahub.io/machine-learning/cmc/datapackage.json')

# print list of all resources:
print(package.resource_names)

# print processed tabular data (if any exists)
for resource in package.resources:
    if resource.descriptor['datahub']['type'] == 'derived/csv':
        print(resource.read())
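
To make the filter in the loop above concrete, the sketch below applies the same `datahub.type == 'derived/csv'` test to a plain descriptor dictionary, without the `datapackage` library. The descriptor shown is hypothetical and heavily trimmed: the resource names, paths, and the non-CSV type value are placeholders, and the real datapackage.json contains more keys. The function `derived_csv_resources` is likewise an illustrative name.

```python
import json


def derived_csv_resources(descriptor):
    """Return (name, path) pairs for resources whose datahub type is 'derived/csv'."""
    found = []
    for resource in descriptor.get("resources", []):
        # .get() avoids a KeyError when a resource has no 'datahub' section.
        if resource.get("datahub", {}).get("type") == "derived/csv":
            found.append((resource.get("name"), resource.get("path")))
    return found


# Hypothetical descriptor for illustration only (placeholder names and paths).
example = json.loads("""
{"resources": [
  {"name": "cmc_arff", "path": "data/cmc_arff.arff", "datahub": {"type": "source/non-tabular"}},
  {"name": "cmc_csv", "path": "data/cmc_csv.csv", "datahub": {"type": "derived/csv"}},
  {"name": "no_datahub_key", "path": "data/other.csv"}
]}
""")

print(derived_csv_resources(example))
```

Using `.get()` with defaults is a design choice here: a resource without a `datahub` section is skipped instead of raising an exception.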

If you are using JavaScript, follow the instructions below:

Install the data.js module using npm:

  $ npm install data.js

Once the package is installed, use the following code snippet:

const {Dataset} = require('data.js')

const path = 'https://datahub.io/machine-learning/cmc/datapackage.json'

// We're using a self-invoking function here because we want to use async/await syntax:
;(async () => {
  const dataset = await Dataset.load(path)
  // get list of all resources:
  for (const id in dataset.resources) {
    console.log(dataset.resources[id]._descriptor.name)
  }
  // get all tabular data (if any exists)
  for (const id in dataset.resources) {
    if (dataset.resources[id]._descriptor.format === "csv") {
      const file = dataset.resources[id]
      // Get a raw stream
      const stream = await file.stream()
      // entire file as a buffer (be careful with large files!)
      const buffer = await file.buffer
      // print data
      stream.pipe(process.stdout)
    }
  }
})()

Read me

The resources for this dataset can be found at https://www.openml.org/d/23

Author: Tjen-Sien Lim
Source: As obtained from UCI
Please cite: UCI citation

  1. Title: Contraceptive Method Choice

  2. Sources: (a) Origin: This dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey (b) Creator: Tjen-Sien Lim ([email protected]) (c) Donor: Tjen-Sien Lim ([email protected]) (d) Date: June 7, 1997

  3. Past Usage: Lim, T.-S., Loh, W.-Y. & Shih, Y.-S. (1999). A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms. Machine Learning. Forthcoming. (ftp://ftp.stat.wisc.edu/pub/loh/treeprogs/quest1.7/mach1317.pdf or http://www.stat.wisc.edu/~limt/mach1317.pdf)

  4. Relevant Information: This dataset is a subset of the 1987 National Indonesia Contraceptive Prevalence Survey. The samples are married women who were either not pregnant or did not know if they were at the time of the interview. The problem is to predict the current contraceptive method choice (no use, long-term methods, or short-term methods) of a woman based on her demographic and socio-economic characteristics.

  5. Number of Instances: 1473

  6. Number of Attributes: 10 (including the class attribute)

  7. Attribute Information:

    1. Wife’s age (numerical)
    2. Wife’s education (categorical) 1=low, 2, 3, 4=high
    3. Husband’s education (categorical) 1=low, 2, 3, 4=high
    4. Number of children ever born (numerical)
    5. Wife’s religion (binary) 0=Non-Islam, 1=Islam
    6. Wife’s now working? (binary) 0=Yes, 1=No
    7. Husband’s occupation (categorical) 1, 2, 3, 4
    8. Standard-of-living index (categorical) 1=low, 2, 3, 4=high
    9. Media exposure (binary) 0=Good, 1=Not good
    10. Contraceptive method used (class attribute) 1=No-use 2=Long-term 3=Short-term
  8. Missing Attribute Values: None
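
Because every column is stored as a number, the coded attributes are easier to read once mapped back to the labels in the list above. The following is a minimal sketch, not part of the original dataset tooling: `LABELS` and `decode` are hypothetical names, the dictionary keys assume the CSV uses the field names from the field table, and the sample row is made up for illustration. Only codes that the attribute list labels explicitly are translated; unlabeled codes (for example education levels 2 and 3) are left as integers.

```python
# Code -> label mappings, taken from the attribute list above.
LABELS = {
    "Wifes_education": {1: "low", 4: "high"},
    "Husbands_education": {1: "low", 4: "high"},
    "Wifes_religion": {0: "Non-Islam", 1: "Islam"},
    "Wifes_now_working%3F": {0: "Yes", 1: "No"},
    "Standard-of-living_index": {1: "low", 4: "high"},
    "Media_exposure": {0: "Good", 1: "Not good"},
    "Contraceptive_method_used": {1: "No-use", 2: "Long-term", 3: "Short-term"},
}


def decode(row):
    """Replace labeled integer codes with their text labels; keep other values as ints."""
    decoded = {}
    for field, value in row.items():
        code = int(value)
        decoded[field] = LABELS.get(field, {}).get(code, code)
    return decoded


# Hypothetical record (illustrative values, NOT a real row from the dataset).
sample_row = {
    "Wifes_age": 30,
    "Wifes_education": 2,
    "Husbands_education": 4,
    "Number_of_children_ever_born": 1,
    "Wifes_religion": 1,
    "Wifes_now_working%3F": 1,
    "Husbands_occupation": 2,
    "Standard-of-living_index": 3,
    "Media_exposure": 0,
    "Contraceptive_method_used": 3,
}

for field, value in decode(sample_row).items():
    print(f"{field}: {value}")
```

Note that the binary codes are easy to misread: for "Wife's now working?" 0 means Yes, and for media exposure 0 means Good.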

Information about the dataset

CLASSTYPE: nominal
CLASSINDEX: last

