Files: 3
Size: 317kB
Formats: arff, csv, zip
Created: 3 years ago
Updated: 3 years ago
License: Open Data Commons Public Domain Dedication and License

Data Files

Download files in this dataset

- abalone_arff: 188kB; downloads: arff (188kB)
- abalone: 192kB; downloads: csv (192kB), json (783kB)
- abalone_zip: 250kB; downloads: zip (250kB). Compressed versions of the dataset, including normalized CSV and JSON data alongside the original data and datapackage.json.


Field information

Field Name              Order  Type (Format)
Sex                     1      string (default)
Length                  2      number (default)
Diameter                3      number (default)
Height                  4      number (default)
Whole_weight            5      number (default)
Shucked_weight          6      number (default)
Viscera_weight          7      number (default)
Shell_weight            8      number (default)
Class_number_of_rings   9      number (default)
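If you plan to load the CSV with pandas, the field table above maps naturally onto an explicit dtype schema. A minimal sketch, assuming pandas is installed; the single data row is purely illustrative:

```python
import io

import pandas as pd

# Column names and pandas dtypes mirroring the field table above.
COLUMNS = {
    "Sex": "string",
    "Length": "float64",
    "Diameter": "float64",
    "Height": "float64",
    "Whole_weight": "float64",
    "Shucked_weight": "float64",
    "Viscera_weight": "float64",
    "Shell_weight": "float64",
    "Class_number_of_rings": "int64",
}

# One illustrative row in the same shape as the dataset's CSV.
sample_csv = io.StringIO(
    ",".join(COLUMNS) + "\n"
    "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15\n"
)
df = pd.read_csv(sample_csv, dtype=COLUMNS)
print(df.dtypes)
```

Passing an explicit dtype mapping avoids pandas having to infer types, and catches malformed values at load time.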

Integrate this dataset into your favourite tool

Use our data-cli tool designed for data wranglers:

data get https://datahub.io/machine-learning/abalone
data info machine-learning/abalone
tree machine-learning/abalone
# Get a list of the dataset's resources
curl -L -s https://datahub.io/machine-learning/abalone/datapackage.json | grep path

# Get the resources
curl -L https://datahub.io/machine-learning/abalone/r/0.arff
curl -L https://datahub.io/machine-learning/abalone/r/1.csv
curl -L https://datahub.io/machine-learning/abalone/r/2.zip

If you are using R, here's how to quickly load the data you want:

install.packages("jsonlite", repos="https://cran.rstudio.com/")
library("jsonlite")

json_file <- 'https://datahub.io/machine-learning/abalone/datapackage.json'
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

# get list of all resources:
print(json_data$resources$name)

# print all tabular data (if any exists)
for (i in seq_along(json_data$resources$datahub$type)) {
  if (json_data$resources$datahub$type[i] == 'derived/csv') {
    path_to_file <- json_data$resources$path[i]
    data <- read.csv(url(path_to_file))
    print(data)
  }
}

Note: on Linux, you might need to run the script with elevated permissions if the default R library path is not writable by your user.

Install the Frictionless Data data package library and pandas itself:

pip install datapackage
pip install pandas

Now you can use the Data Package with pandas:

import datapackage
import pandas as pd

data_url = 'https://datahub.io/machine-learning/abalone/datapackage.json'

# load the Data Package
package = datapackage.Package(data_url)

# load only the tabular data
resources = package.resources
for resource in resources:
    if resource.tabular:
        data = pd.read_csv(resource.descriptor['path'])
        print(data)

For Python, first install the `datapackage` library (all the datasets on DataHub are Data Packages):

pip install datapackage

To get the Data Package into your Python environment, run the following code:

from datapackage import Package

package = Package('https://datahub.io/machine-learning/abalone/datapackage.json')

# print list of all resources:
print(package.resource_names)

# print processed tabular data (if any exists)
for resource in package.resources:
    if resource.descriptor['datahub']['type'] == 'derived/csv':
        print(resource.read())

If you are using JavaScript, follow the instructions below:

Install data.js module using npm:

  $ npm install data.js

Once the package is installed, use the following code snippet:

const {Dataset} = require('data.js')

const path = 'https://datahub.io/machine-learning/abalone/datapackage.json'

// We're using a self-invoking function here as we want to use async-await syntax:
;(async () => {
  const dataset = await Dataset.load(path)
  // get list of all resources:
  for (const id in dataset.resources) {
    console.log(dataset.resources[id]._descriptor.name)
  }
  // get all tabular data (if any exists)
  for (const id in dataset.resources) {
    if (dataset.resources[id]._descriptor.format === "csv") {
      const file = dataset.resources[id]
      // Get a raw stream
      const stream = await file.stream()
      // Alternatively, load the entire file as a buffer (careful with large files!):
      // const buffer = await file.buffer
      // print the data
      stream.pipe(process.stdout)
    }
  }
})()

Read me

The resources for this dataset can be found at https://www.openml.org/d/183

Author:
Source: Unknown -
Please cite:

  1. Title of Database: Abalone data

  2. Sources:

    (a) Original owners of database: Marine Resources Division Marine Research Laboratories - Taroona Department of Primary Industry and Fisheries, Tasmania GPO Box 619F, Hobart, Tasmania 7001, Australia (contact: Warwick Nash +61 02 277277, [email protected])

    (b) Donor of database: Sam Waugh ([email protected]) Department of Computer Science, University of Tasmania GPO Box 252C, Hobart, Tasmania 7001, Australia

    (c) Date received: December 1995

  3. Past Usage:

    Sam Waugh (1995) "Extending and benchmarking Cascade-Correlation", PhD thesis, Computer Science Department, University of Tasmania.

    - Test set performance (final 1044 examples, first 3133 used for training; problem encoded as a classification task):
        24.86%  Cascade-Correlation (no hidden nodes)
        26.25%  Cascade-Correlation (5 hidden nodes)
        21.5%   C4.5
        0.0%    Linear Discriminate Analysis
        3.57%   k=5 Nearest Neighbour

    - Data set samples are highly overlapped. Further information is required to separate them completely using affine combinations. Other restrictions to the data set were examined.

    David Clark, Zoltan Schreter, Anthony Adams "A Quantitative Comparison of Dystal and Backpropagation", submitted to the Australian Conference on Neural Networks (ACNN'96). Data set treated as a 3-category classification problem (grouping ring classes 1-8, 9 and 10, and 11 on).

    - Test set performance (3133 training, 1044 testing as above):
        64%  Backprop
        55%  Dystal

    - Previous work (Waugh, 1995) on the same data set:
        61.40%  Cascade-Correlation (no hidden nodes)
        65.61%  Cascade-Correlation (5 hidden nodes)
        59.2%   C4.5
        32.57%  Linear Discriminate Analysis
        62.46%  k=5 Nearest Neighbour

  4. Relevant Information Paragraph:

    Predicting the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope – a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age. Further information, such as weather patterns and location (hence food availability) may be required to solve the problem.

    From the original data, examples with missing values were removed (the majority having the predicted value missing), and the ranges of the continuous values have been scaled for use with an ANN (by dividing by 200).

    Data comes from an original (non-machine-learning) study:

    Warwick J Nash, Tracy L Sellers, Simon R Talbot, Andrew J Cawthorn and Wes B Ford (1994) “The Population Biology of Abalone (Haliotis species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait”, Sea Fisheries Division, Technical Report No. 48 (ISSN 1034-3288)
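    Since the continuous values were divided by 200, the original measurements can be recovered by multiplying back. A minimal sketch (the example value is hypothetical):

```python
# The distributed continuous values were divided by 200; multiplying by 200
# recovers the original units (mm for the length fields, grams for weights).
SCALE = 200

def to_original_units(scaled_value: float) -> float:
    """Undo the /200 scaling applied to the distributed data."""
    return scaled_value * SCALE

# e.g. a scaled Length of 0.455 corresponds to roughly 91 mm
print(to_original_units(0.455))
```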

  5. Number of Instances: 4177

  6. Number of Attributes: 8

  7. Attribute information:

    Given is the attribute name, attribute type, the measurement unit and a brief description. The number of rings is the value to predict: either as a continuous value or as a classification problem.

    Name            Data Type   Meas.   Description
    Sex             nominal             M, F, and I (infant)
    Length          continuous  mm      Longest shell measurement
    Diameter        continuous  mm      perpendicular to length
    Height          continuous  mm      with meat in shell
    Whole weight    continuous  grams   whole abalone
    Shucked weight  continuous  grams   weight of meat
    Viscera weight  continuous  grams   gut weight (after bleeding)
    Shell weight    continuous  grams   after being dried
    Rings           integer             +1.5 gives the age in years

    Statistics for numeric domains:

            Length  Diam   Height  Whole  Shucked  Viscera  Shell  Rings
    Min     0.075   0.055  0.000   0.002  0.001    0.001    0.002  1
    Max     0.815   0.650  1.130   2.826  1.488    0.760    1.005  29
    Mean    0.524   0.408  0.140   0.829  0.359    0.181    0.239  9.934
    SD      0.120   0.099  0.042   0.490  0.222    0.110    0.139  3.224
    Correl  0.557   0.575  0.557   0.540  0.421    0.504    0.628  1.0
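    The attribute table above notes that Rings + 1.5 gives the age in years; that rule translates directly into a one-line helper (a trivial sketch):

```python
def age_from_rings(rings: int) -> float:
    """Age in years, per the readme: ring count plus 1.5."""
    return rings + 1.5

print(age_from_rings(15))  # 16.5
```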

  8. Missing Attribute Values: None

  9. Class Distribution:

    Class  Examples
      1      1
      2      1
      3     15
      4     57
      5    115
      6    259
      7    391
      8    568
      9    689
     10    634
     11    487
     12    267
     13    203
     14    126
     15    103
     16     67
     17     58
     18     42
     19     32
     20     26
     21     14
     22      6
     23      9
     24      2
     25      1
     26      1
     27      2
     29      1

    Total  4177
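    As a quick consistency check, the class counts above do sum to the stated total of 4177 instances, with class 9 the most frequent. A sketch:

```python
# Class distribution from the readme: ring class -> number of examples.
class_counts = {
    1: 1, 2: 1, 3: 15, 4: 57, 5: 115, 6: 259, 7: 391, 8: 568,
    9: 689, 10: 634, 11: 487, 12: 267, 13: 203, 14: 126, 15: 103,
    16: 67, 17: 58, 18: 42, 19: 32, 20: 26, 21: 14, 22: 6, 23: 9,
    24: 2, 25: 1, 26: 1, 27: 2, 29: 1,
}
print(sum(class_counts.values()))             # 4177
print(max(class_counts, key=class_counts.get))  # 9 (the modal class)
```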

Num Instances: 4177
Num Attributes: 9
Num Continuous: 8 (Int 1 / Real 7)
Num Discrete: 1
Missing values: 0 / 0.0%

  name               type  enum  ints  real  missing   distinct    (1)
1 'Sex'              Enum  100%    0%    0%   0 / 0%     3 /  0%    0%
2 'Length'           Real    0%    0%  100%   0 / 0%   134 /  3%    0%
3 'Diameter'         Real    0%    0%  100%   0 / 0%   111 /  3%    0%
4 'Height'           Real    0%    0%  100%   0 / 0%    51 /  1%    0%
5 'Whole weight'     Real    0%    0%  100%   0 / 0%  2429 / 58%   31%
6 'Shucked weight'   Real    0%    0%  100%   0 / 0%  1515 / 36%   10%
7 'Viscera weight'   Real    0%    0%  100%   0 / 0%   880 / 21%    3%
8 'Shell weight'     Real    0%    0%  100%   0 / 0%   926 / 22%    8%
9 'Class_Rings'      Int     0%  100%    0%   0 / 0%    28 /  1%    0%
