
Satimage

machine-learning

Files: 3
Size: 5MB
Format: arff, csv, zip
Created: 5 years ago
Updated: 5 years ago
License: Open Data Commons Public Domain Dedication and License
The resources for this dataset can be found at https://www.openml.org/d/182. Author: Ashwin Srinivasan, Department of Statistics and Data Modeling, University of Strathclyde. Source: UCI - 1993. Please cite: UCI.

Data Files

Download files in this dataset

File: satimage_arff. Size: 2MB. Download: arff (2MB).
File: satimage. Size: 2MB. Download: csv (2MB), json (5MB).
File: satimage_zip. Description: Compressed versions of dataset. Includes normalized CSV and JSON data with original data and datapackage.json. Size: 2MB. Download: zip (2MB).


Field information

Field Name Order Type (Format) Description
Aattr 1 number (default)
Battr 2 number (default)
Cattr 3 number (default)
Dattr 4 number (default)
Eattr 5 number (default)
Fattr 6 number (default)
A1attr 7 number (default)
B2attr 8 number (default)
C3attr 9 number (default)
D4attr 10 number (default)
E5attr 11 number (default)
F6attr 12 number (default)
A7attr 13 number (default)
B8attr 14 number (default)
C9attr 15 number (default)
D10attr 16 number (default)
E11attr 17 number (default)
F12attr 18 number (default)
A13attr 19 number (default)
B14attr 20 number (default)
C15attr 21 number (default)
D16attr 22 number (default)
E17attr 23 number (default)
F18attr 24 number (default)
A19attr 25 number (default)
B20attr 26 number (default)
C21attr 27 number (default)
D22attr 28 number (default)
E23attr 29 number (default)
F24attr 30 number (default)
A25attr 31 number (default)
B26attr 32 number (default)
C27attr 33 number (default)
D28attr 34 number (default)
E29attr 35 number (default)
F30attr 36 number (default)
class 37 number (default)
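The field names in the table above follow a regular pattern: the letters A to F repeat every six columns, and from column 7 onward the name also carries the column order minus six. A minimal sketch that regenerates the 37 names from that observed pattern (the helper `field_name` is hypothetical and is not part of any DataHub tooling):

```python
def field_name(order):
    """Return the field name for a 1-based column order (1..37).

    The rule is inferred from the field table only: letters A-F cycle,
    and columns 7..36 append (order - 6) before the 'attr' suffix.
    """
    if order == 37:
        return "class"
    if not 1 <= order <= 36:
        raise ValueError("order must be between 1 and 37")
    letter = "ABCDEF"[(order - 1) % 6]
    suffix = "" if order <= 6 else str(order - 6)
    return f"{letter}{suffix}attr"


# All 37 column names in table order.
FIELD_NAMES = [field_name(i) for i in range(1, 38)]

# Columns 17..20 in the field table.
print([field_name(i) for i in range(17, 21)])
```

This can be handy for selecting columns by position without typing the irregular names by hand.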

Integrate this dataset into your favourite tool

Use our data-cli tool designed for data wranglers:

data get https://datahub.io/machine-learning/satimage
data info machine-learning/satimage
tree machine-learning/satimage
# Get a list of dataset's resources
curl -L -s https://datahub.io/machine-learning/satimage/datapackage.json | grep path

# Get resources

curl -L https://datahub.io/machine-learning/satimage/r/0.arff

curl -L https://datahub.io/machine-learning/satimage/r/1.csv

curl -L https://datahub.io/machine-learning/satimage/r/2.zip
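The `grep path` step above can also be done by parsing the descriptor as JSON. The following is a minimal sketch using only the Python standard library. It assumes the descriptor follows the usual Data Package layout with a top-level `resources` list; the sample descriptor and its `path` values are illustrative placeholders, not the real contents of this dataset's datapackage.json.

```python
import json

# Illustrative, heavily abbreviated descriptor. The resource names come from
# the file list above; the paths are placeholders, not the real URLs.
SAMPLE_DESCRIPTOR = """
{
  "name": "satimage",
  "resources": [
    {"name": "satimage_arff", "format": "arff", "path": "data/satimage.arff"},
    {"name": "satimage", "format": "csv", "path": "data/satimage.csv"},
    {"name": "satimage_zip", "format": "zip", "path": "data/satimage.zip"}
  ]
}
"""


def list_resources(descriptor_text):
    """Return (name, format, path) tuples for every resource in a descriptor."""
    descriptor = json.loads(descriptor_text)
    return [
        (res.get("name"), res.get("format"), res.get("path"))
        for res in descriptor.get("resources", [])
    ]


for name, fmt, path in list_resources(SAMPLE_DESCRIPTOR):
    print(f"{name}\t{fmt}\t{path}")
```

To use it on the real descriptor, pass in the text fetched by the `curl` command instead of `SAMPLE_DESCRIPTOR`.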

If you are using R, here is how to load the data quickly:

install.packages("jsonlite", repos="https://cran.rstudio.com/")
library("jsonlite")

json_file <- 'https://datahub.io/machine-learning/satimage/datapackage.json'
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

# get list of all resources:
print(json_data$resources$name)

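# print all tabular data (if any exists)
# seq_along() avoids iterating over c(1, 0) when there are no resources
for (i in seq_along(json_data$resources$datahub$type)) {
  # isTRUE() skips resources whose type is missing (NA)
  if (isTRUE(json_data$resources$datahub$type[i] == 'derived/csv')) {
    path_to_file <- json_data$resources$path[i]
    data <- read.csv(url(path_to_file))
    print(data)
  }
}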

Note: You might need to run the script with root permissions if you are running on a Linux machine.

If you are using pandas, install the Frictionless Data `datapackage` library and pandas itself:

pip install datapackage
pip install pandas

Now you can load the Data Package into pandas:

import datapackage
import pandas as pd

data_url = 'https://datahub.io/machine-learning/satimage/datapackage.json'

# to load Data Package into storage
package = datapackage.Package(data_url)

# to load only tabular data
resources = package.resources
for resource in resources:
    if resource.tabular:
        data = pd.read_csv(resource.descriptor['path'])
        print(data)

For Python, first install the `datapackage` library (all the datasets on DataHub are Data Packages):

pip install datapackage

To load the Data Package into your Python environment, run the following code:

from datapackage import Package

package = Package('https://datahub.io/machine-learning/satimage/datapackage.json')

# print list of all resources:
print(package.resource_names)

# print processed tabular data (if any exists)
for resource in package.resources:
    # not every resource carries DataHub metadata, so look it up defensively
    if resource.descriptor.get('datahub', {}).get('type') == 'derived/csv':
        print(resource.read())
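Once the CSV is available locally, a quick sanity check is to count how many rows carry each class label. The sketch below uses only the standard library and assumes the label column is named `class`, as in the field table. The sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def class_counts(csv_text):
    """Count rows per class label in CSV text that has a 'class' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Labels may be written as "3" or "3.0", so go through float first.
    return Counter(int(float(row["class"])) for row in reader)


# Made-up sample with an abbreviated header (the real file has 36 attributes).
SAMPLE_CSV = "Aattr,Battr,class\n0.1,-0.2,1\n0.3,0.4,3\n-0.5,0.6,1\n"

for label, count in sorted(class_counts(SAMPLE_CSV).items()):
    print(label, count)
```

For the real file, read its contents with `open(...).read()` and pass the text to `class_counts`.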

If you are using JavaScript, follow the instructions below.

Install the data.js module using npm:

  $ npm install data.js

Once the package is installed, use the following code snippet:

const {Dataset} = require('data.js')

const path = 'https://datahub.io/machine-learning/satimage/datapackage.json'

// We're using self-invoking function here as we want to use async-await syntax:
;(async () => {
  const dataset = await Dataset.load(path)
  // get list of all resources:
  for (const id in dataset.resources) {
    console.log(dataset.resources[id]._descriptor.name)
  }
  // get all tabular data (if any exists)
  for (const id in dataset.resources) {
    if (dataset.resources[id]._descriptor.format === "csv") {
      const file = dataset.resources[id]
      // Get a raw stream
      const stream = await file.stream()
      // entire file as a buffer (be careful with large files!)
      const buffer = await file.buffer
      // print data
      stream.pipe(process.stdout)
    }
  }
})()

Read me

The resources for this dataset can be found at https://www.openml.org/d/182

Author: Ashwin Srinivasan, Department of Statistics and Data Modeling, University of Strathclyde
Source: UCI - 1993
Please cite: UCI

The database consists of the multi-spectral values of pixels in 3x3 neighbourhoods in a satellite image, and the classification associated with the central pixel in each neighbourhood. The aim is to predict this classification, given the multi-spectral values. In the sample database, the class of a pixel is coded as a number.

One frame of Landsat MSS imagery consists of four digital images of the same scene in different spectral bands. Two of these are in the visible region (corresponding approximately to green and red regions of the visible spectrum) and two are in the (near) infra-red. Each pixel is an 8-bit binary word, with 0 corresponding to black and 255 to white. The spatial resolution of a pixel is about 80m x 80m. Each image contains 2340 x 3380 such pixels.
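As a rough worked example of the figures above, the following sketch computes the raw size and approximate ground coverage of one frame. It assumes exactly one byte per pixel per band and treats the "about 80m" resolution as exact, so the results are estimates only.

```python
BANDS = 4                  # spectral bands per frame
FRAME_PIXELS = 2340 * 3380  # pixels per single-band image
BYTES_PER_PIXEL = 1        # one 8-bit word per pixel (assumed, no padding)
PIXEL_SIDE_M = 80          # approximate side length of a pixel in metres

# Raw size of all four band images together.
frame_bytes = BANDS * FRAME_PIXELS * BYTES_PER_PIXEL

# Approximate ground area covered by one frame, in square kilometres.
frame_area_km2 = FRAME_PIXELS * PIXEL_SIDE_M ** 2 / 1_000_000

print(f"pixels per band image: {FRAME_PIXELS}")
print(f"raw bytes per frame:   {frame_bytes}")
print(f"approx. area (km^2):   {frame_area_km2:.2f}")
```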

The database is a (tiny) sub-area of a scene, consisting of 82 x 100 pixels. Each line of data corresponds to a 3x3 square neighbourhood of pixels completely contained within the 82x100 sub-area. Each line contains the pixel values in the four spectral bands (converted to ASCII) of each of the 9 pixels in the 3x3 neighbourhood and a number indicating the classification label of the central pixel.
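The description above implies two simple counts: how many 3x3 neighbourhoods fit completely inside an 82 x 100 area, and how many values each line holds. A minimal sketch follows. Note that the neighbourhood count is an upper bound on the number of lines, since the README states further down that some lines were removed.

```python
def count_neighbourhoods(height, width, size=3):
    """Number of size x size windows fully contained in a height x width grid."""
    if height < size or width < size:
        return 0
    return (height - size + 1) * (width - size + 1)


BANDS = 4
PIXELS_PER_NEIGHBOURHOOD = 3 * 3

# Upper bound on lines of data before any were removed.
max_lines = count_neighbourhoods(82, 100)

# Spectral values for all 9 pixels plus one class label.
values_per_line = BANDS * PIXELS_PER_NEIGHBOURHOOD + 1

print(max_lines)        # 80 * 98 = 7840
print(values_per_line)  # 37
```

The 37 values per line match the 37 fields listed for the CSV resource.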

Each pixel is categorized as one of the following classes:
1 red soil
2 cotton crop
3 grey soil
4 damp grey soil
5 soil with vegetation stubble
6 mixture class (all types present)
7 very damp grey soil

NB. There are no examples with class 6 in this dataset.
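For readable output it can help to map the numeric class codes back to the names listed above. A minimal sketch, where `label_for` is a hypothetical helper and the tolerance for codes written as "3.0" is an assumption about how a loader might return them:

```python
# Class codes and names exactly as listed in the README.
CLASS_LABELS = {
    1: "red soil",
    2: "cotton crop",
    3: "grey soil",
    4: "damp grey soil",
    5: "soil with vegetation stubble",
    6: "mixture class (all types present)",  # no examples in this dataset
    7: "very damp grey soil",
}


def label_for(code):
    """Return the class name for a numeric code given as int, float, or string."""
    return CLASS_LABELS[int(float(code))]


print(label_for(7))
print(label_for("3.0"))
```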

The data is given in random order and certain lines of data have been removed so you cannot reconstruct the original image from this dataset.

Attribute information

There are 36 predictive attributes (= 4 spectral bands x 9 pixels in the neighbourhood). In each line of data, the four spectral values for the top-left pixel are given first, followed by the four spectral values for the top-middle pixel and then those for the top-right pixel, and so on, with the pixels read out in sequence left-to-right and top-to-bottom. Thus, the four spectral values for the central pixel are given by attributes 17, 18, 19 and 20. If you like, you can use only these four attributes and ignore the others. This avoids the problem that arises when a 3x3 neighbourhood straddles a boundary.
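The ordering described above can be written as a small index calculation. The sketch below maps a pixel position in the 3x3 neighbourhood (0-based row and column, a convention chosen here for illustration) to its four 1-based attribute numbers.

```python
BANDS_PER_PIXEL = 4
GRID_SIZE = 3


def attribute_numbers(row, col):
    """Return the 1-based attribute numbers for the pixel at (row, col).

    Pixels are read left-to-right, top-to-bottom, and each pixel
    contributes four consecutive spectral values.
    """
    if not (0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE):
        raise ValueError("row and col must be in 0..2")
    pixel_index = row * GRID_SIZE + col
    first = pixel_index * BANDS_PER_PIXEL + 1
    return list(range(first, first + BANDS_PER_PIXEL))


print(attribute_numbers(0, 0))  # top-left pixel: [1, 2, 3, 4]
print(attribute_numbers(1, 1))  # central pixel:  [17, 18, 19, 20]
```

Subtract one from each number to get 0-based column positions when indexing a loaded table.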

In this version, the pixel values 0…255 are normalized around 0.

Note: it is unclear why the attributes are named Aattr - Fattr in this version. Since there are only 4 bands and 9 pixels, naming them A1, B1, C1, D1, A2, B2, C2, D2, … would have made more sense.
