
Magictelescope

machine-learning

Files: 2
Size: 10MB
Format: csv, zip
Created: 1 year ago
Updated: 1 year ago
License: Open Data Commons Public Domain Dedication and License

Data Files

Download files in this dataset

magictelescope (2MB): download as csv (2MB) or json (4MB)
magictelescope_zip (2MB): compressed versions of the dataset, including normalized CSV and JSON data together with the original data and datapackage.json; download as zip (2MB)


Field information

Field Name  Order  Type (Format)
ID          1      number (default)
fLength:    2      number (default)
fWidth:     3      number (default)
fSize:      4      number (default)
fConc:      5      number (default)
fConc1:     6      number (default)
fAsym:      7      number (default)
fM3Long:    8      number (default)
fM3Trans:   9      number (default)
fAlpha:     10     number (default)
fDist:      11     number (default)
class       12     string (default)
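Several field names in the table carry a trailing colon (e.g. `fLength:`), while the Read me lists them without one. A minimal standard-library sketch for loading the CSV text with cleaned names might look like this; it assumes the file's header row uses the field names listed above and that `class` is the only string field. The helper name `load_magic_rows` is hypothetical, and the sample input in the usage note is made up.

```python
import csv
import io
from typing import Dict, List, Union

Row = Dict[str, Union[float, str]]

def load_magic_rows(text: str) -> List[Row]:
    """Parse the dataset's CSV text into dicts keyed by cleaned field names."""
    reader = csv.reader(io.StringIO(text))
    # Strip whitespace and a trailing colon, e.g. "fLength:" -> "fLength".
    names = [h.strip().rstrip(":") for h in next(reader)]
    rows: List[Row] = []
    for values in reader:
        if not values:
            continue  # csv.reader yields [] for blank lines
        # "class" (g or h) is the only string field; all others are numbers.
        rows.append({
            name: value.strip() if name == "class" else float(value)
            for name, value in zip(names, values)
        })
    return rows
```

For example, `load_magic_rows("ID,fLength:,class\n1,10.5,g\n")` returns `[{'ID': 1.0, 'fLength': 10.5, 'class': 'g'}]`; for the real file, pass the text of the downloaded CSV.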

Integrate this dataset into your favourite tool

Use our data-cli tool designed for data wranglers:

data get https://datahub.io/machine-learning/magictelescope
data info machine-learning/magictelescope
tree machine-learning/magictelescope
# Get a list of the dataset's resources
curl -L -s https://datahub.io/machine-learning/magictelescope/datapackage.json | grep path

# Get resources

curl -L https://datahub.io/machine-learning/magictelescope/r/0.csv

curl -L https://datahub.io/machine-learning/magictelescope/r/1.zip
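Without curl, roughly the same resource listing as `curl ... | grep path` can be sketched with the Python standard library. This assumes `datapackage.json` follows the Data Package layout, with a top-level `resources` array whose entries have `name` and `path` keys; the helper names `fetch_descriptor` and `list_resource_paths` are hypothetical.

```python
import json
import urllib.request
from typing import List, Tuple

DATAPACKAGE_URL = "https://datahub.io/machine-learning/magictelescope/datapackage.json"

def fetch_descriptor(url: str = DATAPACKAGE_URL) -> dict:
    """Download and parse datapackage.json (urlopen follows redirects, like curl -L)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)

def list_resource_paths(descriptor: dict) -> List[Tuple[str, str]]:
    """Return (name, path) pairs for every resource that has a path."""
    return [
        (res.get("name", ""), res["path"])
        for res in descriptor.get("resources", [])
        if "path" in res  # resources with inline data have no path
    ]
```

Calling `list_resource_paths(fetch_descriptor())` then yields one `(name, path)` pair per resource. Unlike `grep`, it only picks up the `path` keys of resources, not every line that happens to contain the word.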

If you are using R, here's how to quickly load the data:

install.packages("jsonlite", repos="https://cran.rstudio.com/")
library("jsonlite")

json_file <- 'https://datahub.io/machine-learning/magictelescope/datapackage.json'
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

# get list of all resources:
print(json_data$resources$name)

# print all tabular data (if any exists)
for (i in seq_along(json_data$resources$datahub$type)) {
  if (json_data$resources$datahub$type[i] == 'derived/csv') {
    path_to_file <- json_data$resources$path[i]
    data <- read.csv(url(path_to_file))
    print(data)
  }
}

Note: If you are running the script on a Linux machine, you might need root permissions, since install.packages may need to write to the system-wide R library.

Install the Frictionless Data `datapackage` library and pandas:

pip install datapackage
pip install pandas

Now you can load the Data Package into pandas:

import datapackage
import pandas as pd

data_url = 'https://datahub.io/machine-learning/magictelescope/datapackage.json'

# load the Data Package descriptor
package = datapackage.Package(data_url)

# load only the CSV-backed tabular resources into pandas DataFrames
for resource in package.resources:
    if resource.tabular and resource.descriptor.get('format') == 'csv':
        data = pd.read_csv(resource.descriptor['path'])
        print(data)

For Python, first install the `datapackage` library (all the datasets on DataHub are Data Packages):

pip install datapackage

To load the Data Package into your Python environment, run the following code:

from datapackage import Package

package = Package('https://datahub.io/machine-learning/magictelescope/datapackage.json')

# print the names of all resources:
print(package.resource_names)

# print the processed tabular data (if any exists)
for resource in package.resources:
    if resource.descriptor.get('datahub', {}).get('type') == 'derived/csv':
        print(resource.read())

If you are using JavaScript, follow the instructions below:

Install the data.js module using npm:

  $ npm install data.js

Once the package is installed, use the following code snippet:

const {Dataset} = require('data.js')

const path = 'https://datahub.io/machine-learning/magictelescope/datapackage.json'

// An immediately invoked async function lets us use async/await syntax:
;(async () => {
  const dataset = await Dataset.load(path)
  // print the names of all resources:
  for (const file of dataset.resources) {
    console.log(file._descriptor.name)
  }
  // print all CSV data (if any exists)
  for (const file of dataset.resources) {
    if (file._descriptor.format === 'csv') {
      // get a raw stream and pipe it to stdout
      const stream = await file.stream()
      stream.pipe(process.stdout)
      // or load the entire file as a buffer (be careful with large files!):
      // const buffer = await file.buffer
    }
  }
})().catch(console.error)

Read me

The resources for this dataset can be found at https://www.openml.org/d/1120

Author: R. K. Bock. Major Atmospheric Gamma Imaging Cherenkov Telescope project (MAGIC)
Donated by P. Savicky, Institute of Computer Science, AS of CR, Czech Republic
Source: UCI
Please cite: Bock, R.K., Chilingarian, A., Gaug, M., Hakl, F., Hengstebeck, T., Jirina, M., Klaschka, J., Kotrc, E., Savicky, P., Towers, S., Vaicilius, A., Wittek, W. (2004). Methods for multidimensional event classification: a case study using images from a Cherenkov gamma-ray telescope. Nucl. Instr. Meth. A, 516, pp. 511-528.

The data are Monte Carlo (MC) generated (see below) to simulate the registration of high-energy gamma particles in a ground-based atmospheric Cherenkov gamma telescope using the imaging technique. A Cherenkov gamma telescope observes high-energy gamma rays by taking advantage of the radiation emitted by charged particles produced inside the electromagnetic showers that the gammas initiate and that develop in the atmosphere. This Cherenkov radiation (of visible to UV wavelengths) leaks through the atmosphere and is recorded in the detector, allowing reconstruction of the shower parameters. The available information consists of pulses left by the incoming Cherenkov photons on the photomultiplier tubes, which are arranged in a plane, the camera. Depending on the energy of the primary gamma, a total of a few hundred to some 10,000 Cherenkov photons are collected, in patterns (called the shower image) that make it possible to statistically discriminate images caused by primary gammas (signal) from images of hadronic showers initiated by cosmic rays in the upper atmosphere (background).

Typically, the image of a shower after some pre-processing is an elongated cluster. Its long axis is oriented towards the camera center if the shower axis is parallel to the telescope’s optical axis, i.e. if the telescope axis is directed towards a point source. A principal component analysis is performed in the camera plane, which results in a correlation axis and defines an ellipse. If the depositions were distributed as a bivariate Gaussian, this would be an equidensity ellipse. The characteristic parameters of this ellipse (often called Hillas parameters) are among the image parameters that can be used for discrimination. The energy depositions are typically asymmetric along the major axis, and this asymmetry can also be used in discrimination. There are, in addition, further discriminating characteristics, like the extent of the cluster in the image plane, or the total sum of depositions.
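The ellipse construction described above can be sketched as an amplitude-weighted principal component analysis of the camera-plane depositions. This is a minimal illustration, not the MAGIC or Corsika code, and it rests on assumptions the Read me does not state: each pixel is given as `(x, y, amplitude)` with coordinates in mm, and the ellipse axes are taken as the square roots of the eigenvalues of the weighted 2×2 covariance matrix (the RMS spread along each axis). The names `Ellipse` and `hillas_ellipse` are hypothetical.

```python
import math
from typing import Iterable, NamedTuple, Tuple

class Ellipse(NamedTuple):
    cx: float      # weighted centroid x [mm]
    cy: float      # weighted centroid y [mm]
    length: float  # RMS spread along the major axis [mm]
    width: float   # RMS spread along the minor axis [mm]
    angle: float   # major-axis orientation [rad], in (-pi/2, pi/2]

def hillas_ellipse(pixels: Iterable[Tuple[float, float, float]]) -> Ellipse:
    """Weighted PCA of (x, y, amplitude) depositions in the camera plane."""
    pts = list(pixels)
    total = sum(w for _, _, w in pts)
    if total <= 0:
        raise ValueError("total amplitude must be positive")
    cx = sum(w * x for x, _, w in pts) / total
    cy = sum(w * y for _, y, w in pts) / total
    # Amplitude-weighted second central moments (the 2x2 covariance matrix).
    sxx = sum(w * (x - cx) ** 2 for x, _, w in pts) / total
    syy = sum(w * (y - cy) ** 2 for _, y, w in pts) / total
    sxy = sum(w * (x - cx) * (y - cy) for x, y, w in pts) / total
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    major, minor = half_trace + radius, half_trace - radius
    # Orientation of the eigenvector belonging to the larger eigenvalue.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return Ellipse(cx, cy, math.sqrt(major), math.sqrt(max(minor, 0.0)), angle)
```

The closed-form 2×2 eigen-decomposition keeps the sketch dependency-free; with many images, a vectorized library routine would be the more practical choice.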

The data set was generated by a Monte Carlo program, Corsika, described in: D. Heck et al., CORSIKA, A Monte Carlo code to simulate extensive air showers, Forschungszentrum Karlsruhe FZKA 6019 (1998). The program was run with parameters that allow events with energies down to below 50 GeV to be observed.

Attribute Information:

  1. fLength: continuous # major axis of ellipse [mm]
  2. fWidth: continuous # minor axis of ellipse [mm]
  3. fSize: continuous # base-10 logarithm of the sum of the content of all pixels [in #phot]
  4. fConc: continuous # ratio of sum of two highest pixels over fSize [ratio]
  5. fConc1: continuous # ratio of highest pixel over fSize [ratio]
  6. fAsym: continuous # distance from highest pixel to center, projected onto major axis [mm]
  7. fM3Long: continuous # 3rd root of third moment along major axis [mm]
  8. fM3Trans: continuous # 3rd root of third moment along minor axis [mm]
  9. fAlpha: continuous # angle of major axis with vector to origin [deg]
  10. fDist: continuous # distance from origin to center of ellipse [mm]
  11. class: g,h # gamma (signal), hadron (background)
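The attribute definitions above can be turned into a small computation for a single shower image. This is a hypothetical reconstruction, not the code that produced the dataset, and it makes choices where the list is ambiguous: the camera centre is taken as the origin; "over fSize" is read as division by the linear total amplitude (10**fSize), since dividing by a logarithm would not give a ratio of pixel contents; fAlpha is folded into [0°, 90°]; and the signs of fAsym and fM3Long depend on which way the major axis is taken to point. The ellipse centre `(cx, cy)` and orientation `angle` are inputs, for example from a weighted PCA of the pixels. The function name `image_features` is hypothetical.

```python
import math
from typing import Dict, List, Tuple

Pixel = Tuple[float, float, float]  # (x [mm], y [mm], amplitude [photons])

def image_features(pixels: List[Pixel], cx: float, cy: float, angle: float) -> Dict[str, float]:
    """Compute the listed image attributes for one shower image (assumptions above)."""
    amps = sorted((w for _, _, w in pixels), reverse=True)
    total = sum(amps)
    cos_t, sin_t = math.cos(angle), math.sin(angle)

    def to_axes(x: float, y: float) -> Tuple[float, float]:
        # Shift to the ellipse centre, then rotate onto the (major, minor) axes.
        dx, dy = x - cx, y - cy
        return dx * cos_t + dy * sin_t, -dx * sin_t + dy * cos_t

    def signed_cbrt(v: float) -> float:
        return math.copysign(abs(v) ** (1.0 / 3.0), v)

    coords = [(to_axes(x, y), w) for x, y, w in pixels]
    m3_long = sum(w * u ** 3 for (u, _), w in coords) / total
    m3_trans = sum(w * v ** 3 for (_, v), w in coords) / total
    bx, by, _ = max(pixels, key=lambda p: p[2])  # brightest pixel
    dist = math.hypot(cx, cy)
    if dist == 0.0:
        alpha = 0.0  # undefined when the centre is at the origin; 0 is an arbitrary choice
    else:
        # Angle between the undirected major axis and the centre-to-origin vector.
        cos_a = abs(-cx * cos_t - cy * sin_t) / dist
        alpha = math.degrees(math.acos(min(1.0, cos_a)))
    return {
        "fSize": math.log10(total),
        "fConc": sum(amps[:2]) / total,
        "fConc1": amps[0] / total,
        "fAsym": to_axes(bx, by)[0],
        "fM3Long": signed_cbrt(m3_long),
        "fM3Trans": signed_cbrt(m3_trans),
        "fAlpha": alpha,
        "fDist": dist,
    }
```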

Class distribution:

  g = gamma (signal): 12332
  h = hadron (background): 6688
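As a quick check of the class balance, the counts above can be turned into proportions (19,020 events in total). The helper name `class_shares` is hypothetical.

```python
from typing import Dict

def class_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of events in each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

shares = class_shares({"g": 12332, "h": 6688})
print({k: round(v, 4) for k, v in shares.items()})  # {'g': 0.6484, 'h': 0.3516}
```

On this sample, a classifier that labels every event as g would already reach about 64.8% accuracy, one reason plain accuracy is a weak yardstick here.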

For technical reasons, the number of h events is underestimated. In the real data, the h class represents the majority of the events.

Simple classification accuracy is not meaningful for these data, since classifying a background event as signal is worse than classifying a signal event as background. To compare different classifiers, an ROC curve has to be used. The relevant points on this curve are those where the probability of accepting a background event as signal is below one of the following thresholds: 0.01, 0.02, 0.05, 0.1, or 0.2, depending on the quality of the accepted-event sample that a given experiment requires.
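The ROC-based comparison described above can be sketched as follows: sort events by a classifier score, sweep the acceptance cut, and for each background-acceptance threshold report the highest signal acceptance whose background acceptance stays at or below it. This is a minimal sketch assuming that higher scores mean "more gamma-like" and that labels are `g` (signal) and `h` (background) as in the `class` field; the function names `roc_points` and `signal_efficiency_at` are hypothetical, and no interpolation between ROC points is attempted.

```python
from typing import Dict, List, Sequence, Tuple

# Background-acceptance limits listed in the Read me.
THRESHOLDS = (0.01, 0.02, 0.05, 0.1, 0.2)

def roc_points(scores: Sequence[float], labels: Sequence[str]) -> List[Tuple[float, float]]:
    """(FPR, TPR) for every cut 'accept as gamma if score >= s', plus the empty cut."""
    n_sig = sum(1 for lab in labels if lab == "g")
    n_bkg = sum(1 for lab in labels if lab == "h")
    if n_sig == 0 or n_bkg == 0:
        raise ValueError("need at least one g and one h event")
    events = sorted(zip(scores, labels), key=lambda e: e[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, lab) in enumerate(events):
        if lab == "g":
            tp += 1
        elif lab == "h":
            fp += 1
        # Emit a point only after every event sharing this score has been accepted.
        if i + 1 == len(events) or events[i + 1][0] != score:
            points.append((fp / n_bkg, tp / n_sig))
    return points

def signal_efficiency_at(
    scores: Sequence[float], labels: Sequence[str], thresholds: Sequence[float] = THRESHOLDS
) -> Dict[float, float]:
    """Highest signal acceptance (TPR) whose background acceptance (FPR) stays <= each limit."""
    points = roc_points(scores, labels)
    return {t: max(tpr for fpr, tpr in points if fpr <= t) for t in thresholds}
```

Passing a classifier's scores and the true `class` labels to `signal_efficiency_at` gives one signal efficiency per threshold, which can be compared across classifiers at matching background levels.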
