Seismic bumps

machine-learning

Files Size Format Created Updated License Source
2 2MB csv zip 4 months ago UCI - Machine Learning Repository

Data Files

File | Description | Size | Last changed | Download
seismic-bumps | | 127kB | | csv (127kB), json (726kB)
seismic-bumps_zip | Compressed versions of dataset. Includes normalized CSV and JSON data with original data and datapackage.json. | 138kB | | zip (138kB)

seismic-bumps  

Field information

Field Name | Order | Type (Format) | Description
seismic | 1 | string (default) | result of shift seismic hazard assessment in the mine working obtained by the seismic method (a - lack of hazard, b - low hazard, c - high hazard, d - danger state)
seismoacoustic | 2 | string (default) | result of shift seismic hazard assessment in the mine working obtained by the seismoacoustic method
shift | 3 | string (default) | information about type of a shift (W - coal-getting, N - preparation shift)
genergy | 4 | integer (default) | seismic energy recorded within previous shift by the most active geophone (GMax) out of geophones monitoring the longwall
gpuls | 5 | integer (default) | a number of pulses recorded within previous shift by GMax
gdenergy | 6 | integer (default) | a deviation of energy recorded within previous shift by GMax from average energy recorded during eight previous shifts
gdpuls | 7 | integer (default) | a deviation of a number of pulses recorded within previous shift by GMax from average number of pulses recorded during eight previous shifts
ghazard | 8 | string (default) | result of shift seismic hazard assessment in the mine working obtained by the seismoacoustic method based on registration coming from GMax only
nbumps | 9 | integer (default) | the number of seismic bumps recorded within previous shift
nbumps2 | 10 | integer (default) | the number of seismic bumps (in energy range [10^2,10^3)) registered within previous shift
nbumps3 | 11 | integer (default) | the number of seismic bumps (in energy range [10^3,10^4)) registered within previous shift
nbumps4 | 12 | integer (default) | the number of seismic bumps (in energy range [10^4,10^5)) registered within previous shift
nbumps5 | 13 | integer (default) | the number of seismic bumps (in energy range [10^5,10^6)) registered within previous shift
nbumps6 | 14 | integer (default) | the number of seismic bumps (in energy range [10^6,10^7)) registered within previous shift
nbumps7 | 15 | integer (default) | the number of seismic bumps (in energy range [10^7,10^8)) registered within previous shift
nbumps89 | 16 | integer (default) | the number of seismic bumps (in energy range [10^8,10^10)) registered within previous shift
energy | 17 | integer (default) | total energy of seismic bumps registered within previous shift
maxenergy | 18 | integer (default) | the maximum energy of the seismic bumps registered within previous shift
class | 19 | integer (default) | the decision attribute - "1" means that high energy seismic bump occurred in the next shift ("hazardous state"), "0" means that no high energy seismic bumps occurred in the next shift ("non-hazardous state")
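
The gdenergy and gdpuls fields are described only as a "deviation ... from average" over eight previous shifts. The sketch below shows one possible reading, a percentage deviation from the mean of the last eight values. The percentage form, the function name and the sample numbers are all assumptions made for illustration; the source does not state the exact formula.

```python
from statistics import mean


def deviation_from_recent_average(history, current, window=8):
    """Percent deviation of `current` from the mean of the last `window` values.

    ASSUMPTION: the percentage form is a guess; the dataset description only
    says "deviation ... from average ... during eight previous shifts".
    """
    recent = history[-window:]
    if len(recent) < window:
        raise ValueError(f"need at least {window} earlier shifts")
    baseline = mean(recent)
    if baseline == 0:
        raise ValueError("average of earlier shifts is zero")
    return 100.0 * (current - baseline) / baseline


# Hypothetical GMax energies for eight earlier shifts (not taken from the dataset).
earlier = [1000, 1200, 800, 1000, 1100, 900, 1000, 1000]
print(deviation_from_recent_average(earlier, 1500))  # 50.0
```

With these made-up values the eight-shift average is 1000, so a reading of 1500 is 50% above it.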

Read me

This dataset describes occurrences of seismic bumps. It contains a CSV file with only a header and data rows, with no additional information about the dataset.

Data

The dataset was gathered from the seismic-bumps Data Set in the UCI Machine Learning Repository.

The data describe the problem of forecasting high energy (higher than 10^4 J) seismic bumps in a coal mine. The data come from two longwalls located in a Polish coal mine.

Mining activity has always been connected with dangers commonly called mining hazards. A special case of such a threat is the seismic hazard, which frequently occurs in many underground mines. Seismic hazard is the hardest of the natural hazards to detect and predict, and in this respect it is comparable to an earthquake. Increasingly advanced seismic and seismoacoustic monitoring systems allow a better understanding of rock mass processes and the definition of seismic hazard prediction methods. The accuracy of the methods created so far is, however, far from perfect. It is therefore essential to search for new ways to improve hazard prediction, including machine learning methods. The unbalanced distribution of positive (“hazardous state”) and negative (“non-hazardous state”) examples is a serious problem in seismic hazard prediction, and currently used methods are still insufficient to achieve good sensitivity and specificity.

The task of seismic prediction can be defined in different ways, but the main aim of all seismic hazard assessment methods is to predict (with a given precision in time and date) increased seismic activity that can cause a rockburst. In the data set each row contains a summary of seismic activity in the rock mass within one shift (8 hours). If the decision attribute has the value 1, then at least one seismic bump with an energy higher than 10^4 J was registered in the next shift. This hazard prediction task is based on the relationship between the energy of recorded tremors and seismoacoustic activity on the one hand and the possibility of rockburst occurrence on the other. Hence, such a hazard prognosis is not the same as an accurate rockburst prediction. Moreover, given information about the possibility of a hazardous situation, the appropriate supervision service can reduce the risk of a rockburst (e.g. by destressing shooting) or withdraw workers from the threatened area. Good prediction of increased seismic activity is therefore a matter of great practical importance.

The presented data set is characterized by an unbalanced distribution of positive and negative examples: there are only 170 positive examples representing class 1.
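
The labeling rule described above (class 1 when the next shift registers a bump with energy above 10^4 J) can be sketched as follows. This is a minimal illustration, not the authors' preprocessing: the function name and the per-shift energies are hypothetical.

```python
HIGH_ENERGY_THRESHOLD_J = 10**4  # "higher than 10^4 J" per the description


def label_shifts(max_energy_per_shift):
    """Return the class for every shift that has a following shift.

    A shift gets class 1 if the NEXT shift registered a bump whose energy
    exceeds the threshold, otherwise class 0. The final shift has no
    successor and therefore receives no label.
    """
    return [
        1 if next_energy > HIGH_ENERGY_THRESHOLD_J else 0
        for next_energy in max_energy_per_shift[1:]
    ]


# Hypothetical maximum bump energies (J) for five consecutive shifts.
shift_max_energy = [0, 3_000, 50_000, 0, 10_000]
print(label_shifts(shift_max_energy))  # [0, 1, 0, 0]
```

Note that the comparison is strict: a bump of exactly 10^4 J does not count as "higher than 10^4 J" in this sketch.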

Instances: 2584
Attributes: 18 + class
Missing Attribute Values: None

Class distribution:

  • “hazardous state” (class 1): 170 (6.6%)
  • “non-hazardous state” (class 0): 2414 (93.4%)
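
The percentages above follow directly from the counts, and the same counts show why plain accuracy is a poor measure on this data. The sketch below recomputes the class shares and then evaluates a trivial baseline that always predicts class 0; the baseline and the helper function are illustrative assumptions, not a method from the source.

```python
positives, negatives = 170, 2414
total = positives + negatives  # 2584 instances

share_pos = 100 * positives / total
share_neg = 100 * negatives / total
print(f"class 1: {share_pos:.1f}%, class 0: {share_neg:.1f}%")
# class 1: 6.6%, class 0: 93.4%


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical baseline: always predict "non-hazardous" (class 0).
sens, spec = sensitivity_specificity(tp=0, fn=positives, tn=negatives, fp=0)
accuracy = negatives / total
print(f"accuracy={accuracy:.3f} sensitivity={sens:.1f} specificity={spec:.1f}")
# accuracy=0.934 sensitivity=0.0 specificity=1.0
```

The baseline is right 93.4% of the time yet never flags a hazardous shift, which is why sensitivity and specificity are reported rather than accuracy alone.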

Field descriptions:

  1. seismic: result of shift seismic hazard assessment in the mine working obtained by the seismic method (a - lack of hazard, b - low hazard, c - high hazard, d - danger state);
  2. seismoacoustic: result of shift seismic hazard assessment in the mine working obtained by the seismoacoustic method;
  3. shift: information about type of a shift (W - coal-getting, N - preparation shift);
  4. genergy: seismic energy recorded within previous shift by the most active geophone (GMax) out of geophones monitoring the longwall;
  5. gpuls: a number of pulses recorded within previous shift by GMax;
  6. gdenergy: a deviation of energy recorded within previous shift by GMax from average energy recorded during eight previous shifts;
  7. gdpuls: a deviation of a number of pulses recorded within previous shift by GMax from average number of pulses recorded during eight previous shifts;
  8. ghazard: result of shift seismic hazard assessment in the mine working obtained by the seismoacoustic method based on registration coming from GMax only;
  9. nbumps: the number of seismic bumps recorded within previous shift;
  10. nbumps2: the number of seismic bumps (in energy range [10^2,10^3)) registered within previous shift;
  11. nbumps3: the number of seismic bumps (in energy range [10^3,10^4)) registered within previous shift;
  12. nbumps4: the number of seismic bumps (in energy range [10^4,10^5)) registered within previous shift;
  13. nbumps5: the number of seismic bumps (in energy range [10^5,10^6)) registered within previous shift;
  14. nbumps6: the number of seismic bumps (in energy range [10^6,10^7)) registered within previous shift;
  15. nbumps7: the number of seismic bumps (in energy range [10^7,10^8)) registered within previous shift;
  16. nbumps89: the number of seismic bumps (in energy range [10^8,10^10)) registered within previous shift;
  17. energy: total energy of seismic bumps registered within previous shift;
  18. maxenergy: the maximum energy of the seismic bumps registered within previous shift;
  19. class: the decision attribute - “1” means that high energy seismic bump occurred in the next shift (“hazardous state”), “0” means that no high energy seismic bumps occurred in the next shift (“non-hazardous state”)
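
The nbumps2 to nbumps89 fields are counts over half-open energy ranges. As a rough sketch, the mapping from a single bump energy to its field can be written as a lookup table. The helper names and the sample energies are hypothetical; only the ranges come from the field descriptions.

```python
# Half-open ranges [low, high) in joules, taken from the field descriptions.
ENERGY_BINS = {
    "nbumps2": (10**2, 10**3),
    "nbumps3": (10**3, 10**4),
    "nbumps4": (10**4, 10**5),
    "nbumps5": (10**5, 10**6),
    "nbumps6": (10**6, 10**7),
    "nbumps7": (10**7, 10**8),
    "nbumps89": (10**8, 10**10),
}


def bin_for_energy(energy_j):
    """Return the field whose range contains energy_j, or None if no range does."""
    for field, (low, high) in ENERGY_BINS.items():
        if low <= energy_j < high:
            return field
    return None


def count_bumps(energies_j):
    """Count bumps per field for one shift; energies outside all ranges are skipped."""
    counts = {field: 0 for field in ENERGY_BINS}
    for energy in energies_j:
        field = bin_for_energy(energy)
        if field is not None:
            counts[field] += 1
    return counts


# Hypothetical bump energies (J) registered in one shift.
print(bin_for_energy(50_000))                   # nbumps4
print(count_bumps([500, 2_000, 50_000, 60_000])["nbumps4"])  # 2
```

Because the ranges are half-open, an energy of exactly 10^4 J falls into nbumps4 rather than nbumps3.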

The data is located in the data directory:

data/seismic-bumps.csv

The attributes are the same as in the input data.

Preparation

To produce the output data, the following steps are applied to the input data:

  • the header describing the data is removed
  • repeated rows are removed

Run the Python script:

scripts/main.py
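
The contents of scripts/main.py are not shown here. The two preparation steps could look roughly like the sketch below, which assumes that description and header lines are blank or start with "%" or "@" (ARFF style) and that a plain CSV header is written in their place. The function name, the marker characters and the sample lines are all assumptions.

```python
def prepare(lines, column_names):
    """Drop description/header lines and repeated rows, then prepend a CSV header."""
    seen = set()
    rows = []
    for raw in lines:
        line = raw.strip()
        # ASSUMPTION: non-data lines are blank or start with '%' or '@'.
        if not line or line.startswith(("%", "@")):
            continue
        # Keep only the first occurrence of each data row.
        if line in seen:
            continue
        seen.add(line)
        rows.append(line)
    return [",".join(column_names)] + rows


# Hypothetical input with a three-column layout, for illustration only.
raw_lines = [
    "% hypothetical description line",
    "@attribute shift {W,N}",
    "@data",
    "W,10,0",
    "N,20,1",
    "W,10,0",
]
for line in prepare(raw_lines, ["shift", "gpuls", "class"]):
    print(line)
```

Using a set to track rows already seen keeps the first occurrence of each row and preserves the original order.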

License

Licensed under the Public Domain Dedication and License (assuming either no rights or public domain license in source data).

Citation

Sikora M., Wrobel L.: Application of rule induction algorithms for analysis of data collected by seismic hazard monitoring systems in coal mines. Archives of Mining Sciences, 55(1), 2010, 91-114.

Donors and creators

Marek Sikora^{1,2}, Lukasz Wrobel^{1}
(1) Institute of Computer Science, Silesian University of Technology, 44-100 Gliwice, Poland
(2) Institute of Innovative Technologies EMAG, 40-189 Katowice, Poland

Import into your tool

data-cli (or simply data) is a command-line program for getting and posting data with DataHub.
Download the CLI tool and use it with DataHub much as you use git with GitHub:

data get https://datahub.io/machine-learning/seismic-bumps
data info machine-learning/seismic-bumps
tree machine-learning/seismic-bumps
# Get a list of dataset's resources
curl -L -s https://datahub.io/machine-learning/seismic-bumps/datapackage.json | grep path

# Get resources

curl -L https://datahub.io/machine-learning/seismic-bumps/r/0.csv

curl -L https://datahub.io/machine-learning/seismic-bumps/r/1.zip
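
The curl and grep pipeline above pulls resource paths out of datapackage.json as raw text. A sketch of the same idea with a JSON parser is shown below. It runs on an embedded, trimmed-down descriptor whose resource names and paths are hypothetical; only the keys (resources, name, path, datahub.type) mirror those used by the R and Python snippets on this page.

```python
import json


def resource_paths(descriptor_text, wanted_type=None):
    """Return (name, path) pairs for resources, optionally filtered by datahub type."""
    descriptor = json.loads(descriptor_text)
    pairs = []
    for resource in descriptor.get("resources", []):
        resource_type = resource.get("datahub", {}).get("type")
        if wanted_type is None or resource_type == wanted_type:
            pairs.append((resource.get("name"), resource.get("path")))
    return pairs


# Hypothetical descriptor; the real datapackage.json may differ.
sample_descriptor = """
{
  "name": "seismic-bumps",
  "resources": [
    {"name": "example_csv", "path": "data/example.csv",
     "datahub": {"type": "derived/csv"}},
    {"name": "example_zip", "path": "data/example.zip",
     "datahub": {"type": "derived/zip"}}
  ]
}
"""

print(resource_paths(sample_descriptor, wanted_type="derived/csv"))
# [('example_csv', 'data/example.csv')]
```

To use it on the live descriptor, the text returned by the curl command can be passed in place of sample_descriptor.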

If you are using R, here is how to load the data quickly:

install.packages("jsonlite", repos="https://cran.rstudio.com/")
library("jsonlite")

json_file <- 'https://datahub.io/machine-learning/seismic-bumps/datapackage.json'
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

# get list of all resources:
print(json_data$resources$name)

# print all tabular data (if any exists)
for (i in seq_along(json_data$resources$datahub$type)) {
  if (json_data$resources$datahub$type[i] == 'derived/csv') {
    path_to_file <- json_data$resources$path[i]
    data <- read.csv(url(path_to_file))
    print(data)
  }
}

Note: You might need to run the script with root permissions if you are running it on a Linux machine.

If you are using pandas, install the Frictionless Data datapackage library and pandas itself:

pip install datapackage
pip install pandas

Now you can use the Data Package with pandas:

import datapackage
import pandas as pd

data_url = 'https://datahub.io/machine-learning/seismic-bumps/datapackage.json'

# to load Data Package into storage
package = datapackage.Package(data_url)

# to load only tabular data
resources = package.resources
for resource in resources:
    if resource.tabular:
        data = pd.read_csv(resource.descriptor['path'])
        print(data)

For Python, first install the `datapackage` library (all the datasets on DataHub are Data Packages):

pip install datapackage

To get the Data Package into your Python environment, run the following code:

from datapackage import Package

package = Package('https://datahub.io/machine-learning/seismic-bumps/datapackage.json')

# print list of all resources:
print(package.resource_names)

# print processed tabular data (if any exists)
for resource in package.resources:
    if resource.descriptor['datahub']['type'] == 'derived/csv':
        print(resource.read())

If you are using JavaScript, follow the instructions below.

Install the data.js module using npm:

  $ npm install data.js

Once the package is installed, use the following code snippet:

const {Dataset} = require('data.js')

const path = 'https://datahub.io/machine-learning/seismic-bumps/datapackage.json'

// We're using self-invoking function here as we want to use async-await syntax:
;(async () => {
  const dataset = await Dataset.load(path)
  // get list of all resources:
  for (const id in dataset.resources) {
    console.log(dataset.resources[id]._descriptor.name)
  }
  // get all tabular data (if any exists)
  for (const id in dataset.resources) {
    if (dataset.resources[id]._descriptor.format === "csv") {
      const file = dataset.resources[id]
      // Get a raw stream
      const stream = await file.stream()
      // entire file as a buffer (be careful with large files!)
      const buffer = await file.buffer
      // print data
      stream.pipe(process.stdout)
    }
  }
})()