US MSHA Mines and Production

zaneselvans

Files: 4
Size: 117MB
Format: csv zip
Created: 9 months ago
Updated: 9 months ago
License: CC0 1.0
Source: US Mine Safety and Health Administration (MSHA)
This data package contains a subset of the open data published by the [US Mine Safety and Health Administration (MSHA)](https://www.msha.gov). It focuses primarily on data related to the production of coal, and thus also to the US electricity system. It was packaged for easy re-use by Catalyst Cooperative as part of the Public Utility Data Liberation (PUDL) project.

Data Files

Download files in this dataset

File Description Size Last changed Download
mines The Mine dataset lists all Coal and Metal/Non-Metal mines under MSHA's jurisdiction since 1/1/1970. It includes such information as the current status of each mine (Active, Abandoned, NonProducing, etc.), the current owner and operating company, commodity codes and physical attributes of the mine. 63MB csv (63MB) , json (174MB)
controller-operator-history This dataset shows the history of controllers at mining operations and the associations to the operators at those mines. Included are the starting and ending dates for a controller at each mine and the operator history at that mine. 37MB csv (37MB) , json (75MB)
employment-production-quarterly This dataset contains employment and coal production reported by mine operators for each quarter in a calendar year, by subunit and mine ID, beginning on 1/1/2000. The subunit code identifies the location or type of operation within the mine: (01) Underground; (02) Surface at underground; (03) Strip, quarry, open pit; (04) Auger; (05) Culm bank/refuse pile; (06) Dredge; (12) Other mining; (17) Independent shops or yards; (30) Mill operation/preparation plant; (99) Office workers at mine site. 321MB csv (321MB) , json (714MB)
pudl-msha_zip Compressed versions of dataset. Includes normalized CSV and JSON data with original data and datapackage.json. 117MB zip (117MB)

MSHA Mines [mines]  


Field information

Field Name Order Type (Format) Description
MINE_ID 1 integer (default) Identification number assigned to the mine by MSHA. It is a unique primary key to join to the Inspections, Mine Address, Accidents, Annual Employ/Prod and Qrtly Employ/Prod tables.
CURRENT_MINE_NAME 2 string (default) Name of the mine as designated on the Legal ID Form (LID) or Mine Information Form (MIF).
COAL_METAL_IND 3 string (default) Identifies if the mine is a Coal or Metal/Non-Metal mine.
CURRENT_MINE_TYPE 4 string (default) From the Legal ID (LID) form. The types are Facility, Surface or Underground.
CURRENT_MINE_STATUS 5 string (default) Current status of the mine. Values are Abandoned, Abandoned and Sealed, Active, Intermittent, New Mine, NonProducing and Temporarily Idled.
CURRENT_STATUS_DT 6 date (%Y-%m-%d) Date the mine obtained the current status from the Mine Information Form (MIF).
CURRENT_CONTROLLER_ID 7 string (default) Identification number assigned by MSHA Assessments Center for a Legal Entity acting as a controller of an operator. May contain null values if this record has a mine status of New Mine. If it is a New Mine, this information will be entered into the system at a future date.
CURRENT_CONTROLLER_NAME 8 string (default) Either the business name or a person's name for the Legal Entity. May contain null values if this record has a mine status of New Mine. If it is a New Mine, this information will be entered into the system at a future date.
CURRENT_OPERATOR_ID 9 string (default) Identification number assigned by MSHA Assessments Center for a Legal Entity acting as an operator at a mine. May contain null values if this record has a status of New Mine. If it is a New Mine, this information will be entered into the system at a future date.
CURRENT_OPERATOR_NAME 10 string (default) The latest operator name as updated by a LID (Legal ID Form) or MIF (Mine Information Form). If the last action is a LID, it will be updated if Assessments updates the name when it is approved. A new MIF will subsequently overwrite the mine's operator name. May contain null values if this record has a status of New Mine. If it is a New Mine, this information will be entered into the system at a future date.
STATE 11 string (default) State in which the mine is located. Standard state abbreviation code.
BOM_STATE_CD 12 integer (default) Bureau of Mines (BOM) assigned state codes.
FIPS_CNTY_CD 13 integer (default) Federal Information Processing Standard county code.
FIPS_CNTY_NM 14 string (default) Federal Information Processing Standards (FIPS) county code name.
CONG_DIST_CD 15 integer (default) The Congressional District of the state in which the mine is located. Congressional District numbers are only unique within states, so State Code and Congressional District Code should be reported together. May contain null values.
COMPANY_TYPE 16 string (default) Unique description for each legal entity type. Values are Corporation, Limited Liability Corporation, Other, Partnership and Sole Proprietor. May contain null values.
CURRENT_CONTROLLER_BEGIN_DT 17 date (%Y-%m-%d) Start date of the operating period at the mine. May contain null values if a controller ID has not yet been submitted.
DISTRICT 18 string (default) The first three characters of the Coal districts and the first two characters of the Metal districts.
OFFICE_CD 19 string (default) MSHA code that identifies the office to which the mine is assigned. This is entered on the Mine Information Form (MIF).
OFFICE_NAME 20 string (default) The name of the office to which the mine is assigned.
ASSESS_CTRL_NO 21 string (default) The most recent Assessment Control Number for a mine determined by selecting the most recent issue date of all citations associated with a mine for all associated violations regardless of the type of violator. The system creates the Assessment Control Number.
PRIMARY_SIC_CD 22 integer (default) This is a code derived from the SIC codes to use as a primary key for the primary commodity extracted at a mine. If it is blank, a Mine Information Form (MIF) is required from the inspector to know the true SIC code. May contain null values.
PRIMARY_SIC 23 string (default) Description of the Standard Industrial Classification Code (SIC) code for the primary commodity at a mine. May contain null values.
PRIMARY_SIC_CD_1 24 string (default) Standard Industrial Classification Code that identifies the primary product of the mill or mine. May contain null values.
PRIMARY_SIC_CD_SFX 25 string (default) Suffix to the Standard Industrial Classification Code (SIC) that defines the primary commodity of the mill or mine. May contain null values.
SECONDARY_SIC_CD 26 integer (default) This is a code derived from the Standard Industrial Classification Code (SIC) codes to use as a primary key for the secondary commodity extracted at a mine. May contain null values.
SECONDARY_SIC 27 string (default) Description of the Standard Industrial Classification Code (SIC) code for the secondary commodity at a mine. May contain null values.
SECONDARY_SIC_CD_1 28 string (default) Standard Industrial Classification Code (SIC) that identifies the secondary product of the mill or mine. May contain null values.
SECONDARY_SIC_CD_SFX 29 string (default) Suffix to the Standard Industrial Classification Code (SIC) that defines the secondary commodity of the mill or mine. May contain null values.
PRIMARY_CANVASS_CD 30 integer (default) Canvass code associated with the primary commodity code. This code is also known as an industry group code. Values are 1, 2, 5, 6, 7, 8.
PRIMARY_CANVASS 31 string (default) Unique code abbreviation for the primary industry group code for a mine. (1) Coal(Anthracite) SIC 123100; (2) Coal(Bituminous); (5) M/NM (Sand and Gravel); (6) M/NM (Stone); (7) NonMetal; (8) Metal. May contain null values.
SECONDARY_CANVASS_CD 32 integer (default) Canvass code associated with the secondary commodity code. This code is also known as an industry group code. Values are 1, 2, 5, 6, 7, 8.
SECONDARY_CANVASS 33 string (default) Unique code abbreviation for the secondary industry group code for a mine. (1) Coal(Anthracite) SIC 123100; (2) Coal(Bituminous); (5) M/NM (Sand and Gravel); (6) M/NM (Stone); (7) NonMetal; (8) Metal. May contain null values.
CURRENT_103I 34 string (default) This is the description of the Mine 103I Classification Code: Hazard, Ignition or Explosion, Inspection Once Every 10-days, Inspect Once Every 15-days, Inspect Once Every 5-days, Never Had 103I Status, Removed From 103I Status. May contain null values.
CURRENT_103I_DT 35 date (%Y-%m-%d) The date the mine entered the current 103I status. May contain null values.
PORTABLE_OPERATION 36 boolean (default) Indicates whether this is a portable mine or not ('Y' or 'N').
PORTABLE_FIPS_ST_CD 37 integer (default) The Federal Information Processing Standards (FIPS) state code if it is a portable mine. May contain null values.
DAYS_PER_WEEK 38 integer (default) Number of days per week that the mine is operational. Entered on the Mine Information Form (MIF).
HOURS_PER_SHIFT 39 integer (default) Number of hours per shift at the mine. Entered on the Mine Information Form (MIF). May contain null values.
PROD_SHIFTS_PER_DAY 40 integer (default) Number of production shifts per 24-hour day. Entered on the Mine Information Form (MIF). May contain null values.
MAINT_SHIFTS_PER_DAY 41 integer (default) Number of maintenance-only shifts per 24-hour day. Entered on the Mine Information Form (MIF). May contain null values.
NO_EMPLOYEES 42 integer (default) Number of workers employed at the mine. Entered on the Mine Information Form (MIF). May contain null values.
PART48_TRAINING 43 boolean (default) Indicates whether MSHA is restricted from enforcing Part 48 training requirements ('Y' or 'N').
LONGITUDE 44 number (default) Longitude denoting the mine location shown in the following format: XXX.XXXXXX (3.6). May contain null values.
LATITUDE 45 number (default) Latitude denoting the mine location shown in the following format: xx.xxxxxx (2.6). May contain null values.
AVG_MINE_HEIGHT 46 number (default) Average mining height in inches. Coal mines only. May contain null values for Coal and Metal/Non-Metal mines.
MINE_GAS_CATEGORY_CD 47 string (default) This categorization is used by underground Metal/Non-Metal mines and the surface mills of Subcategory I-C (gilsonite) mines. The purpose is to protect persons against the hazards of methane and dusts containing volatile matter. May contain null values if this does not apply.
METHANE_LIBERATION 48 number (default) Methane Liberation on Section (cubic feet each 24 hrs). May contain null values if this does not apply at the mine.
NO_PRODUCING_PITS 49 integer (default) Number of pits that are actively producing materials at the mine location (Coal only). May contain null values for both Coal and Metal/Non-Metal mines.
NO_NONPRODUCING_PITS 50 integer (default) Number of pits that are not producing materials at the mine location (Coal only). May contain null values if this does not apply at the mine.
NO_TAILING_PONDS 51 integer (default) Number of tailing ponds (Metal/Non-Metal mines only). May contain null values if this does not apply at the mine.
PILLAR_RECOVERY_USED 52 boolean (default) Indicator denoting whether or not a mine uses pillar recovery mining techniques (underground coal mines only) - ('Y' or 'N').
HIGHWALL_MINER_USED 53 boolean (default) Indicator denoting whether or not a mine uses a highwall miner (surface coal only) ('Y' or 'N').
MULTIPLE_PITS 54 boolean (default) Indicates whether there are multiple pits at the mine location ('Y' or 'N').
MINERS_REP_IND 55 boolean (default) Indicates whether there is a miners' representative at the location ('Y' or 'N').
SAFETY_COMMITTEE_IND 56 boolean (default) Values are 'Y' or 'N'.
MILES_FROM_OFFICE 57 number (default) Driving distance to the mine/mill from the office responsible for conducting inspection. Can contain zeroes.
DIRECTIONS_TO_MINE 58 string (default) Free-form description of directions on how to get to the mine. It is input on the Legal Id Form (LID) and Mine Information Form (MIF). May contain null values.
NEAREST_TOWN 59 string (default) Nearest town or city. Entered on Mine Information Form (MIF). May contain null values.
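
The note on CONG_DIST_CD (district numbers are only unique within a state) can be illustrated with a short sketch that counts active coal mines per (STATE, CONG_DIST_CD) pair using only the Python standard library. The sample rows are made up, and the code value `"C"` for coal in COAL_METAL_IND is an assumption not stated in the field description; check it against the real file before relying on it.

```python
import csv
import io
from collections import Counter


def count_active_by_district(rows, coal_code="C"):
    """Count active coal mines per (STATE, CONG_DIST_CD) composite key.

    coal_code is an ASSUMED value for COAL_METAL_IND; the field description
    does not list the codes actually used.
    """
    counts = Counter()
    for row in rows:
        if row["CURRENT_MINE_STATUS"] != "Active":
            continue
        if row["COAL_METAL_IND"] != coal_code:
            continue
        # CONG_DIST_CD may be null (an empty cell in the CSV): keep it as None.
        district = row["CONG_DIST_CD"] or None
        counts[(row["STATE"], district)] += 1
    return counts


# Made-up sample with only the columns this sketch needs.
SAMPLE = """MINE_ID,COAL_METAL_IND,CURRENT_MINE_STATUS,STATE,CONG_DIST_CD
1,C,Active,WV,3
2,C,Active,KY,3
3,C,Abandoned,WV,3
4,M,Active,WV,3
5,C,Active,WV,
"""

counts = count_active_by_district(csv.DictReader(io.StringIO(SAMPLE)))
for key, n in sorted(counts.items(), key=str):
    print(key, n)
```

Keying on the district number alone would merge West Virginia's district 3 with Kentucky's district 3, which is why the state is part of the key.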

MSHA Mine Controller / Operator History [controller-operator-history]  


Field information

Field Name Order Type (Format) Description
CONTROLLER_ID 1 string (default) Identification number assigned by MSHA Assessments Center for a Legal Entity acting as a controller of an operator. May contain null values if this record has a mine status of New Mine; the ID will be added at a later date.
CONTROLLER_NAME 2 string (default) Either the business name or a person's name for the Legal Entity. May contain null values if this record has a mine status of New Mine; the name will be added at a later date.
CONTROLLER_START_DT 3 date (%Y-%m-%d) Date the controller started as the controller of the operator at the mine.
CONTROLLER_END_DT 4 date (%Y-%m-%d) Date the controller ceased to control the operator.
CONTROLLER_TYPE 5 string (default) Designates whether the controller is a Company name or a Person.
COAL_METAL_IND 6 string (default) Identifies if the mine is a Coal or Metal/Non-Metal mine.
MINE_ID 7 integer (default) Identification number assigned to the mine by MSHA. It is a unique primary key to join to the Inspections, Mine Address, Accidents, Annual Employ/Prod and Qrtly Employ/Prod tables.
MINE_NAME 8 string (default) Name of the mine as designated on the Legal ID Form (LID) or Mine Information Form (MIF).
MINE_STATUS 9 string (default) Status of the mine at this point in time. Values are Abandoned, Abandoned and Sealed, Active, Intermittent, New Mine, NonProducing and Temporarily Idled.
OPERATOR_ID 10 string (default) Identification number assigned by MSHA Assessments Center for a Legal Entity acting as an operator at a mine. May contain null values if this record has a status of New Mine. If it is a New Mine, this information will be entered into the system at a future date.
OPERATOR_NAME 11 string (default) The operator name as updated by a LID (Legal ID Form) or MIF (Mine Information Form) at this point in time. If the last action is a LID, it will be updated if Assessments updates the name when it is approved. A new MIF will subsequently overwrite the mine's operator name. May contain null values if this record has a status of New Mine. If it is a New Mine, this information will be entered into the system at a future date.
OPERATOR_START_DT 12 date (%Y-%m-%d) Start date of the operating period at the mine.
OPERATOR_END_DT 13 date (%Y-%m-%d) End date of the operating period at the mine. If no date appears, the operator is still active.
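
As a rough illustration of how the operator date fields fit together, the sketch below looks up which operator was running a mine on a given day, treating an empty OPERATOR_END_DT as "still active" as described above. The sample rows and operator names are invented, and treating the end date as inclusive is an assumption.

```python
import csv
import io
from datetime import date


def parse_date(text):
    """Parse a %Y-%m-%d string; an empty cell means the date is null."""
    return date.fromisoformat(text) if text else None


def operator_on(rows, mine_id, day):
    """Return the OPERATOR_NAME active at mine_id on day, or None."""
    for row in rows:
        if int(row["MINE_ID"]) != mine_id:
            continue
        start = parse_date(row["OPERATOR_START_DT"])
        end = parse_date(row["OPERATOR_END_DT"])
        # A null end date means the operator is still active.
        # ASSUMPTION: the end date itself counts as part of the period.
        if start is not None and start <= day and (end is None or day <= end):
            return row["OPERATOR_NAME"]
    return None


# Invented sample rows with only the columns this sketch needs.
SAMPLE = """MINE_ID,OPERATOR_NAME,OPERATOR_START_DT,OPERATOR_END_DT
100,Old Coal Co,1990-01-01,2005-06-30
100,New Coal LLC,2005-07-01,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(operator_on(rows, 100, date(2000, 1, 1)))
print(operator_on(rows, 100, date(2020, 1, 1)))
```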

MSHA Mine Employment and Production (Quarterly) [employment-production-quarterly]  


Field information

Field Name Order Type (Format) Description
MINE_ID 1 integer (default) Identification number assigned to the operation by MSHA. Use mine_id to join to the Mines table.
MINE_NAME 2 string (default) Current mine name.
STATE 3 string (default) State in which the mine is located that is reporting employment and production.
SUBUNIT_CD 4 string (default) Code that identifies the location within a mine.
SUBUNIT 5 string (default) Description of the subunit code referring to a location within a mine: (01) Underground operation; (02) Surface operation at underground mine; (03) Strip, quarry or open pit; (04) Auger (Coal only); (05) Culm bank or refuse pile (Coal only); (06) Dredge; (12) Other surface (Metal/Non-Metal only); (17) Independent shop or yard; (30) Mill operation/preparation plant; (99) Office workers at mine site.
CAL_YR 6 year (default) The 4-digit year of the employment/production data.
CAL_QTR 7 integer (default) The single-digit quarter for which the employment and coal production is reported.
FISCAL_YR 8 year (default) The four-digit fiscal year of the employment/production data. MSHA's fiscal year begins October 1 and ends September 30.
FISCAL_QTR 9 integer (default) The single-digit fiscal quarter for which the employment and production data is reported.
AVG_EMPLOYEE_CNT 10 number (default) Average number of employees reported by the operator for the applicable quarter, subunit and year beginning with 2000. Can be zero.
HOURS_WORKED 11 number (default) Total employee hours reported by the operator during the quarter for this subunit, year and quarter. Can be zero.
COAL_PRODUCTION 12 number (default) Quarterly coal production, in tons, reported by the operator for the applicable subunit. May contain zero or null values.
COAL_METAL_IND 13 string (default) Identifies if the employment and production are being reported for a Coal or Metal/Non-Metal mine.
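
Two of the fields above lend themselves to a small worked example: the calendar-to-fiscal mapping implied by a fiscal year that begins October 1, and annual coal totals summed over subunits and quarters. This is a minimal sketch on invented rows. It assumes the usual US federal convention that a fiscal year is labelled by the calendar year in which it ends, which the field description does not state explicitly, and it treats null COAL_PRODUCTION as zero.

```python
import csv
import io
from collections import defaultdict


def fiscal_period(cal_yr, cal_qtr):
    """Map a calendar (year, quarter) to a fiscal (year, quarter).

    ASSUMPTION: the fiscal year is named after the calendar year in which
    it ends, so October-December of 2015 is quarter 1 of fiscal 2016.
    """
    if cal_qtr not in (1, 2, 3, 4):
        raise ValueError("cal_qtr must be 1, 2, 3 or 4")
    if cal_qtr == 4:
        return cal_yr + 1, 1
    return cal_yr, cal_qtr + 1


def annual_coal_production(rows):
    """Sum COAL_PRODUCTION per (MINE_ID, CAL_YR) across subunits and quarters."""
    totals = defaultdict(float)
    for row in rows:
        # Null production (empty cell) is counted as zero here.
        tons = float(row["COAL_PRODUCTION"]) if row["COAL_PRODUCTION"] else 0.0
        totals[(int(row["MINE_ID"]), int(row["CAL_YR"]))] += tons
    return dict(totals)


# Invented sample rows with only the columns this sketch needs.
SAMPLE = """MINE_ID,SUBUNIT_CD,CAL_YR,CAL_QTR,COAL_PRODUCTION
100,01,2015,1,1000
100,02,2015,1,
100,01,2015,2,1500
100,01,2016,1,800
200,03,2015,4,250.5
"""

totals = annual_coal_production(csv.DictReader(io.StringIO(SAMPLE)))
print(totals)
print(fiscal_period(2015, 4))
```

On the real file, comparing `fiscal_period(CAL_YR, CAL_QTR)` with the published FISCAL_YR and FISCAL_QTR columns is a quick way to confirm or reject the labelling assumption.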

Integrate this dataset into your favourite tool

Use our data-cli tool designed for data wranglers:

data get https://datahub.io/zaneselvans/pudl-msha
data info zaneselvans/pudl-msha
tree zaneselvans/pudl-msha
# Get a list of dataset's resources
curl -L -s https://datahub.io/zaneselvans/pudl-msha/datapackage.json | grep path

# Get resources

curl -L https://datahub.io/zaneselvans/pudl-msha/r/0.csv

curl -L https://datahub.io/zaneselvans/pudl-msha/r/1.csv

curl -L https://datahub.io/zaneselvans/pudl-msha/r/2.csv

curl -L https://datahub.io/zaneselvans/pudl-msha/r/3.zip
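
The `grep path` filter above matches any line that contains the word "path". As an alternative, here is a minimal standard-library Python sketch that parses the descriptor and keeps only resources whose `datahub.type` is `derived/csv`, the same test the R and Python snippets on this page use. The sample descriptor, including its resource names and paths, is made up for illustration; the real datapackage.json contains more fields.

```python
import json


def derived_csv_paths(descriptor):
    """Return the path of every resource marked as a derived CSV."""
    paths = []
    for resource in descriptor.get("resources", []):
        if resource.get("datahub", {}).get("type") == "derived/csv":
            paths.append(resource["path"])
    return paths


# Made-up descriptor fragment for illustration only.
SAMPLE_DESCRIPTOR = json.loads("""
{
  "name": "pudl-msha",
  "resources": [
    {"name": "mines_csv", "path": "data/mines_csv.csv",
     "datahub": {"type": "derived/csv"}},
    {"name": "mines_json", "path": "data/mines_json.json",
     "datahub": {"type": "derived/json"}},
    {"name": "mines", "path": "archive/mines.csv",
     "datahub": {"type": "source/tabular"}}
  ]
}
""")

csv_paths = derived_csv_paths(SAMPLE_DESCRIPTOR)
print(csv_paths)
```

To run it against the live descriptor, the dictionary could be loaded with `json.load(urllib.request.urlopen(url))` instead of the embedded sample.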

If you are using R here's how to get the data you want quickly loaded:

install.packages("jsonlite", repos="https://cran.rstudio.com/")
library("jsonlite")

json_file <- 'https://datahub.io/zaneselvans/pudl-msha/datapackage.json'
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

# get list of all resources:
print(json_data$resources$name)

# print all tabular data (if any exists)
# seq_along() skips the loop cleanly when there are no resources
for (i in seq_along(json_data$resources$datahub$type)) {
  if (json_data$resources$datahub$type[i] == 'derived/csv') {
    path_to_file <- json_data$resources$path[i]
    data <- read.csv(url(path_to_file))
    print(data)
  }
}

Note: You might need to run the script with root permissions if you are running it on a Linux machine.

Install the Frictionless Data `datapackage` library and pandas:

pip install datapackage
pip install pandas

Now you can use the data package with pandas:

import datapackage
import pandas as pd

data_url = 'https://datahub.io/zaneselvans/pudl-msha/datapackage.json'

# load the Data Package descriptor
package = datapackage.Package(data_url)

# read only the tabular resources into pandas
resources = package.resources
for resource in resources:
    if resource.tabular:
        data = pd.read_csv(resource.descriptor['path'])
        print(data)

For Python, first install the `datapackage` library (all the datasets on DataHub are Data Packages):

pip install datapackage

To load the Data Package into your Python environment, run the following code:

from datapackage import Package

package = Package('https://datahub.io/zaneselvans/pudl-msha/datapackage.json')

# print list of all resources:
print(package.resource_names)

# print processed tabular data (if any exists)
for resource in package.resources:
    if resource.descriptor['datahub']['type'] == 'derived/csv':
        print(resource.read())

If you are using JavaScript, follow the instructions below:

Install data.js module using npm:

  $ npm install data.js

Once the package is installed, use the following code snippet:

const {Dataset} = require('data.js')

const path = 'https://datahub.io/zaneselvans/pudl-msha/datapackage.json'

// We're using self-invoking function here as we want to use async-await syntax:
;(async () => {
  const dataset = await Dataset.load(path)
  // get list of all resources:
  for (const id in dataset.resources) {
    console.log(dataset.resources[id]._descriptor.name)
  }
  // get all tabular data(if exists any)
  for (const id in dataset.resources) {
    if (dataset.resources[id]._descriptor.format === "csv") {
      const file = dataset.resources[id]
      // Get a raw stream
      const stream = await file.stream()
      // entire file as a buffer (be careful with large files!)
      const buffer = await file.buffer
      // print data
      stream.pipe(process.stdout)
    }
  }
})()

Read me

This data package contains a subset of the open data published by the US Mine Safety and Health Administration (MSHA). It focuses primarily on data related to the production of coal, and thus also to the US electricity system. It was packaged for easy re-use by Catalyst Cooperative as part of the Public Utility Data Liberation (PUDL) project.

Data

This data package contains a collection of public data compiled and published by the US Mine Safety and Health Administration (MSHA), which is part of the US Department of Labor.

The data is primarily related to US mines, their operators, and their historical employment and production. It contains a subset of the information published by MSHA, selected because it is relevant to coal production and the US electricity system, including:

  • the Mines Data Set
  • the Mine Controller and Operator History
  • the Mine Employment and Production (Quarterly) Data Set

The original data can be downloaded directly from MSHA’s open data page.

Preparation

The data in this package has been minimally altered from the original version published by MSHA.

The following alterations have been made to the data:

  • Tabular files have been re-formatted to use commas to separate values rather than the pipe (|) character.
  • The character encoding of the data files has been converted from iso-8859-1 to utf-8.
  • Columns stored as strings and using Y and N to indicate Boolean True and False values have been converted to Boolean types, and now use the strings True and False.
  • Where appropriate, the types of some other columns have been changed from string to integer to reflect the nature of the data they contain.
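
The first three alterations in the list above can be sketched in a few lines of standard-library Python. This is not the PUDL preparation script, only an illustration of the same transformations on an invented two-row sample. The function name `convert` and the `BOOLEAN_COLUMNS` subset are hypothetical.

```python
import csv
import io

# Hypothetical subset of the Y/N columns, for illustration only.
BOOLEAN_COLUMNS = {"PORTABLE_OPERATION", "PART48_TRAINING"}
YES_NO = {"Y": "True", "N": "False"}


def convert(raw_bytes, boolean_columns=BOOLEAN_COLUMNS):
    """Turn pipe-delimited iso-8859-1 bytes into comma-delimited utf-8 bytes."""
    text = raw_bytes.decode("iso-8859-1")
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter="|")
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col in boolean_columns:
            if col in row:
                # Map Y/N to the strings True/False; leave other values as-is.
                row[col] = YES_NO.get(row[col], row[col])
        writer.writerow(row)
    return out.getvalue().encode("utf-8")


# Invented sample: one name contains a comma and a non-ASCII character.
raw = (
    "MINE_ID|CURRENT_MINE_NAME|PORTABLE_OPERATION\n"
    "1|Café Pit, North|Y\n"
    "2|Deep Mine|N\n"
).encode("iso-8859-1")

converted = convert(raw)
print(converted.decode("utf-8"))
```

Using the `csv` module rather than a plain string replace matters here: a value that contains a comma gets quoted on output, so switching delimiters does not corrupt the row.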

The scripts and other inputs used to prepare the data package can be obtained from the PUDL repository on GitHub.

License

The data contained in this package is a US Government Work and is not subject to copyright within the US. The data package was created by Catalyst Cooperative as part of the Public Utility Data Liberation project and is released under a CC0-1.0 Public Domain Dedication. The software used in the compilation of the data package is released under the MIT License.
