Mammography Data from Breast Cancer Surveillance Consortium

JohnSnowLabs

Files Size Format Created Updated License Source
2 42MB csv zip 4 months ago John Snow Labs Standard License John Snow Labs Breast Cancer Surveillance Consortium

Data Files

File Description Size Last changed Download
mammography-data-from-breast-cancer-surveillance-consortium-csv 5MB csv (5MB), json (29MB)
mammography-data-from-breast-cancer-surveillance-consortium_zip Compressed versions of dataset. Includes normalized CSV and JSON data with original data and datapackage.json. 1MB zip (1MB)
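The zip resource is described as bundling the CSV and JSON data together with datapackage.json. A minimal sketch of reading every CSV member out of such an archive using only the Python standard library might look like the following. The function name `csv_tables_from_zip` is hypothetical, the member names inside the real archive are not documented here (so the code simply looks for a `.csv` suffix), and UTF-8 encoding is an assumption.

```python
import csv
import io
import zipfile


def csv_tables_from_zip(zip_bytes):
    """Return {member_name: list of row dicts} for every .csv member of a zip.

    Hypothetical helper: the archive layout is not documented on this page,
    so members are selected by file extension only. UTF-8 is assumed.
    """
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".csv"):
                continue
            with archive.open(member) as raw:
                # Wrap the binary member so csv can read it as text.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                tables[member] = list(csv.DictReader(text))
    return tables
```

The bytes passed in can come from any download method; the function itself never touches the network, which keeps it easy to check against a small local archive first.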

mammography-data-from-breast-cancer-surveillance-consortium-csv  


Field information

Field Name Order Type (Format) Description
Age_At_The_Time_Of_Mammography 1 number Patient's age in years at time of mammogram
Radiologists_Assessment_Based_On_The_BI_RADS_Scale 2 string Radiologist's assessment based on the BI-RADS scale
Binary_Indicator_Of_Cancer_Diagnosis_In_1_Year_Of_Screening_Mammogram 3 string Binary indicator of cancer diagnosis within one year of screening mammogram
Comparison_Mammogram_From_Prior_Mammography_Available 4 string Comparison mammogram from prior mammography examination available
Patients_BI_RADS_Breast_Density_At_Time_Of_Mammogram 5 string Patient's BI-RADS breast density as recorded at time of mammogram
Family_History_Of_Breast_Cancer_In_A_First_Degree_Relative 6 string Family history of breast cancer in a first degree relative
Current_Use_Of_Hormone_Therapy_At_Time_Of_Mammogram 7 string Current use of hormone therapy at time of mammogram
Binary_Indicator_Whether_Woman_Had_Ever_Received_Prior_Mammogram 8 string Binary indicator of whether the woman had ever received a prior mammogram
History_Of_Breast_Biopsy 9 string Prior history of breast biopsy
Film_Or_Digital_Mammogram 10 string Film or digital mammogram
Cancer_Type 11 string
Body_Mass_Index_At_Time_Of_Mammogram 12 string Body mass index at time of mammogram
Patients_Study_ID 13 number Identification of Patient
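The field table above can be turned into a small reader that checks the header and converts the two `number` columns while leaving every `string` column as text. This is only a sketch using the Python standard library: the names `EXPECTED_FIELDS`, `NUMERIC_FIELDS`, and `read_rows` are hypothetical, and it assumes the CSV header uses exactly the field names listed in the table.

```python
import csv
import io

# Field names copied from the field table, in their listed order.
EXPECTED_FIELDS = (
    "Age_At_The_Time_Of_Mammography",
    "Radiologists_Assessment_Based_On_The_BI_RADS_Scale",
    "Binary_Indicator_Of_Cancer_Diagnosis_In_1_Year_Of_Screening_Mammogram",
    "Comparison_Mammogram_From_Prior_Mammography_Available",
    "Patients_BI_RADS_Breast_Density_At_Time_Of_Mammogram",
    "Family_History_Of_Breast_Cancer_In_A_First_Degree_Relative",
    "Current_Use_Of_Hormone_Therapy_At_Time_Of_Mammogram",
    "Binary_Indicator_Whether_Woman_Had_Ever_Received_Prior_Mammogram",
    "History_Of_Breast_Biopsy",
    "Film_Or_Digital_Mammogram",
    "Cancer_Type",
    "Body_Mass_Index_At_Time_Of_Mammogram",
    "Patients_Study_ID",
)

# Only these two fields are typed "number" in the table.
NUMERIC_FIELDS = ("Age_At_The_Time_Of_Mammography", "Patients_Study_ID")


def read_rows(text_stream):
    """Parse CSV text into row dicts, converting the numeric fields to float.

    Empty numeric cells become None; string fields are left untouched.
    """
    reader = csv.DictReader(text_stream)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_FIELDS if name not in header]
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    rows = []
    for record in reader:
        for name in NUMERIC_FIELDS:
            value = (record[name] or "").strip()
            record[name] = float(value) if value else None
        rows.append(record)
    return rows
```

Keeping the coded columns as strings avoids silently reinterpreting category codes as quantities; any further decoding depends on the dataset's own documentation.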

Import into your tool

data-cli (or simply `data`) is the command-line program for getting data from and posting data to DataHub.
Download the CLI tool and use it with DataHub much as you use git with GitHub:

data get https://datahub.io/JohnSnowLabs/mammography-data-from-breast-cancer-surveillance-consortium
data info JohnSnowLabs/mammography-data-from-breast-cancer-surveillance-consortium
tree JohnSnowLabs/mammography-data-from-breast-cancer-surveillance-consortium
# Get a list of dataset's resources
curl -L -s https://datahub.io/JohnSnowLabs/mammography-data-from-breast-cancer-surveillance-consortium/datapackage.json | grep path

# Get resources

curl -L https://datahub.io/JohnSnowLabs/mammography-data-from-breast-cancer-surveillance-consortium/r/0.csv

curl -L https://datahub.io/JohnSnowLabs/mammography-data-from-breast-cancer-surveillance-consortium/r/1.zip
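The `curl ... | grep path` step above can also be sketched in Python with only the standard library, parsing the descriptor as JSON instead of filtering text. The helper names `fetch_descriptor` and `resource_paths` are hypothetical; the sketch assumes the descriptor follows the usual Data Package layout, with a top-level `resources` list whose entries carry `name` and `path` keys.

```python
import json
from urllib.request import urlopen

# URL taken from the curl command above.
DATAPACKAGE_URL = (
    "https://datahub.io/JohnSnowLabs/"
    "mammography-data-from-breast-cancer-surveillance-consortium/datapackage.json"
)


def fetch_descriptor(url=DATAPACKAGE_URL):
    """Download and parse datapackage.json (urlopen follows redirects, like curl -L)."""
    with urlopen(url) as response:
        return json.load(response)


def resource_paths(descriptor):
    """Return (name, path) pairs for each resource listed in a descriptor dict.

    Assumes the standard Data Package layout; missing keys yield None.
    """
    return [
        (resource.get("name"), resource.get("path"))
        for resource in descriptor.get("resources", [])
    ]
```

Calling `resource_paths(fetch_descriptor())` would list the resources of this dataset; `resource_paths` alone works on any descriptor that has already been loaded, so it can be tried offline.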

If you are using R, here is how to load the data quickly:

install.packages("jsonlite", repos="https://cran.rstudio.com/")
library("jsonlite")

json_file <- 'https://datahub.io/JohnSnowLabs/mammography-data-from-breast-cancer-surveillance-consortium/datapackage.json'
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

# get list of all resources:
print(json_data$resources$name)

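# print all tabular data (if any exists)
# seq_along() avoids iterating over 1:0 when there are no resources
for (i in seq_along(json_data$resources$datahub$type)) {
  # isTRUE() guards against NA when a resource has no datahub type
  if (isTRUE(json_data$resources$datahub$type[i] == 'derived/csv')) {
    path_to_file <- json_data$resources$path[i]
    data <- read.csv(url(path_to_file))
    print(data)
  }
}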

Note: You might need to run the script with root permissions if you are running on a Linux machine.

Install the Frictionless Data `datapackage` library and pandas:

pip install datapackage
pip install pandas

Now you can load the Data Package with pandas:

import datapackage
import pandas as pd

data_url = 'https://datahub.io/JohnSnowLabs/mammography-data-from-breast-cancer-surveillance-consortium/datapackage.json'

# to load Data Package into storage
package = datapackage.Package(data_url)

# to load only tabular data
resources = package.resources
for resource in resources:
    if resource.tabular:
        data = pd.read_csv(resource.descriptor['path'])
        print(data)

For Python, first install the `datapackage` library (all the datasets on DataHub are Data Packages):

pip install datapackage

To load the Data Package into your Python environment, run the following code:

from datapackage import Package

package = Package('https://datahub.io/JohnSnowLabs/mammography-data-from-breast-cancer-surveillance-consortium/datapackage.json')

# print list of all resources:
print(package.resource_names)

# print processed tabular data (if any exists)
for resource in package.resources:
    # .get() avoids a KeyError for resources without a 'datahub' entry
    if resource.descriptor.get('datahub', {}).get('type') == 'derived/csv':
        print(resource.read())

If you are using JavaScript, follow the instructions below.

Install the data.js module using npm:

  $ npm install data.js

Once the package is installed, use the following code snippet:

const {Dataset} = require('data.js')

const path = 'https://datahub.io/JohnSnowLabs/mammography-data-from-breast-cancer-surveillance-consortium/datapackage.json'

// We're using a self-invoking function here because we want to use async/await syntax:
;(async () => {
  const dataset = await Dataset.load(path)
  // get list of all resources:
  for (const id in dataset.resources) {
    console.log(dataset.resources[id]._descriptor.name)
  }
  // get all tabular data (if any exists)
  for (const id in dataset.resources) {
    if (dataset.resources[id]._descriptor.format === "csv") {
      const file = dataset.resources[id]
      // Get a raw stream
      const stream = await file.stream()
      // entire file as a buffer (be careful with large files!)
      const buffer = await file.buffer
      // print data
      stream.pipe(process.stdout)
    }
  }
})()