NYPD Motor Vehicle Collisions


Files Size Format Created Updated License Source
2 1GB csv zip 3 months ago John Snow Labs Standard License John Snow Labs City of New York

Data Files

File Description Size Last changed Download
nypd-motor-vehicle-collisions-csv 163MB csv (163MB), json (852MB)
nypd-motor-vehicle-collisions_zip Compressed versions of dataset. Includes normalized CSV and JSON data with original data and datapackage.json. 104MB zip (104MB)



Field information

Field Name Order Type (Format) Description
Date 1 date (%Y-%m-%d) Collision Date
Time 2 time (%H:%M:%S) Time Of Collision
Borough 3 string Borough
ZIP_Code 4 integer ZIP Code
Latitude 5 number Latitude Location
Longitude 6 number Longitude Location
On_Street_Name 7 string On Street Name
Cross_Street_Name 8 string Cross Street Name
Off_Street_Name 9 string Off Street Name
Number_Of_Persons_Injured 10 integer Number Of Persons Injured
Number_Of_Persons_Killed 11 integer Number Of Persons Killed
Number_Of_Pedestrians_Injured 12 integer Number Of Pedestrians Injured
Number_Of_Pedestrians_Killed 13 integer Number Of Pedestrians Killed
Number_Of_Cyclist_Injured 14 integer Number Of Cyclist Injured
Number_Of_Cyclist_Killed 15 integer Number Of Cyclist Killed
Number_Of_Motorist_Injured 16 integer Number Of Motorist Injured
Number_Of_Motorist_Killed 17 integer Number Of Motorist Killed
Contributing_Facto_Vehicle_1 18 string Contributing Factor Vehicle 1
Contributing_Facto_Vehicle_2 19 string Contributing Factor Vehicle 2
Contributing_Facto_Vehicle_3 20 string Contributing Factor Vehicle 3
Contributing_Facto_Vehicle_4 21 string Contributing Factor Vehicle 4
Contributing_Facto_Vehicle_5 22 string Contributing Factor Vehicle 5
Unique_Key 23 integer Unique Key
Vehicle_Type_Code_1 24 string Vehicle Type Code 1
Vehicle_Type_Code_2 25 string Vehicle Type Code 2
Vehicle_Type_Code_3 26 string Vehicle Type Code 3
Vehicle_Type_Code_4 27 string Vehicle Type Code 4
Vehicle_Type_Code_5 28 string Vehicle Type Code 5
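
The types and formats in the field table map directly onto parsing rules. The following is a minimal sketch using only the Python standard library. It assumes that the CSV header uses the field names exactly as listed and that missing values appear as empty strings; neither assumption is confirmed by this page. Only a subset of the integer and number fields is mapped, and every other field is left as a string. The sample row is invented for illustration and is not taken from the dataset.

```python
import csv
import io
from datetime import datetime


def parse_date(value):
    # Field table: date (%Y-%m-%d)
    return datetime.strptime(value, "%Y-%m-%d").date()


def parse_time(value):
    # Field table: time (%H:%M:%S)
    return datetime.strptime(value, "%H:%M:%S").time()


# Converters for a subset of fields; unlisted fields stay strings.
CONVERTERS = {
    "Date": parse_date,
    "Time": parse_time,
    "ZIP_Code": int,
    "Latitude": float,
    "Longitude": float,
    "Number_Of_Persons_Injured": int,
    "Number_Of_Persons_Killed": int,
    "Unique_Key": int,
}


def parse_row(row):
    """Convert one csv.DictReader row; empty strings become None (assumption)."""
    parsed = {}
    for name, value in row.items():
        if value == "":
            parsed[name] = None
        else:
            parsed[name] = CONVERTERS.get(name, str)(value)
    return parsed


# Invented sample data, for illustration only.
SAMPLE = (
    "Date,Time,Borough,ZIP_Code,Latitude,Longitude,"
    "Number_Of_Persons_Injured,Unique_Key\n"
    "2017-01-31,08:15:00,BROOKLYN,,40.68,-73.97,2,1234567\n"
)

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(rows[0])
```

Mapping empty strings to None keeps a missing ZIP code or coordinate distinct from a real zero.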

Import into your tool

data-cli (or simply data) is the command-line program for getting data from, and posting data to, DataHub.
Use data with datahub.io much as you would use git with GitHub. Here are installation instructions.

data get https://datahub.io/JohnSnowLabs/nypd-motor-vehicle-collisions
tree JohnSnowLabs/nypd-motor-vehicle-collisions
# Get a list of dataset's resources
curl -L -s https://datahub.io/JohnSnowLabs/nypd-motor-vehicle-collisions/datapackage.json | grep path

# Get resources

curl -L https://datahub.io/JohnSnowLabs/nypd-motor-vehicle-collisions/r/0.csv

curl -L https://datahub.io/JohnSnowLabs/nypd-motor-vehicle-collisions/r/1.zip
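
The curl and grep step above can also be done by parsing the descriptor rather than pattern-matching its text. The following is a rough sketch using only the Python standard library. The helper names fetch_descriptor and resource_paths are hypothetical. The example descriptor is a made-up fragment shaped like a Data Package, not a copy of the real datapackage.json.

```python
import json
from urllib.request import urlopen

DATAPACKAGE_URL = (
    "https://datahub.io/JohnSnowLabs/nypd-motor-vehicle-collisions/datapackage.json"
)


def fetch_descriptor(url=DATAPACKAGE_URL):
    """Download and decode datapackage.json (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def resource_paths(descriptor):
    """Return (name, path) pairs for every resource that declares a path."""
    return [
        (resource.get("name"), resource["path"])
        for resource in descriptor.get("resources", [])
        if "path" in resource
    ]


# Hypothetical descriptor fragment, used so the sketch runs offline.
example = {
    "name": "example",
    "resources": [
        {"name": "data_csv", "path": "https://example.org/r/0.csv", "format": "csv"},
        {"name": "data_zip", "path": "https://example.org/r/1.zip", "format": "zip"},
        {"name": "inline", "data": []},
    ],
}

for name, path in resource_paths(example):
    print(name, path)
```

Against the live dataset, the call would be resource_paths(fetch_descriptor()).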

If you are using R, here's how to load the data quickly:

install.packages("jsonlite", repos="https://cran.rstudio.com/")
library("jsonlite")

json_file <- 'https://datahub.io/JohnSnowLabs/nypd-motor-vehicle-collisions/datapackage.json'
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

# get list of all resources:
print(json_data$resources$name)

# print all tabular data (if any exists)
for(i in 1:length(json_data$resources$datahub$type)){
  if(json_data$resources$datahub$type[i]=='derived/csv'){
    path_to_file = json_data$resources$path[i]
    data <- read.csv(url(path_to_file))
    print(data)
  }
}

Note: You might need to run the script with root permissions if you are running it on a Linux machine.

If you are using pandas, install the Frictionless Data datapackage library and pandas itself:

pip install datapackage
pip install pandas

Now you can use the Data Package in pandas:

import datapackage
import pandas as pd

data_url = 'https://datahub.io/JohnSnowLabs/nypd-motor-vehicle-collisions/datapackage.json'

# to load Data Package into storage
package = datapackage.Package(data_url)

# to load only tabular data
resources = package.resources
for resource in resources:
    if resource.tabular:
        data = pd.read_csv(resource.descriptor['path'])
        print(data)
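
Printing the whole table is rarely useful for a CSV of this size (163MB). As an alternative, the following sketch streams rows with the Python standard library and totals Number_Of_Persons_Injured per Borough. It assumes the CSV header matches the field names in the field table and that blank values appear as empty strings. The helper name injured_by_borough is hypothetical, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter


def injured_by_borough(lines):
    """Sum Number_Of_Persons_Injured per Borough from an iterable of CSV lines."""
    totals = Counter()
    for row in csv.DictReader(lines):
        # Blank boroughs are bucketed under "UNKNOWN" (assumption).
        borough = row.get("Borough") or "UNKNOWN"
        # Blank counts are treated as zero (assumption).
        injured = row.get("Number_Of_Persons_Injured") or "0"
        totals[borough] += int(injured)
    return totals


# Invented sample data, for illustration only.
SAMPLE = io.StringIO(
    "Borough,Number_Of_Persons_Injured\n"
    "QUEENS,1\n"
    "QUEENS,2\n"
    ",0\n"
    "BRONX,\n"
)

totals = injured_by_borough(SAMPLE)
print(totals.most_common())
```

For the downloaded file, pass an open file handle instead of the sample, for example open(csv_path, newline=""), where csv_path is wherever the CSV was saved. Only one row is held in memory at a time.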

For Python, first install the `datapackage` library (all the datasets on DataHub are Data Packages):

pip install datapackage

To load the Data Package into your Python environment, run the following code:

from datapackage import Package

package = Package('https://datahub.io/JohnSnowLabs/nypd-motor-vehicle-collisions/datapackage.json')

# print list of all resources:
print(package.resource_names)

# print processed tabular data (if any exists)
for resource in package.resources:
    if resource.descriptor['datahub']['type'] == 'derived/csv':
        print(resource.read())

If you are using JavaScript, follow the instructions below.

Install the data.js module using npm:

  $ npm install data.js

Once the package is installed, use the following code snippet:

const {Dataset} = require('data.js')

const path = 'https://datahub.io/JohnSnowLabs/nypd-motor-vehicle-collisions/datapackage.json'

// We're using a self-invoking function here because we want to use async-await syntax:
;(async () => {
  const dataset = await Dataset.load(path)
  // get list of all resources:
  for (const id in dataset.resources) {
    console.log(dataset.resources[id]._descriptor.name)
  }
  // get all tabular data (if any exists)
  for (const id in dataset.resources) {
    if (dataset.resources[id]._descriptor.format === "csv") {
      const file = dataset.resources[id]
      // Get a raw stream
      const stream = await file.stream()
      // entire file as a buffer (be careful with large files!)
      const buffer = await file.buffer
      // print data
      stream.pipe(process.stdout)
    }
  }
})()