Tennis Time

five-thirty-eight

Files | Size | Format | Created | Updated | License | Source
--- | --- | --- | --- | --- | --- | ---
4 | 88kB | csv, zip | 4 years ago | 4 years ago | | FiveThirtyEight - Tennis Time

Data Files

Download files in this dataset

File | Description | Size | Last changed | Download
--- | --- | --- | --- | ---
events_time | | 6kB | | csv (6kB), json (20kB)
players_time | | 5kB | | csv (5kB), json (13kB)
serve_times | | 6kB | | csv (6kB), json (19kB)
tennis-time_zip | Compressed versions of dataset. Includes normalized CSV and JSON data with original data and datapackage.json. | 22kB | | zip (22kB)

events_time  

This is a preview version. There might be more data in the original version.

Field information

Field Name | Order | Type (Format) | Description
--- | --- | --- | ---
tournament | 1 | string (default) |
surface | 2 | string (default) |
seconds_added_per_point | 3 | number (default) |
years | 4 | string (default) |

players_time  

Field information

Field Name | Order | Type (Format) | Description
--- | --- | --- | ---
player | 1 | string (default) |
seconds_added_per_point | 2 | number (default) |

serve_times  

Field information

Field Name | Order | Type (Format) | Description
--- | --- | --- | ---
server | 1 | string (default) |
seconds_before_next_point | 2 | integer (default) |
day | 3 | string (default) |
opponent | 4 | string (default) |
game_score | 5 | string (default) |
set | 6 | integer (default) |
game | 7 | string (default) |

Integrate this dataset into your favourite tool

Use our data-cli tool, designed for data wranglers:

data get https://datahub.io/five-thirty-eight/tennis-time
data info five-thirty-eight/tennis-time
tree five-thirty-eight/tennis-time
# Get a list of dataset's resources
curl -L -s https://datahub.io/five-thirty-eight/tennis-time/datapackage.json | grep path

# Get resources

curl -L https://datahub.io/five-thirty-eight/tennis-time/r/0.csv

curl -L https://datahub.io/five-thirty-eight/tennis-time/r/1.csv

curl -L https://datahub.io/five-thirty-eight/tennis-time/r/2.csv

curl -L https://datahub.io/five-thirty-eight/tennis-time/r/3.zip
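
The resource listing above can also be done without `grep` by parsing the descriptor itself. The following is a minimal sketch using only the Python standard library. It assumes each resource entry in datapackage.json carries `name` and `path` keys, as the R snippet on this page implies; the helper names `resource_paths` and `fetch_descriptor` are ours, not part of any DataHub tool.

```python
import json
from urllib.request import urlopen

DESCRIPTOR_URL = "https://datahub.io/five-thirty-eight/tennis-time/datapackage.json"


def resource_paths(descriptor):
    """Map each resource name to its path; resources without a path are skipped."""
    return {
        res.get("name"): res["path"]
        for res in descriptor.get("resources", [])
        if "path" in res
    }


def fetch_descriptor(url=DESCRIPTOR_URL):
    """Download and parse datapackage.json (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


# Usage (hits the network, so it is left commented out):
# for name, path in resource_paths(fetch_descriptor()).items():
#     print(name, path)
```

Keeping the parsing in `resource_paths` separate from the download means the same function works on a descriptor saved to disk.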

If you are using R, here's how to load the data quickly:

install.packages("jsonlite", repos="https://cran.rstudio.com/")
library("jsonlite")

json_file <- 'https://datahub.io/five-thirty-eight/tennis-time/datapackage.json'
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

# get list of all resources:
print(json_data$resources$name)

# print all tabular data (if any exists)
# seq_along() avoids iterating over 1:0 when there are no resources
for (i in seq_along(json_data$resources$datahub$type)) {
  if (json_data$resources$datahub$type[i] == 'derived/csv') {
    path_to_file <- json_data$resources$path[i]
    data <- read.csv(url(path_to_file))
    print(data)
  }
}

Note: You might need to run the script with root permissions if you are running on a Linux machine.

To use the dataset with pandas, install the Frictionless Data `datapackage` library and pandas itself:

pip install datapackage
pip install pandas

Now you can load the Data Package into pandas:

import datapackage
import pandas as pd

data_url = 'https://datahub.io/five-thirty-eight/tennis-time/datapackage.json'

# to load Data Package into storage
package = datapackage.Package(data_url)

# to load only tabular data
resources = package.resources
for resource in resources:
    if resource.tabular:
        data = pd.read_csv(resource.descriptor['path'])
        print(data)

For Python, first install the `datapackage` library (all the datasets on DataHub are Data Packages):

pip install datapackage

To get the Data Package into your Python environment, run the following code:

from datapackage import Package

package = Package('https://datahub.io/five-thirty-eight/tennis-time/datapackage.json')

# print list of all resources:
print(package.resource_names)

# print processed tabular data (if any exists)
for resource in package.resources:
    if resource.descriptor['datahub']['type'] == 'derived/csv':
        print(resource.read())

If you are using JavaScript, follow the instructions below.

Install the data.js module using npm:

  $ npm install data.js

Once the package is installed, use the following code snippet:

const {Dataset} = require('data.js')

const path = 'https://datahub.io/five-thirty-eight/tennis-time/datapackage.json'

// We're using self-invoking function here as we want to use async-await syntax:
;(async () => {
  const dataset = await Dataset.load(path)
  // get list of all resources:
  for (const id in dataset.resources) {
    console.log(dataset.resources[id]._descriptor.name)
  }
  // get all tabular data(if exists any)
  for (const id in dataset.resources) {
    if (dataset.resources[id]._descriptor.format === "csv") {
      const file = dataset.resources[id]
      // Get a raw stream
      const stream = await file.stream()
      // entire file as a buffer (be careful with large files!)
      const buffer = await file.buffer
      // print data
      stream.pipe(process.stdout)
    }
  }
})()

Read me

Tennis Time

This folder contains data behind the story Why Some Tennis Matches Take Forever.

serve_times.csv

Header | Definition
--- | ---
server | Name of player serving at 2015 French Open
seconds_before_next_point | Time in seconds between end of marked point and next serve, timed by stopwatch app
day | Date
opponent | Opponent, receiving serve
game_score | Score in the current game during the timed interval between points
set | Set number, out of five
game | Score in games within the set
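
As a rough illustration of how these columns might be used, the sketch below averages `seconds_before_next_point` for each `server`. It uses only the Python standard library and works on CSV text, so it can be fed the downloaded file's contents. The function name `mean_seconds_by_server` is hypothetical, and the sample rows are made up for illustration; they are not values from the dataset.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_seconds_by_server(csv_text):
    """Return {server: mean seconds_before_next_point} from serve_times CSV text."""
    samples = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The field information lists this column as an integer.
        samples[row["server"]].append(int(row["seconds_before_next_point"]))
    return {server: mean(values) for server, values in samples.items()}


# Made-up rows for illustration only; these are not values from the dataset.
SAMPLE = """server,seconds_before_next_point
Player A,20
Player A,30
Player B,18
"""

print(mean_seconds_by_server(SAMPLE))
```

With the real file, pass the text of serve_times.csv instead of `SAMPLE`; the extra columns are simply ignored.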

players_time.csv

Header | Definition
--- | ---
player | Player name
seconds_added_per_point | Weighted average of seconds added per point as loser and winner of matches, 1991-2015, from regression model controlling for tournament, surface, year and other factors
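
One simple way to read this file is to rank players by `seconds_added_per_point`, slowest first. The sketch below assumes the column parses as a float (the field information lists it as a number) and that both positive and negative values may occur. The function name `rank_by_seconds_added` is hypothetical, and the sample rows are made up for illustration; they are not values from the dataset.

```python
import csv
import io


def rank_by_seconds_added(csv_text, key_column="player"):
    """Return (name, seconds_added_per_point) pairs, largest value first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    pairs = [(row[key_column], float(row["seconds_added_per_point"])) for row in rows]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up rows for illustration only; these are not values from the dataset.
SAMPLE = """player,seconds_added_per_point
Player A,1.5
Player B,-0.5
Player C,0.25
"""

for name, seconds in rank_by_seconds_added(SAMPLE):
    print(f"{name}: {seconds:+.2f}")
```

Passing `key_column="tournament"` should let the same function rank the rows of events_time.csv, since that file shares the `seconds_added_per_point` column.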

events_time.csv

Header | Definition
--- | ---
tournament | Name of event
surface | Court surface used at the event
seconds_added_per_point | Seconds added per point for this event on this surface in years shown, from regression model controlling for players, year and other factors
years | Start and end years for data used from this tournament in regression

This dataset was scraped from FiveThirtyEight - tennis-time