US House Price Index (Case-Shiller)

core

Files Size Format Created Updated License Source
2 588kB csv zip 2 months ago public_domain_dedication_and_license Standard and Poors Case-Shiller Indices

Data Files

File Description Size Last changed Download Other formats
cities [csv] Case-Shiller US home price index levels at national and city level. Monthly. 52kB cities [csv] cities [json] (52kB)
house-prices-us_zip [zip] Compressed versions of dataset. Includes normalized CSV and JSON data with original data and datapackage.json. 72kB house-prices-us_zip [zip]

cities  

This is a preview version. There might be more data in the original version.

Field information

Field Name Order Type (Format) Description
Date 1 date (%Y-%m-%d)
AZ-Phoenix 2 number
CA-Los Angeles 3 number
CA-San Diego 4 number
CA-San Francisco 5 number
CO-Denver 6 number
DC-Washington 7 number
FL-Miami 8 number
FL-Tampa 9 number
GA-Atlanta 10 number
IL-Chicago 11 number
MA-Boston 12 number
MI-Detroit 13 number
MN-Minneapolis 14 number
NC-Charlotte 15 number
NV-Las Vegas 16 number
NY-New York 17 number
OH-Cleveland 18 number
OR-Portland 19 number
TX-Dallas 20 number
WA-Seattle 21 number
Composite-10 22 number
Composite-20 23 number
National-US 24 number
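
The field list above maps directly onto a small parser. The following is a minimal sketch using only the Python standard library, assuming the CSV has a header row with the field names listed above. The function name `read_cities` is hypothetical, and the sample values are placeholders, not real index levels.

```python
import csv
import io
from datetime import datetime


def read_cities(fileobj):
    """Parse the cities CSV: Date becomes a date, every other column a float.

    Blank cells are returned as None.
    """
    rows = []
    for record in csv.DictReader(fileobj):
        row = {"Date": datetime.strptime(record["Date"], "%Y-%m-%d").date()}
        for name, value in record.items():
            if name != "Date":
                row[name] = float(value) if value else None
        rows.append(row)
    return rows


# Placeholder sample with only three of the columns (values are made up).
SAMPLE = """Date,AZ-Phoenix,Composite-20,National-US
2000-01-01,100.0,,100.0
2000-02-01,100.5,,100.4
"""

rows = read_cities(io.StringIO(SAMPLE))
print(rows[0]["Date"], rows[0]["AZ-Phoenix"], rows[0]["Composite-20"])
```

With a local copy of the file, the same function can be called as `read_cities(open("cities.csv", newline=""))`, where the path is again an assumption.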

house-prices-us_zip  

Read me

Case-Shiller Index of US residential house prices. Data comes from S&P Case-Shiller data and includes both the national index and the indices for 20 metropolitan regions. The indices are created using a repeat-sales methodology.

Data

As per the home page for Indices on the S&P website:

The S&P/Case-Shiller U.S. National Home Price Index is a composite of single-family home price indices for the nine U.S. Census divisions and is calculated monthly. It is included in the S&P/Case-Shiller Home Price Index Series which seeks to measure changes in the total value of all existing single-family housing stock.

Documentation of the methodology can be found at: http://www.spindices.com/documents/methodologies/methodology-sp-cs-home-price-indices.pdf

Key points are (excerpted from methodology):

  • The indices use the “repeat sales method” of index calculation which uses data on properties that have sold at least twice, in order to capture the true appreciated value of each specific sales unit.
  • The quarterly S&P/Case-Shiller U.S. National Home Price Index aggregates nine quarterly U.S. Census division repeat sales indices using a base period and estimates of the aggregate value of single family housing stock for those periods.
  • The S&P/Case-Shiller Home Price Indices originated in the 1980s by Case Shiller Weiss’s research principals, Karl E. Case and Robert J. Shiller. At the time, Case and Shiller developed the repeat sales pricing technique. This methodology is recognized as the most reliable means to measure housing price movements and is used by other home price index publishers, including the Office of Federal Housing Enterprise Oversight (OFHEO).
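
To make the repeat sales idea from the key points concrete, here is a minimal sketch in Python. It is a textbook-style simplification (an unweighted least-squares regression of log price ratios on period indicators), not the S&P procedure, which the methodology document describes in full. The function names, the input layout, and the prices are all hypothetical.

```python
import math


def solve(A, y):
    """Solve the linear system A b = y by Gauss-Jordan elimination with pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: some period is not linked by any sale pair")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def repeat_sales_index(pairs, n_periods, base=100.0):
    """Estimate an index from sale pairs of the same property.

    pairs: (first_period, first_price, second_period, second_price) tuples.
    Period 0 is the base period; its index level is fixed at `base`.
    """
    k = n_periods - 1  # unknown log index levels for periods 1..n_periods-1
    A = [[0.0] * k for _ in range(k)]  # normal-equation matrix X'X
    y = [0.0] * k                      # right-hand side X'r
    for s, p_s, t, p_t in pairs:
        x = [0.0] * k
        if s > 0:
            x[s - 1] = -1.0  # first sale enters with a minus sign
        if t > 0:
            x[t - 1] = 1.0   # second sale enters with a plus sign
        r = math.log(p_t / p_s)  # log price ratio of the pair
        for i in range(k):
            if x[i] == 0.0:
                continue
            y[i] += x[i] * r
            for j in range(k):
                A[i][j] += x[i] * x[j]
    b = solve(A, y)
    return [base] + [base * math.exp(v) for v in b]


# Three hypothetical properties, each sold twice, all appreciating 10% per period.
pairs = [
    (0, 200000.0, 1, 220000.0),
    (1, 300000.0, 2, 330000.0),
    (0, 100000.0, 2, 121000.0),
]
index = repeat_sales_index(pairs, n_periods=3)
print([round(v, 2) for v in index])
```

Because only price changes of the same property enter the regression, differences in the mix of homes sold in each period do not move the estimate, which is the point of using repeat sales.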

Preparation

To download and process the data, run:

python scripts/process.py

The updated data files will then be in the data directory.

Note: the URLs and structure of the source data have changed over time, and the source data URLs change with every release.

Originally (2013) the site provided a table of links, but these were not direct file URLs and you had to dig around in S&P’s JavaScript to find the actual download locations. As of mid-2014 the data is consolidated in one primary XLS file, but the HTML you see in your browser differs from the source HTML. In addition, the location of the XLS file continues to change with each release.

License

Any rights of the maintainer are licensed under the PDDL. The exact legal status of the source data (and hence of the resulting processed data) is unclear, but there could be a presumption of public domain given its factual nature and US provenance. The current application of the PDDL reflects the maintainer's best guess (and comes with no warranty).

Import into your tool

To use the Data Package in R, follow the instructions below:

install.packages("devtools")
library(devtools)
install_github("hadley/readr")
install_github("ropenscilabs/jsonvalidate")
install_github("ropenscilabs/datapkg")

#Load client
library(datapkg)

#Get Data Package
datapackage <- datapkg_read("https://pkgstore.datahub.io/core/house-prices-us/latest")

#Package info
print(datapackage)

#Open actual data in RStudio Viewer
View(datapackage$data$"cities")
View(datapackage$data$"house-prices-us_zip")

Tested with Python 3.5.2

To generate Pandas data frames based on JSON Table Schema descriptors, install the jsontableschema-pandas plugin. To load resources from a data package as Pandas data frames, use the datapackage.push_datapackage function. The returned storage works as a container for Pandas data frames.

To work with Data Packages in Pandas, install the following packages:

$ pip install datapackage
$ pip install jsontableschema-pandas

To get the Data Package, run the following code:

import datapackage

data_url = "https://pkgstore.datahub.io/core/house-prices-us/latest/datapackage.json"

# to load Data Package into storage
storage = datapackage.push_datapackage(data_url, 'pandas')

# to see datasets in this package
storage.buckets

# you can access datasets inside storage, e.g. the first one:
storage[storage.buckets[0]]

To work with Data Packages in Python, install the datapackage package:

$ pip install datapackage

To get the Data Package into your Python environment, run the following code:

import datapackage

dp = datapackage.DataPackage('https://pkgstore.datahub.io/core/house-prices-us/latest/datapackage.json')

# see metadata
print(dp.descriptor)

# get the list of resource names
csvList = [resource.descriptor['name'] for resource in dp.resources]
print(csvList) # ["resource name", ...]

# access csv file by the index starting 0
print(dp.resources[0].data)
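
Since the cities file holds monthly index levels, a common next step is a year-over-year comparison. The following is a minimal sketch, assuming the rows have already been reduced to (date, value) pairs for a single column. The shape of `dp.resources[0].data` is not assumed here, the helper name `year_over_year` is hypothetical, and the sample values are placeholders rather than real index levels.

```python
from datetime import date


def year_over_year(series):
    """Return (date, percent change versus the same month one year earlier).

    series: list of (date, index_level) pairs; a level may be None.
    Months with no usable value twelve months earlier are skipped.
    """
    by_month = {(d.year, d.month): v for d, v in series}
    changes = []
    for d, v in series:
        prev = by_month.get((d.year - 1, d.month))
        if v is not None and prev:  # skip missing or zero earlier values
            changes.append((d, (v / prev - 1.0) * 100.0))
    return changes


# Placeholder monthly levels for one column (values are made up).
series = [
    (date(2000, 1, 1), 100.0),
    (date(2000, 2, 1), 101.0),
    (date(2001, 1, 1), 110.0),
    (date(2001, 2, 1), None),
]
changes = year_over_year(series)
print(changes)
```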

To use this dataset in JavaScript, follow the instructions below:

Install the data.js module using npm:

  $ npm install data.js

Once the package is installed, use the code snippet below:

  const {Dataset} = require('data.js')

  const path = 'https://pkgstore.datahub.io/core/house-prices-us/latest/datapackage.json'

  // Dataset.load and file.stream are asynchronous, so wrap the calls in an async function
  ;(async () => {
    const dataset = await Dataset.load(path)

    // get a data file in this dataset
    const file = dataset.resources[0]
    const data = await file.stream()
  })()

To work with Data Packages in SQL, install the following packages:

$ pip install datapackage
$ pip install jsontableschema-sql
$ pip install sqlalchemy

To import the Data Package into your SQLite database, run the following code:

import datapackage
from sqlalchemy import create_engine

data_url = 'https://pkgstore.datahub.io/core/house-prices-us/latest/datapackage.json'
engine = create_engine('sqlite:///:memory:')

# to load Data Package into storage
storage = datapackage.push_datapackage(data_url, 'sql', engine=engine)

# to see datasets in this package
storage.buckets

# to execute an SQL command (assuming the data is in the "data" folder, the resource is named "data" and the file is data.csv)
storage._Storage__connection.execute('select * from data__data___data limit 1;').fetchall()

# description of the table columns
storage.describe('data__data___data')
Datapackage.json