US House Price Index (Case-Shiller)


Files Size Format Created Updated License Source
2 129kB csv zip 5 months ago 1 month ago public_domain_dedication_and_license Standard and Poor's Case-Shiller Indices
Case-Shiller Index of US residential house prices. Data comes from S&P Case-Shiller data and includes both the national index and the indices for 20 metropolitan regions. The indices are created using a repeat-sales methodology.

Data Files

File Description Size Last changed Download
cities Case-Shiller US home price index levels at national and city level. Monthly. 52kB csv (52kB), json (183kB)
house-prices-us_zip Compressed versions of dataset. Includes normalized CSV and JSON data with original data and datapackage.json. 74kB zip (74kB)
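Once downloaded, the zip bundle can be inspected programmatically. A minimal sketch, using a stand-in archive built in memory since the download URL is not shown here (file names inside are illustrative):

```python
import io
import zipfile

# Build a stand-in archive in memory; a real download of house-prices-us_zip
# would contain the normalized CSV/JSON data plus datapackage.json.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data/cities.csv", "Date,National-US\n2016-12-01,180.0\n")
    zf.writestr("datapackage.json", "{}")

# List the members, as you would after downloading the real bundle
with zipfile.ZipFile(buf) as zf:
    members = zf.namelist()
print(members)
```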



Field information

Field Name Order Type (Format) Description
Date 1 date (%Y-%m-%d)
AZ-Phoenix 2 number
CA-Los Angeles 3 number
CA-San Diego 4 number
CA-San Francisco 5 number
CO-Denver 6 number
DC-Washington 7 number
FL-Miami 8 number
FL-Tampa 9 number
GA-Atlanta 10 number
IL-Chicago 11 number
MA-Boston 12 number
MI-Detroit 13 number
MN-Minneapolis 14 number
NC-Charlotte 15 number
NV-Las Vegas 16 number
NY-New York 17 number
OH-Cleveland 18 number
OR-Portland 19 number
TX-Dallas 20 number
WA-Seattle 21 number
Composite-10 22 number
Composite-20 23 number
National-US 24 number
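The fields above can be read with the standard csv module; the Date column parses with the %Y-%m-%d format shown. A short sketch over made-up sample rows (real values come from the cities file):

```python
import csv
import io
from datetime import datetime

# Made-up sample in the same shape as the cities file (illustrative values only)
sample = io.StringIO(
    "Date,AZ-Phoenix,National-US\n"
    "2016-11-01,100.0,120.0\n"
    "2016-12-01,101.0,121.0\n"
)

rows = []
for row in csv.DictReader(sample):
    # Date column uses the %Y-%m-%d format from the field table
    row["Date"] = datetime.strptime(row["Date"], "%Y-%m-%d").date()
    rows.append(row)

latest = max(rows, key=lambda r: r["Date"])
print(latest["National-US"])  # → 121.0
```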

Read me

Case-Shiller Index of US residential house prices. Data comes from S&P Case-Shiller data and includes both the national index and the indices for 20 metropolitan regions. The indices are created using a repeat-sales methodology.


As per the home page for Indices on the S&P website:

The S&P/Case-Shiller U.S. National Home Price Index is a composite of single-family home price indices for the nine U.S. Census divisions and is calculated monthly. It is included in the S&P/Case-Shiller Home Price Index Series which seeks to measure changes in the total value of all existing single-family housing stock.

Documentation of the methodology can be found at:

Key points are (excerpted from methodology):

  • The indices use the “repeat sales method” of index calculation, which uses data on properties that have sold at least twice, in order to capture the true appreciated value of each specific sales unit.
  • The quarterly S&P/Case-Shiller U.S. National Home Price Index aggregates nine quarterly U.S. Census division repeat sales indices using a base period and estimates of the aggregate value of single family housing stock for those periods.
  • The S&P/Case-Shiller Home Price Indices were originated in the 1980s by Case Shiller Weiss’s research principals, Karl E. Case and Robert J. Shiller. At the time, Case and Shiller developed the repeat sales pricing technique. This methodology is recognized as the most reliable means to measure housing price movements and is used by other home price index publishers, including the Office of Federal Housing Enterprise Oversight (OFHEO).
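As a rough illustration of the repeat-sales idea, a toy sketch only: pair up the two sale prices of each property and average the price relatives geometrically. The actual S&P methodology is value-weighted and aggregated across nine Census divisions, not this naive form; the prices below are invented.

```python
from math import prod

# Toy repeat-sales pairs: (earlier sale price, later sale price) for the
# same property; only properties sold at least twice enter the calculation.
pairs = [(100_000, 110_000), (250_000, 300_000), (180_000, 189_000)]

# Naive index change between the two periods: geometric mean of price ratios
ratios = [later / earlier for earlier, later in pairs]
index_change = prod(ratios) ** (1 / len(ratios))
print(round(index_change, 3))
```

Because it only compares a property to itself, the repeat-sales approach controls for differences in housing quality that distort simple average- or median-price measures.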


To download and process the data do:

python scripts/

Updated data files will then be in the data directory.

Note: the URLs and structure of the source data have evolved over time with the source data URLs changing on every release.

Originally (2013) the site provided a table of links, but these were not direct file URLs and you had to dig around in S&P’s javascript to find the actual download locations. As of mid-2014 the data is consolidated in one primary XLS, but the HTML you see in your browser and the source HTML are different. In addition, the actual location of the XLS file continues to change on each release.
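Because the XLS location moves around, the processing script has to locate it in the fetched page on each run. A hypothetical sketch of that step (the function name and regex are assumptions, not the actual script):

```python
import re

def find_xls_links(html: str) -> list[str]:
    """Pull every .xls/.xlsx href out of a fetched S&P page."""
    return re.findall(r'href="([^"]+\.xlsx?)"', html)

# Stand-in HTML; a real run would fetch the S&P page first
html = '<a href="/docs/SA-monthly.xlsx">Download</a> <a href="/about">About</a>'
print(find_xls_links(html))  # → ['/docs/SA-monthly.xlsx']
```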


Any rights of the maintainer are licensed under the PDDL. The exact legal status of the source data (and hence of the resulting processed data) is unclear, but it could have a presumption of public domain given its factual nature and US provenance. However, the current application of the PDDL is indicative of the maintainer’s best guess (and comes with no warranty).

Import into your tool

data-cli, or just data, is the program to get and post your data with the DataHub.
Use data almost like you use git with GitHub. Here are installation instructions.

data get
tree core/house-prices-us
# Get a list of dataset's resources
curl -L -s | grep path

# Get resources

curl -L

curl -L

If you are using R here's how to get the data you want quickly loaded:

install.packages("jsonlite", repos="")
library(jsonlite)

json_file <- ''
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

# get list of all resources:
print(json_data$resources$name)

# print all tabular data (if any exists)
for (i in 1:length(json_data$resources$datahub$type)) {
    if (json_data$resources$datahub$type[i] == 'derived/csv') {
        path_to_file = json_data$resources$path[i]
        data <- read.csv(url(path_to_file))
        print(data)
    }
}

Note: You might need to run the script with root permissions if you are running on a Linux machine.

Install the Frictionless Data data package library and the pandas itself:

pip install datapackage
pip install pandas

Now you can use the Data Package in pandas:

import datapackage
import pandas as pd

data_url = ''

# to load Data Package into storage
package = datapackage.Package(data_url)

# to load only tabular data
resources = package.resources
for resource in resources:
    if resource.tabular:
        data = pd.read_csv(resource.descriptor['path'])
        print(data)
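Once loaded, the frame is ordinary pandas. For example, the month-over-month change in an index column can be computed with pct_change (the levels below are illustrative stand-ins, not real index values):

```python
import pandas as pd

# Illustrative levels in the shape of the cities file
df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-10-01", "2016-11-01", "2016-12-01"]),
    "National-US": [100.0, 102.0, 104.04],
})

# Month-over-month percentage change of the national index
df["mom"] = df["National-US"].pct_change()
print(df["mom"].round(4).tolist())
```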

For Python, first install the `datapackage` library (all the datasets on DataHub are Data Packages):

pip install datapackage

To get the Data Package into your Python environment, run the following code:

from datapackage import Package

package = Package('')

# print list of all resources:
print(package.resource_names)

# print processed tabular data (if any exists)
for resource in package.resources:
    if resource.descriptor['datahub']['type'] == 'derived/csv':
        print(resource.read())

If you are using JavaScript, please follow the instructions below:

Install data.js module using npm:

  $ npm install data.js

Once the package is installed, use the following code snippet:

const {Dataset} = require('data.js')

const path = ''

// We're using self-invoking function here as we want to use async-await syntax:
;(async () => {
  const dataset = await Dataset.load(path)
  // get list of all resources:
  for (const id in dataset.resources) {
    console.log(dataset.resources[id]._descriptor.name)
  }
  // get all tabular data (if any exists)
  for (const id in dataset.resources) {
    if (dataset.resources[id]._descriptor.format === "csv") {
      const file = dataset.resources[id]
      // Get a raw stream
      const stream = await file.stream()
      // entire file as a buffer (be careful with large files!)
      const buffer = await file.buffer
      // print data
      stream.pipe(process.stdout)
    }
  }
})()