S&P 500 Companies with Financial Information

core

Files: 3 | Size: 530kB | Format: csv, zip | Updated: 1 week ago | License: PDDL-1.0
List of companies in the S&P 500 (Standard and Poor's 500). The S&P 500 is a free-float, capitalization-weighted index of the top 500 publicly listed stocks in the US (top 500 by market cap). The dataset includes a list of all the stocks contained therein and associated key financials such as price, market capitalization, earnings, price/earnings ratio, price to book, etc.

Data Files

  • constituents [csv], 19kB; also available as constituents [json] (37kB)
  • constituents-financials [csv], 82kB; also available as constituents-financials [json] (193kB)
  • datapackage_zip [zip], 100kB; compressed version of the dataset, including normalized CSV and JSON data along with the original data and datapackage.json

constituents  

This is a preview version. There might be more data in the original version.

Field information

Field Name Order Type (Format) Description
Symbol 1 string
Name 2 string
Sector 3 string

constituents-financials  


Field information

Field Name Order Type (Format) Description
Symbol 1 string
Name 2 string
Sector 3 string
Price 4 number
Dividend Yield 5 number
Price/Earnings 6 number
Earnings/Share 7 number
Book Value 8 number
52 week low 9 number
52 week high 10 number
Market Cap 11 number
EBITDA 12 number
Price/Sales 13 number
Price/Book 14 number
SEC Filings 15 string (url)
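The field list above maps directly onto CSV rows. As a minimal sketch, here is how one might parse rows with this schema in Python, converting the numeric columns to floats; the sample row is illustrative only, not taken from the dataset.

```python
import csv
import io

# Illustrative sample row matching the constituents-financials schema
# (the values are made up, not real dataset contents).
SAMPLE = """\
Symbol,Name,Sector,Price,Dividend Yield,Price/Earnings,Earnings/Share,Book Value,52 week low,52 week high,Market Cap,EBITDA,Price/Sales,Price/Book,SEC Filings
XYZ,Example Corp,Information Technology,100.0,1.5,20.0,5.0,30.0,80.0,120.0,50.0,10.0,4.0,3.33,http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=XYZ
"""

NUMERIC = ["Price", "Dividend Yield", "Price/Earnings", "Earnings/Share",
           "Book Value", "52 week low", "52 week high", "Market Cap",
           "EBITDA", "Price/Sales", "Price/Book"]

def parse_rows(text):
    """Yield row dicts with the numeric fields converted to float (None if empty)."""
    for row in csv.DictReader(io.StringIO(text)):
        for field in NUMERIC:
            row[field] = float(row[field]) if row[field] else None
        yield row

rows = list(parse_rows(SAMPLE))
print(rows[0]["Symbol"], rows[0]["Price/Earnings"])  # prints: XYZ 20.0
```

To read the real file, replace `SAMPLE` with the contents of data/constituents-financials.csv.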

datapackage_zip  


Read me

List of companies in the S&P 500 (Standard and Poor’s 500). The S&P 500 is a free-float, capitalization-weighted index of the top 500 publicly listed stocks in the US (top 500 by market cap). The dataset includes a list of all the stocks contained therein and associated key financials such as price, market capitalization, earnings, price/earnings ratio, price to book etc.

Data

Information on the S&P 500 index used to be available on the official Standard and Poor's website; until they publish it again, Wikipedia is the best up-to-date and open data source.

  • Index listing - see <data/constituents.csv>, extracted from Wikipedia's list of S&P 500 companies
  • Constituent financials - see <data/constituents-financials.csv> (sourced via Yahoo Finance)

Detailed information on the S&P 500 (primarily in xls format) used to be obtainable from its official webpage on the Standard and Poor's website - it was free, but registration was required.

  • Index listing - see <data/constituents.csv>. This used to be extracted from a source Excel file on the S&P website. (Note: that Excel file actually contains S&P 500 EPS estimates, but sheet 4 holds the list of members; the [previous file][sp-lsting], which contained just the members, 404s as of Dec 2014. Registration and login were required to access it until August 2013.)
  • Historical performance (source xls on S&P website)

Notes:

  • Market Capitalization and EBITDA are in billions

Note: for aggregate information on the S&P (dividends, earnings etc) see Standard and Poor’s 500 Dataset
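Because Market Cap and EBITDA are reported in billions, a small conversion helper avoids unit mistakes when comparing against absolute figures. This is a minimal sketch; the function name is my own, not part of the dataset tooling.

```python
def billions_to_dollars(value_in_billions):
    """Convert a value reported in billions (as in the Market Cap and
    EBITDA columns of this dataset) to absolute dollars."""
    return value_in_billions * 1_000_000_000

# e.g. a Market Cap column value of 734.78 means roughly $734.78 billion
print(billions_to_dollars(734.78))
```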

Preparation

You can run the script yourself to update the data and publish it to GitHub: see the scripts README.

General Financial Notes

Publicly listed US companies are obliged to file various reports with the SEC on a regular basis. Of these, two types are of special interest to investors and others interested in a company's finances and business. These are:

  • 10-K = Annual Report
  • 10-Q = Quarterly report
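The SEC Filings field in the financials file links to each company's EDGAR page, where these forms are listed. As a sketch, such lookup URLs can be built for a given ticker and form type; the query parameters below follow the pattern of EDGAR's browse-edgar endpoint as seen in the dataset's URLs, so verify them against the actual SEC Filings values before relying on this.

```python
from urllib.parse import urlencode

# Base endpoint as it appears in the dataset's SEC Filings URLs
EDGAR_BASE = "https://www.sec.gov/cgi-bin/browse-edgar"

def filings_url(ticker, form_type=""):
    """Return an EDGAR company-filings URL, optionally filtered by
    form type (e.g. "10-K" for annual or "10-Q" for quarterly reports)."""
    params = {"action": "getcompany", "CIK": ticker}
    if form_type:
        params["type"] = form_type
    return EDGAR_BASE + "?" + urlencode(params)

print(filings_url("MMM", "10-K"))
```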

License

All data is licensed under the Open Data Commons Public Domain Dedication and License. All code is licensed under the MIT/BSD license.

Note that while no credit is formally required a link back or credit to Rufus Pollock and the Open Knowledge Foundation is much appreciated.

Import into your tool

If you are using R, here's how to quickly load the data:

install.packages("jsonlite")
library("jsonlite")

json_file <- "http://datahub.io/core/s-and-p-500-companies/datapackage.json"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))

# access csv file by the index starting from 1
path_to_file = json_data$resources[[1]]$path
data <- read.csv(url(path_to_file))
print(data)

In order to work with Data Packages in pandas, you need to install the Frictionless Data datapackage library and the pandas extension:

pip install datapackage
pip install jsontableschema-pandas

To get the data, run the following code:

import datapackage

data_url = "http://datahub.io/core/s-and-p-500-companies/datapackage.json"

# to load Data Package into storage
storage = datapackage.push_datapackage(data_url, 'pandas')

# data frames available (corresponding to data files in original dataset)
storage.buckets

# you can access datasets inside storage, e.g. the first one:
storage[storage.buckets[0]]

For Python, first install the `datapackage` library (all the datasets on DataHub are Data Packages):

pip install datapackage

To get the Data Package into your Python environment, run the following code:

from datapackage import Package

package = Package('http://datahub.io/core/s-and-p-500-companies/datapackage.json')

# get list of resources:
resources = package.descriptor['resources']
resourceList = [resources[x]['name'] for x in range(0, len(resources))]
print(resourceList)

data = package.resources[0].read()
print(data)

If you are using JavaScript, follow the instructions below.

Install data.js module using npm:

  $ npm install data.js

Once the package is installed, use the following code snippet:

const {Dataset} = require('data.js')

const path = 'http://datahub.io/core/s-and-p-500-companies/datapackage.json'

// We're using a self-invoking function here as we want to use async/await syntax:
(async () => {
  const dataset = await Dataset.load(path)

  // Get the first data file in this dataset
  const file = dataset.resources[0]
  // Get a raw stream
  const stream = await file.stream()
  // entire file as a buffer (be careful with large files!)
  const buffer = await file.buffer
})()

For Ruby, install the datapackage library using gem:

gem install datapackage

Now get the dataset and read the data:

require 'datapackage'

path = 'http://datahub.io/core/s-and-p-500-companies/datapackage.json'

package = DataPackage::Package.new(path)
# The package variable contains the metadata. You can inspect it:
puts package

# Read data itself:
resource = package.resources[0]
data = resource.read
puts data