Files | Size | Format | Created | Updated | License | Source |
---|---|---|---|---|---|---|
2 | 71kB | csv, zip | 5 years ago | 5 years ago | ODC-PDDL-1.0 | |
Download files in this dataset
File | Description | Size | Last changed | Download |
---|---|---|---|---|
market-share | | 4kB | | csv (4kB), json (11kB) |
search-engine-market-shares_zip | Compressed versions of the dataset. Includes normalized CSV and JSON data with original data and datapackage.json. | 8kB | | zip (8kB) |
Field Name | Order | Type (Format) | Description |
---|---|---|---|
Date | 1 | date (%Y-%m-%d) | |
Country | 2 | string | |
AOL | 3 | number | |
All the Web | 4 | number | |
AltaVista | 5 | number | |
Ask | 6 | number | |
Excite | 7 | number | |
| | 8 | number | |
Lycos | 9 | number | |
MSN | 10 | number | |
Yahoo | 11 | number | |
Others | 12 | number | |
Source | 13 | string |
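The field table above maps directly onto a per-row parser. The sketch below assumes rows come from `csv.DictReader` over the CSV resource, with header names as listed in the table. `parse_row` and `parse_csv` are hypothetical helpers, not part of any DataHub tool. Treating an empty numeric cell as missing (`None`) is also an assumption about how gaps are encoded.

```python
import csv
import io
from datetime import datetime
from typing import Optional

# Columns typed as strings in the schema; every other column except Date is a number.
STRING_FIELDS = {"Country", "Source"}


def parse_number(raw: Optional[str]) -> Optional[float]:
    """Convert a numeric cell to float; empty or absent cells become None (assumed gap encoding)."""
    if raw is None or not raw.strip():
        return None
    return float(raw)


def parse_row(raw: dict) -> dict:
    """Type one CSV row according to the field table (hypothetical helper)."""
    parsed = {}
    for name, value in raw.items():
        if name is None:
            # DictReader stores surplus unnamed cells under the key None; skip them
            continue
        if name == "Date":
            # Format given in the schema: %Y-%m-%d
            parsed[name] = datetime.strptime(value, "%Y-%m-%d").date()
        elif name in STRING_FIELDS:
            parsed[name] = value
        else:
            parsed[name] = parse_number(value)
    return parsed


def parse_csv(text: str) -> list:
    """Parse the full CSV text into a list of typed rows."""
    return [parse_row(row) for row in csv.DictReader(io.StringIO(text))]
```

For example, passing the decoded body of the `r/0.csv` URL from the cURL example below to `parse_csv` should type the whole resource, assuming that URL serves the `market-share` CSV.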
Use our data-cli tool designed for data wranglers:
```bash
data get https://datahub.io/rufuspollock/search-engine-market-shares
data info rufuspollock/search-engine-market-shares
tree rufuspollock/search-engine-market-shares
```
```bash
# Get a list of the dataset's resources
curl -L -s https://datahub.io/rufuspollock/search-engine-market-shares/datapackage.json | grep path

# Get resources
curl -L https://datahub.io/rufuspollock/search-engine-market-shares/r/0.csv
curl -L https://datahub.io/rufuspollock/search-engine-market-shares/r/1.zip
```
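Piping through `grep path` only approximates a resource listing. Here is a sketch that reads the same `datapackage.json` with Python's standard library and lists each resource's `name`, `path` and DataHub type. It assumes the descriptor follows the Data Package layout (a top-level `resources` array of objects). It also assumes, as the R and Python examples further down do, that DataHub-specific details sit under an optional `datahub` key. `list_resources` and `fetch_descriptor` are illustrative names.

```python
import json
from urllib.request import urlopen

DESCRIPTOR_URL = "https://datahub.io/rufuspollock/search-engine-market-shares/datapackage.json"


def list_resources(descriptor: dict) -> list:
    """Return (name, path, datahub type) for every resource in a Data Package descriptor."""
    rows = []
    for resource in descriptor.get("resources", []):
        # 'datahub' is DataHub-specific and may be absent on some resources
        datahub_type = resource.get("datahub", {}).get("type")
        rows.append((resource.get("name"), resource.get("path"), datahub_type))
    return rows


def fetch_descriptor(url: str = DESCRIPTOR_URL) -> dict:
    """Download and decode datapackage.json (urlopen follows redirects, like curl -L)."""
    with urlopen(url) as response:
        return json.load(response)
```

Usage: `for name, path, kind in list_resources(fetch_descriptor()): print(name, path, kind)`. The listing logic is kept separate from the download so it can be checked against a local descriptor without network access.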
If you are using R, here's how to load the data quickly:

```r
install.packages("jsonlite", repos = "https://cran.rstudio.com/")
library("jsonlite")

json_file <- 'https://datahub.io/rufuspollock/search-engine-market-shares/datapackage.json'
# fromJSON simplifies the "resources" array into a data frame, one row per resource
json_data <- fromJSON(paste(readLines(json_file), collapse = ""))

# get list of all resources:
print(json_data$resources$name)

# print all tabular data (if any exists)
for (i in seq_along(json_data$resources$datahub$type)) {
  # isTRUE() guards against resources that have no DataHub type (NA)
  if (isTRUE(json_data$resources$datahub$type[i] == 'derived/csv')) {
    path_to_file <- json_data$resources$path[i]
    data <- read.csv(url(path_to_file))
    print(data)
  }
}
```
Note: You might need to run the script with root permissions if you are running it on a Linux machine.
Install the Frictionless Data `datapackage` library and pandas itself:

```bash
pip install datapackage
pip install pandas
```

Now you can load the Data Package into pandas:
```python
import datapackage
import pandas as pd

data_url = 'https://datahub.io/rufuspollock/search-engine-market-shares/datapackage.json'

# load the Data Package descriptor (datapackage.json); resource data is read later
package = datapackage.Package(data_url)

# load only the tabular resources into pandas DataFrames
for resource in package.resources:
    if resource.tabular:
        data = pd.read_csv(resource.descriptor['path'])
        print(data)
```
For Python, first install the `datapackage` library (all the datasets on DataHub are Data Packages):

```bash
pip install datapackage
```

To get the Data Package into your Python environment, run the following code:
```python
from datapackage import Package

package = Package('https://datahub.io/rufuspollock/search-engine-market-shares/datapackage.json')

# print list of all resources:
print(package.resource_names)

# print processed tabular data (if any exists)
for resource in package.resources:
    # not every resource carries a 'datahub' section, so avoid a KeyError
    if resource.descriptor.get('datahub', {}).get('type') == 'derived/csv':
        print(resource.read())
```
If you are using JavaScript, follow the instructions below.

Install the `data.js` module using npm:

```bash
npm install data.js
```

Once the package is installed, use the following code snippet:
```javascript
const {Dataset} = require('data.js')

const path = 'https://datahub.io/rufuspollock/search-engine-market-shares/datapackage.json'

// We're using an immediately invoked async function so we can use async/await syntax:
;(async () => {
  const dataset = await Dataset.load(path)
  // get list of all resources:
  for (const file of dataset.resources) {
    console.log(file._descriptor.name)
  }
  // get all tabular data (if any exists)
  for (const file of dataset.resources) {
    if (file._descriptor.format === 'csv') {
      // Get a raw stream
      const stream = await file.stream()
      // entire file as a buffer (be careful with large files!)
      const buffer = await file.buffer
      // print data
      stream.pipe(process.stdout)
    }
  }
})()
```
Search engine market shares around the world over time. Data are sourced from NetApplications, WebSideStory, Nielsen’s NetRatings, comScore’s MediaMetrix and other sources. Some figures and data come from Rufus Pollock, "Is Google the Next Microsoft? Competition, Welfare and Regulation in Online Search", *Review of Network Economics*, December 2010. Please cite that paper as well as this dataset when using this data.
Obtaining good (comparable) market share data over a reasonable period is difficult. In particular, in the late 1990s and early 2000s the only information recorded was the number of visits to a particular website. Since many search providers also ran "portals", it can be difficult to distinguish pure search from simple visits.
In addition, early data frequently records only the number of unique visitors per month rather than the number of hits. This can severely distort results, since pure-search providers (such as Google) are much more likely than portal-like sites to receive multiple visits from the same user.
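A small worked example makes the distortion concrete. The numbers below are invented purely for illustration and do not come from the dataset. A pure-search provider and a portal attract the same number of unique visitors, but the search provider's users return far more often.

```python
def shares(counts: dict) -> dict:
    """Convert raw counts per provider into percentage shares."""
    total = sum(counts.values())
    return {name: 100.0 * value / total for name, value in counts.items()}


# Hypothetical monthly figures, for illustration only.
unique_visitors = {"search provider": 1_000, "portal": 1_000}
visits_per_visitor = {"search provider": 10, "portal": 2}

visits = {name: unique_visitors[name] * visits_per_visitor[name] for name in unique_visitors}

by_uniques = shares(unique_visitors)  # both providers appear to hold 50% each
by_visits = shares(visits)            # search provider: 10000 of 12000 visits, about 83%
```

Measured by unique visitors the two look evenly matched. Measured by visits, the search provider has five times the portal's traffic.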
Matters are further complicated by the fact that in the late 1990s and early 2000s many search *sites* had their search powered by a third-party provider. For example, until 2004 Yahoo! did not have its own search engine but bought in results, first from Inktomi (until 2000) and then from Google.
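One way to see why this matters is to reattribute each site's share to the engine that actually produced its results. The sketch below uses hypothetical share figures. The only mapping taken from the text is that Yahoo! bought in its results from Google between 2000 and 2004. `by_backend` is an illustrative helper, not a step used to build this dataset.

```python
from collections import defaultdict


def by_backend(site_shares: dict, powered_by: dict) -> dict:
    """Sum site-level shares by the engine that actually served the results.

    Sites missing from `powered_by` are assumed to run their own engine.
    """
    totals = defaultdict(float)
    for site, share in site_shares.items():
        totals[powered_by.get(site, site)] += share
    return dict(totals)


# Hypothetical site-level shares (percent), for illustration only.
site_shares = {"Google": 40.0, "Yahoo": 30.0, "Other": 30.0}

# From the text: between 2000 and 2004 Yahoo! bought in its results from Google.
powered_by_2002 = {"Yahoo": "Google"}

engine_shares = by_backend(site_shares, powered_by_2002)
# Google's engine now accounts for 70.0 of the 100 points; "Other" is unchanged.
```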
The figure shows combined data from NetApplications and WebSideStory (now part of Omniture). Both firms source their data from web analytics applications installed on customers’ sites. NetApplications appears to have a more global customer base than WebSideStory, which may partly explain the imperfect match between the two datasets that is apparent in the 2004 values.
Note that these data sources differ from those used by the likes of Nielsen’s NetRatings and comScore’s MediaMetrix. Those products get their data from users themselves (directly, or indirectly via ISPs) rather than from the websites they visit, and in this sense they may be more reliable. However, it has proved difficult to obtain continuous time-series data from these providers covering more than a couple of years, and over that period the trend they show is very similar to the one in the data shown here.
The graph tells a simple story: a single firm (Google) emerges to dominate the market. In terms of overall concentration, it is noteworthy that even in 2002, when Google was not yet as dominant as it is today, the top two firms (Google and Yahoo!) accounted for over 70% of the market, and adding in Microsoft pushes this close to 90%. (Of course, at that point Yahoo!'s search was powered by Google, and MSN’s by LookSmart and Inktomi.)
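The "top two" and "top three" figures above are concentration ratios: the summed shares of the k largest firms. Below is a minimal sketch that computes them for one row of data. It assumes the row has already been reduced to a mapping from engine name to numeric share. The `Others` column is excluded because it aggregates many firms. The sample numbers are hypothetical and are not values from the CSV.

```python
def concentration_ratio(shares: dict, k: int, exclude=("Others",)) -> float:
    """Sum of the k largest shares, ignoring aggregate buckets such as 'Others'.

    Missing values (None) are skipped.
    """
    values = sorted(
        (v for name, v in shares.items() if name not in exclude and v is not None),
        reverse=True,
    )
    return sum(values[:k])


# Hypothetical row, for illustration only (percent of searches).
row = {"Google": 50.0, "Yahoo": 25.0, "MSN": 15.0, "AOL": 5.0, "Others": 5.0}

cr2 = concentration_ratio(row, 2)  # 50 + 25 = 75.0
cr3 = concentration_ratio(row, 3)  # 75 + 15 = 90.0
```

Applied row by row, this yields a concentration time series. Dropping `Others` is a deliberate choice. That column lumps many small engines together and would overstate concentration if it were treated as a single firm.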
http://searchenginewatch.com/showPage.html?page=3334881
Gives a graph of USA (world?) data from WebSideStory, from which data can be inferred. Data not used:
Outside the US:

Country | Google | Yahoo |
---|---|---|
Germany | 80.5% | 5.6% |
UK | 65.6% | 10.8% |
China | 72.6% | 12.7% |
See the `netapplications.py` file in the code files.
Global data from hitslink.com
From the site's front page:
About Our Market Share Statistics
This data provides valuable insight into significant trends for internet usage. These statistics include monthly information on key statistics such as browser trends (e.g. Internet Explorer vs. Firefox market share), search engine referral data (e.g. Yahoo vs. MSN vs. Google traffic market share) and operating system share (Windows vs. Mac vs. Linux market share or even the iPhone market share vs. Windows Mobile).
We use a unique methodology for collecting this data. We collect data from the browsers of site visitors to our exclusive on-demand network of live stats customers. The data is compiled from approximately 160 million visitors per month. The information published is an aggregate of the data from this network of hosted website statistics. The site unique visitor and referral information is summarized on a monthly basis.
In addition, we classify 430+ referral sources identified as search engines. Aggregate traffic referrals from these engines are summarized and reported monthly. The statistics for search engines include both organic and sponsored referrals. The websites in our population represent dozens of countries in regions including North America, South America, Western Europe, Australia / Pacific Rim and Parts of Asia.
The data is made available free of charge on a monthly basis that includes monthly browser market share trends, top search engine referrals, screen resolutions, top ISPs and operating systems trends. An upgraded version is available that provides reports by geolocation, preview weekly data and other features.
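The referral-based methodology quoted above can be sketched as follows. This is not NetApplications' actual implementation. The hostname-to-engine table is a tiny hypothetical stand-in for their 430+ classified sources, and each referral is assumed to arrive as a `(date, referrer_url)` pair.

```python
from collections import Counter, defaultdict
from datetime import date
from urllib.parse import urlparse

# Hypothetical, tiny stand-in for a list of hosts classified as search engines.
SEARCH_ENGINE_HOSTS = {
    "www.google.com": "Google",
    "search.yahoo.com": "Yahoo",
    "search.msn.com": "MSN",
}


def monthly_search_shares(referrals) -> dict:
    """Return {(year, month): {engine: percent}} for search-engine referrals.

    Referrals from hosts not classified as search engines are ignored, so the
    shares are of search traffic only (organic and sponsored clicks alike).
    """
    counts = defaultdict(Counter)
    for day, referrer in referrals:
        engine = SEARCH_ENGINE_HOSTS.get(urlparse(referrer).hostname)
        if engine is not None:
            counts[(day.year, day.month)][engine] += 1
    return {
        month: {engine: 100.0 * n / sum(c.values()) for engine, n in c.items()}
        for month, c in counts.items()
    }


# Hypothetical referrals, for illustration only.
sample = [
    (date(2008, 1, 3), "http://www.google.com/search?q=data"),
    (date(2008, 1, 9), "http://search.yahoo.com/search?p=data"),
    (date(2008, 1, 20), "http://www.google.com/search?q=csv"),
    (date(2008, 1, 21), "http://example.org/blog"),  # not a search engine: ignored
]
shares = monthly_search_shares(sample)
```

The monthly summary mirrors the quote's description of referrals being "summarized and reported monthly". Any finer breakdown, such as by geolocation, would need extra fields per referral.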