A collection of datasets about Wikipedia and other projects run by the Wikimedia Foundation. The collection is open to contributions by researchers not affiliated with the Foundation.
Our overall data policy is to release into the public domain all datasets that don't require attribution and to license datasets that include textual/media contributions from Wikimedians under the appropriate open license, most commonly a CC BY 3.0 license.
Datasets
17 datasets found.
-
From the front page: DBpedia.org is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you...
-
dbpedia lite takes some of the structured data in Wikipedia and presents it as Linked Data. It contains a small subset of the data that dbpedia contains; it does not attempt to extract...
-
DBpedia.org is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated...
-
This dataset comprises the full, anonymized set of responses from the blind assessment of a sample of Wikipedia articles across languages and disciplines by academic experts. The study...
-
This is a non-random dataset containing the edit histories of about 47,000 editors. This can be used for machine learning purposes and the outcome variable is the number of edits six...
-
Public data about the Wikimedia Fundraiser. Data is refreshed every 15 minutes and includes the complete historical series since 2006.
-
A curated corpus of references on Wikipedia and Wikimedia research, reviewed in the monthly Wikimedia Research Newsletter.
-
Dumps of the full content of Wikipedia. Database backup dumps: a complete copy of all Wikimedia wikis, in the form of wikitext source and metadata embedded in XML. A number of...
-
A complete anonymized dump of 11M article ratings collected over 1 year (July 2011 - July 2012) from the English Wikipedia.
-
This file has one row for each banner. For a full file layout, see http://blog.allourideas.org/post/2739358388/download-your-data.
-
This file has one row for each non-vote (e.g., a voter clicking "I can't decide"). For full file layout details, see: http://blog.allourideas.org/post/2739358388/download-your-data
-
This file has one row for each vote. For a more detailed file layout, see http://blog.allourideas.org/post/2739358388/download-your-data
-
This experiment looks at the effects of linking to the revision history of Wikipedia articles with a prominent "last modified" timestamp. Currently, the only way for readers to discover...
-
Hourly registrations of new user accounts to the English Wikipedia.
-
This dataset shows the top 60 Wikipedia templates that editors, both new and experienced, receive on their Talk pages. The dataset covers the period 2007 - 2011.
-
Data on user preferences set by active Wikipedia editors. Active editors are defined as registered users with at least 5 edits per month in a given project. The dumps were generated on...
-
Hourly snapshot data on access to Wikipedia, captured from the Wikimedia Squid servers. Project counts show the total access in a time period to the different...