The TRC2 corpus comprises 1,800,370 news stories (2,871,075,221 bytes) covering the period from 2008-01-01 00:00:03 to 2009-02-28 23:54:14. It was initially made available to participants of the 2009 blog track at the Text Retrieval Conference (TREC) to supplement the BLOGS08 corpus, which contains the results of a large blog crawl carried out at the University of Glasgow. TRC2 is distributed via web download.
The stories in the Reuters Corpus are under the copyright of Reuters Ltd and/or Thomson Reuters, and their use is governed by the following agreements:
Organizational agreement: This agreement must be signed by the person responsible for the data at your organization and sent to NIST.
Individual agreement: This agreement must be signed by all researchers using the Reuters Corpus at your organization and kept on file at your organization.
Getting the corpus
Download and print the Organizational and Individual agreement forms above.
Send the Organizational form to NIST by one of the methods listed below:
Send a scanned PDF file
Complete the Reuters Organizational form and send a PDF file of the form to: email@example.com
In your email include the following:
Subject: request for Reuters corpus
In the body of the message include: your name, your complete postal address, and whether you are requesting RCV1, RCV2, TRC2, or all three.
(do not include other correspondence in this message)
Complete and keep the individual agreement form on file at your organization.
Subject to our approval, you will receive the corpus CDs by mail (for RCV1 and RCV2) and/or a download URL, login, and password via email (for TRC2).
Please allow seven business days for a response.
If you have already obtained some of the Reuters corpora and wish to obtain others, send an email to firstname.lastname@example.org. Please provide the name of your organization, the month and year in which you requested RCV1, RCV2, or TRC2, and the corpus you are interested in receiving. An Organizational agreement must be on file at NIST.
Thomson Reuters Text Research Collection (TRC2). No author.
Retrieved 17:13, May 24, 2013 (UTC).
the Data Hub