Tag: ir

There are 5 datasets tagged with ir:

  • This is a publicly available, tokenized version of the Reuters RCV1 corpus by David D. Lewis et al. The creator requests attribution.
  • Reuters Corpus Volume 2 (RCV2)
    • 779 views
    • Not openly licensed
    Reuters Corpus, Volume 2, Multilingual Corpus, 1996-08-20 to 1997-08-19 (Release date 2005-05-31, Format version 1, correction level 0) This is distributed on one CD and contains over...
  • Reuters Corpus Volume 1 (RCV1)
    • 133 views
    • Not openly licensed
    RCV1 is a dataset of 810,000 documents (2.5 GB uncompressed), available on request from NIST. The documents are distributed on CD. For derivative data that is publicly...
  • Reuters-21578
    • 33 views
    • Not openly licensed
    A set of classified documents from the 1987 Reuters newswire. This dataset is appropriate for testing natural language processing and information retrieval algorithms....
  • The ClueWeb09 Dataset
    • 41 views
    • Not openly licensed
    The ClueWeb09 dataset was created to support research on information retrieval and related human language technologies. It consists of about 1 billion web pages in ten languages that were...