27 datasets found

Tags: corpus

  • Cancer Registry

    Statistics of Cancer Registry from www.statcentral.ie under the theme People and Society - Health, from the National Cancer Registry. Classifications: Cancer type...
  • News-100 NIF NER Corpus

    This corpus comprises 100 German news articles from the online news platform news.de. All of the articles were published in 2010 and contain the word Golf. This word...
  • RSS-500 NIF NER CORPUS

    This corpus has been created using a dataset comprising a list of 1,457 RSS feeds as compiled in (Goldhahn et al. 2012). The list includes all major worldwide newspapers...
  • KORE 50 NIF NER Corpus

    KORE 50 is a subset of the larger AIDA corpus, which is based on the dataset of the CoNLL 2003 NER task. The dataset aims to capture hard-to-disambiguate mentions of...
  • Reuters-128 NIF NER Corpus

    This English corpus is based on the well-known Reuters-21578 corpus, which contains economic news articles. In particular, we chose 128 articles containing at least one NE....
  • Brown Corpus in RDF/NIF

    RDF version of the Brown Corpus (W. N. Francis, H. Kucera; Brown University; 1979). 1,014,312 words in 500 documents, taken from newspaper texts on diverse topics, non-fiction...
  • TalkBank

    About TalkBank: The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It will construct sample databases within each of...
  • Syntactic Reference Corpus of Medieval French (SRCMF)

    The SRCMF contains 15 Old French texts with about 280,000 words. It has a high-quality manual annotation, based on a linguistically adequate dependency grammar. Annotation...
  • OPUS - an open source parallel corpus

    OPUS is a growing collection of translated texts from the web. In the OPUS project we try to convert and align free online data, to add linguistic annotation, and to provide the...
  • Ontologies of Linguistic Annotations (OLiA)

    The Ontologies of Linguistic Annotations (OLiA) provide an OWL/DL taxonomy of data categories as a reference for linguistic annotation (OLiA Reference Model), plus OWL/DL models...
  • KAIST silver standard corpus

    KAIST silver standard corpus. Availability: Freely Available. Usage: Named Entity Recognition. Status: Newly created-finished. Description: We propose a novel method to...
  • French TimeBank

    The French TimeBank consists of a set of 109 journalistic articles from 7 different sub-genres annotated according to the ISO-TimeML standard, adapted for the French language....
  • EU Directorate-General for Translation (DGT) - Acquis Communautaire

    From website: As of November 2007, the European Commission's Directorate-General for Translation (DGT) made publicly accessible its multilingual Translation Memory for...
  • Diachronic Ontologies from People's Daily

    Diachronic Ontologies from People's Daily Ontology. Availability: Freely Available. Usage: Word Sense Disambiguation. Status: Newly created-finished. Description: 1....
  • Manually Annotated Sub-Corpus (MASC) of the Open American National Corpus

    The Manually Annotated Sub-Corpus (MASC) consists of approximately 500,000 words of contemporary American English written and spoken data drawn from the OPEN AMERICAN NATIONAL...
  • Atlante Sintattico d'Italia (ASIt)

    The Atlante Sintattico d'Italia, Syntactic Atlas of Italy (ASIt) enterprise builds on a long-standing tradition of collecting and analysing linguistic corpora, which has...
  • VoxForge

    VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac). We will make available all...
  • Open-Content Text Corpus

    The OCTC hosts open-content texts, encoded in TEI P5 XML, for many languages, each in a separate subcorpus. Another part of the OCTC stores interlanguage alignment info. The...
  • The New York Times Annotated Corpus

    From website: The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007...
  • MOCHA-TIMIT

    Authors: Alan Wrench, Queen Margaret University College. Funded by: Engineering and Physical Sciences Research Council. When created: November 1999....
You can also access these records using the API (see the API documentation).