21 datasets found

Tags: corpus

  • TalkBank

    About TalkBank: The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It will construct sample databases within each of...
  • Syntactic Reference Corpus of Medieval French (SRCMF)

    The SRCMF contains 15 Old French texts totaling about 280,000 words. It has high-quality manual annotation based on a linguistically adequate dependency grammar. Annotation...
  • OPUS - an open source parallel corpus

    OPUS is a growing collection of translated texts from the web. In the OPUS project we try to convert and align free online data, to add linguistic annotation, and to provide the...
  • Ontologies of Linguistic Annotations (OLiA)

    The Ontologies of Linguistic Annotations (OLiA) provide an OWL/DL taxonomy of data categories as a reference for linguistic annotation (OLiA Reference Model), plus OWL/DL models...
  • KAIST silver standard corpus

    KAIST silver standard corpus. Availability: Freely Available. Usage: Named Entity Recognition. Status: Newly created-finished. Description: We propose a novel method to...
  • French TimeBank

    The French TimeBank consists of a set of 109 journalistic articles from 7 different sub-genres annotated according to the ISO-TimeML standard, adapted for the French language....
  • EU Directorate-General for Translation (DGT) - Acquis Communautaire

    From website: As of November 2007, the European Commission's Directorate-General for Translation (DGT) made publicly accessible its multilingual Translation Memory for...
  • Diachronic Ontologies from People's Daily

    Diachronic Ontologies from People's Daily Ontology. Availability: Freely Available. Usage: Word Sense Disambiguation. Status: Newly created-finished. Description: 1....
  • Manually Annotated Sub-Corpus (MASC) of the Open American National Corpus

    The Manually Annotated Sub-Corpus (MASC) consists of approximately 500,000 words of contemporary American English written and spoken data drawn from the Open American National...
  • Atlante Sintattico d'Italia (ASIt)

    The Atlante Sintattico d'Italia (ASIt), or Syntactic Atlas of Italy, builds on a long-standing tradition of collecting and analysing linguistic corpora, which has...
  • VoxForge

    VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac). We will make available all...
  • Open-Content Text Corpus

    The OCTC hosts open-content texts, encoded in TEI P5 XML, for many languages, each in a separate subcorpus. Another part of the OCTC stores interlanguage alignment info. The...
  • The New York Times Annotated Corpus

    From website: The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007...
  • MOCHA-TIMIT

    Authors: Alan Wrench, Queen Margaret University College. Funded by: Engineering and Physical Sciences Research Council. When created: November 1999....
  • InAra Plagiarism Detection Corpus

    ARAbic INtrinsic plagiarism detection corpus (InAra Corpus 2013). The InAra corpus is the first corpus for the evaluation of Arabic intrinsic plagiarism detection. The Intrinsic...
  • The IBL Corpus

    The IBL Corpus was collected by the University of Plymouth and the University of Edinburgh as part of the EPSRC-funded project IBL, Instruction-based Learning for Mobile...
  • Hungarian Language Corpora and Analyzers

    Resources, including corpora and software, for processing the Hungarian language. Language resources: The Hunglish Corpus is a sentence-aligned Hungarian-English parallel...
  • eXtended WordNet

    From website: WordNet is a lexical database for English that has been widely adopted in artificial intelligence and computational linguistics for a variety of practical...
  • Europarl Parallel Corpus

    Overview from home page: The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages:...
  • Enron Email Dataset

    From distribution page: This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150...
You can also access this registry using the API (see API Docs).
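
As a rough illustration of the API access mentioned above, the sketch below builds a tag-filtered search URL and pulls dataset titles out of a JSON response. It assumes, without confirmation from this page, that the catalog exposes a CKAN-style `package_search` action; the base URL `https://catalog.example.org` and the helper names `build_search_url` and `extract_titles` are hypothetical. Consult the catalog's own API Docs for the actual endpoints.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, tag, rows=21):
    """Build a search URL filtered by tag.

    Assumption: the catalog serves a CKAN-style action API at
    /api/3/action/package_search. Check the API Docs before relying on it.
    """
    query = urlencode({"fq": f"tags:{tag}", "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{query}"


def extract_titles(response_text):
    """Return dataset titles from a CKAN-style JSON search response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("catalog API reported failure")
    # Fall back to the dataset's short name when no title is set.
    return [d.get("title") or d.get("name") for d in payload["result"]["results"]]


if __name__ == "__main__":
    # Hypothetical base URL, used only to show the shape of the request.
    print(build_search_url("https://catalog.example.org", "corpus"))
```

The response body can be fetched with any HTTP client and passed to `extract_titles`; keeping URL construction and parsing separate from the network call makes both easy to check offline.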