Tag: corpus

There are 15 datasets tagged with corpus:

  • TalkBank
    • 35 views
    • Not Openly Licensed
    About TalkBank: The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It will construct sample databases within each of the...
  • From website: As of November 2007, the European Commission's Directorate-General for Translation (DGT) made publicly accessible its multilingual Translation Memory for the Acquis...
  • VoxForge
    • 49 views
    • Not Openly Licensed
    VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac). We will make available all submitted...
  • From distribution page: This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly...
  • MOCHA-TIMIT
    • 77 views
    • Not Openly Licensed
    Authors: Alan Wrench, Queen Margaret University College. Funded by: Engineering and Physical Sciences Research Council. When created: November 1999....
  • The IBL Corpus
    • 15 views
    • Not Openly Licensed
    The IBL Corpus was collected by the University of Plymouth and the University of Edinburgh as part of the EPSRC-funded project IBL, Instruction-based Learning for Mobile Robots...
  • From website: WordNet is a lexical database for English that has been widely adopted in artificial intelligence and computational linguistics for a variety of practical...
  • The Manually Annotated Sub-Corpus (MASC) consists of approximately 500,000 words of contemporary American English written and spoken data drawn from the OPEN AMERICAN NATIONAL CORPUS...
  • The New York Times Annotated Corpus
    • 137 views
    • Not Openly Licensed
    From website: The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007 with...
  • The Ontologies of Linguistic Annotations (OLiA) provide an OWL/DL taxonomy of data categories as a reference for linguistic annotation (OLiA Reference Model), plus OWL/DL models for a...
  • OPUS is a growing collection of translated texts from the web. In the OPUS project we try to convert and align free online data, to add linguistic annotation, and to provide the community...
  • This is a recipe to train word n-gram language models using the newswire text provided in the English Gigaword corpus (1200M words of NYT, APW, AFE, XIE). It also prepares dictionaries...
  • Resources, including corpora and software, for processing the Hungarian language. Language resources: The Hunglish Corpus is a sentence-aligned Hungarian-English parallel corpus...
  • Overview from home page: The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romanic...
  • Open-Content Text Corpus
    • 31 views
    • Not Openly Licensed
    The OCTC hosts open-content texts, encoded in TEI P5 XML, for many languages, each in a separate subcorpus. Another part of the OCTC stores interlanguage alignment info. The project is...