There are 15 datasets tagged corpus:
-
About TalkBank: The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It will construct sample databases within each of the...
-
From website: As of November 2007, the European Commission's Directorate-General for Translation (DGT) made publicly accessible its multilingual Translation Memory for the Acquis...
-
VoxForge was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac). We will make available all submitted...
-
From distribution page: This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly...
-
Authors: Alan Wrench, Queen Margaret University College. Funded by: Engineering and Physical Sciences Research Council. When created: November 1999....
-
The IBL Corpus was collected by the University of Plymouth and the University of Edinburgh as part of the EPSRC-funded project IBL, Instruction-based Learning for Mobile Robots...
-
From website: WordNet is a lexical database for English that has been widely adopted in artificial intelligence and computational linguistics for a variety of practical...
-
The Manually Annotated Sub-Corpus (MASC) consists of approximately 500,000 words of contemporary American English written and spoken data drawn from the OPEN AMERICAN NATIONAL CORPUS...
-
From website: The New York Times Annotated Corpus contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007 with...
-
The Ontologies of Linguistic Annotations (OLiA) provide an OWL/DL taxonomy of data categories as a reference for linguistic annotation (OLiA Reference Model), plus OWL/DL models for a...
-
OPUS is a growing collection of translated texts from the web. In the OPUS project we try to convert and align free online data, to add linguistic annotation, and to provide the community...
-
This is a recipe to train word n-gram language models using the newswire text provided in the English Gigaword corpus (1200M words of NYT, APW, AFE, XIE). It also prepares dictionaries...
-
Resources, including corpora and software, for processing the Hungarian language. Language resources: The Hunglish Corpus is a sentence-aligned Hungarian-English parallel corpus...
-
Overview from home page: The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romanic...
-
The OCTC hosts open-content texts, encoded in TEI P5 XML, for many languages, each in a separate subcorpus. Another part of the OCTC stores interlanguage alignment info. The project is...