There are 40 datasets tagged with linguistics:
-
-
This database contains more than 3,000 records on major linguistic works on grammar, from Antiquity to the present. Major works will progressively be digitized and made available through the...
-
-
About TalkBank: The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It will construct sample databases within each of the...
-
Authors: Alan Wrench, Queen Margaret University College. Funded by: Engineering and Physical Sciences Research Council. Created: November 1999...
-
The SPECIALIST lexicon is a large syntactic lexicon of biomedical and general English. Coverage includes both commonly occurring English words and biomedical vocabulary. The lexicon entry...
-
From the website: The speech accent archive uniformly presents a large set of speech samples from a variety of language backgrounds. Native and non-native speakers of English read the same...
-
From the About page: The Rosetta Project is a global collaboration of language specialists and native speakers working to build a publicly accessible digital library of human...
-
-
A WordNet-like concept network developed at MIT. ConceptNet aims to give computers access to common-sense knowledge, the kind of information that ordinary people know but usually leave...
-
The IBL Corpus was collected by the University of Plymouth and the University of Edinburgh as part of the EPSRC-funded project IBL, Instruction-Based Learning for Mobile Robots...
-
From the website: Version 4 of the MULTEXT-East resources, a multilingual dataset for language engineering research and development. This dataset contains, for Bulgarian, Croatian,...
-
Data in the lemon format, extracted from Wiktionary.
-
DBpedia Spotlight is a tool for annotating mentions of DBpedia resources in text, providing a solution for linking unstructured information sources to the Linked Open Data cloud through...
-
ASJP collects a list of 40 words from each of 5,500 languages in a simplified phonetic representation. More background can be found at http://email.eva.mpg.de/~wichmann/ASJPHomePage.htm
-
From the website: WordNet is a lexical database for English that has been widely adopted in artificial intelligence and computational linguistics for a variety of practical...
-
The WordNet 2.0 model in the lemon format.
-
Sentence-layer annotation is the most coarse-grained annotation in this corpus. We adhere to the definitions of objectivity and subjectivity introduced by Wiebe et al. (2005)....
-
The French TimeBank consists of a set of 109 journalistic articles from 7 different sub-genres annotated according to the ISO-TimeML standard, adapted for the French language....
-
1,200 words in 200 languages.
-
The Ontologies of Linguistic Annotations (OLiA) provide an OWL/DL taxonomy of data categories as a reference for linguistic annotation (OLiA Reference Model), plus OWL/DL models for a...
-
Overview: The WikiWord-Thesaurus is a multilingual thesaurus derived from Wikipedia by extracting lexical and semantic information. It was originally developed for a diploma thesis...
-
Overview: WikiWord is a system for building a multilingual thesaurus by extracting lexical and semantic information from Wikipedia. It was originally developed for a diploma thesis...
-
GeoWordNet is a semantic resource built from the full integration of WordNet, GeoNames and the Italian part of MultiWordNet. GeoWordNet Public Dataset contains 3,698,238 entities,...
-
The World Loanword Database, edited by Martin Haspelmath and Uri Tadmor, is a scientific publication by the Max Planck Digital Library, Munich (2009). It provides vocabularies...
-
We aim to provide an open-source framework (based on DBpedia) to extract semantic lexical resources (an ontology about language use) from Wiktionary. The data currently includes language,...
-
This is a recipe to train word n-gram language models using the newswire text provided in the English Gigaword corpus (1200M words of NYT, APW, AFE, XIE). It also prepares dictionaries...
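The core of any word n-gram model is counting n-grams and turning the counts into conditional probabilities. The sketch below shows only that step for bigrams with plain, unsmoothed maximum-likelihood estimates; it is not the recipe's own scripts, and the function name `train_bigram_mle` and the `<s>`/`</s>` boundary markers are illustrative assumptions. A Gigaword-scale model would also need smoothing and a dedicated toolkit.

```python
from collections import Counter
from typing import Iterable


def train_bigram_mle(sentences: Iterable[list[str]]) -> dict[tuple[str, str], float]:
    """Hypothetical helper: estimate P(w_i | w_{i-1}) by maximum likelihood."""
    history_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        # Pad with sentence-boundary markers so first and last words get a context.
        padded = ["<s>"] + tokens + ["</s>"]
        # Every token except the final </s> serves as the history of a bigram.
        history_counts.update(padded[:-1])
        bigram_counts.update(zip(padded[:-1], padded[1:]))
    # Unsmoothed MLE: count(prev, word) / count(prev as history).
    return {
        (prev, word): count / history_counts[prev]
        for (prev, word), count in bigram_counts.items()
    }


corpus = [["the", "market", "rose"], ["the", "market", "fell"]]
model = train_bigram_mle(corpus)
print(model[("the", "market")])   # 1.0
print(model[("market", "rose")])  # 0.5
```

Unseen bigrams simply have no entry here, which is exactly the gap that smoothing methods are meant to fill.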
-
Resources, including corpora and software, for processing the Hungarian language. Language resources: The Hunglish Corpus is a sentence-aligned Hungarian-English parallel corpus...
-
RDF conversion of Princeton's package:wordnet, version 3.0. With many links to package:w3c-wordnet, package:lexvo and the Dutch package:cornetto.
-
The Parole/Simple 'lexinfo' Ontology is the OWL version of the Parole/Simple model (defined during the PAROLE LE2-4017 and SIMPLE LE4-8346 projects) once mapped to Lexinfo Model. The...
-
The Semantic Quran dataset is a multilingual RDF representation of translations of the Quran. The dataset was created by integrating data from two different semi-structured sources. The...
-
The Ontos News Portal extracts facts (objects, e.g. persons or organizations, as well as relations between them, e.g. a person working for an organization or living at a location)....
-
SentimentWortschatz, or SentiWS for short, is a publicly available German-language resource for sentiment analysis, opinion mining etc. It lists positive and negative polarity bearing...
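A polarity lexicon like this is typically applied by looking up each token and aggregating the polarity values of the words it finds. The sketch below assumes the lexicon has already been loaded into a Python dict mapping words to numeric weights. The entries and weights shown are invented for illustration and are not real SentiWS values, and `polarity_score` is a hypothetical helper, not part of the resource.

```python
# Hypothetical excerpt: words and weights are illustrative, not real SentiWS entries.
LEXICON: dict[str, float] = {
    "gut": 0.5,
    "schlecht": -0.5,
    "hervorragend": 0.75,
}


def polarity_score(tokens: list[str], lexicon: dict[str, float]) -> float:
    """Sum the weights of tokens found in the lexicon; unknown tokens count as 0."""
    # Lowercasing is a simplification; German nouns are capitalized in running text.
    return sum(lexicon.get(token.lower(), 0.0) for token in tokens)


print(polarity_score(["Das", "Essen", "war", "hervorragend"], LEXICON))  # 0.75
print(polarity_score(["gut", "aber", "schlecht"], LEXICON))  # 0.0
```

Summing is the simplest aggregation. Averaging over matched words or handling negation are common refinements, but they lie outside what this sketch covers.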
-
From their website: JRC-Names is a highly multilingual named entity resource for person and organisation names (called 'entities'). It consists of large lists of names and their many...
-
A lexical database documenting translations among lexemes of language varieties.
-
Overview from the home page: The Europarl parallel corpus is extracted from the proceedings of the European Parliament. It includes versions in 11 European languages: Romanic...
-
Dutch lexical database, similar to WordNet but with more semantic relations. Links to package:vu-wordnet and package:w3c-wordnet. When this dataset is used for research purposes,...
-
ISO 12620 provides a framework for defining data categories compliant with the ISO/IEC 11179 family of standards. According to this model, each data category is assigned a unique...
-
Glottolog/Langdoc provides information about descriptive literature for all the world's languages. It also provides language classifications as well as knowledge bases for names, codes,...
-
Deutscher Wortschatz contains data generated from newspapers and web resources that are publicly available. The data were collected per language and encompass statistics about...