Hungarian Language Corpora and Analyzers

Resources, including corpora and software, for processing the Hungarian language.

Language resources

  • The Hunglish Corpus is a sentence-aligned Hungarian-English parallel corpus published under the Creative Commons Attribution license.
  • The Hungarian Webcorpus is a gigaword corpus of Hungarian gathered from the web.
  • The Hunglish dictionary is a machine-readable English-Hungarian bilingual lexicon.
  • morphdb.hu is a Hungarian morphological database for use with the hunmorph morphological analyzer.

Software

  • hunpos is an HMM-based open source part-of-speech tagger.
  • hunmorph is an open source tool and programming library for spell-checking, stemming, and morphological analysis of agglutinative, German, and other languages.
  • hunalign is a language-independent sentence-level aligner for building parallel corpora.

Additional Info

Field Value
Source http://mokk.bme.hu/resources/
Author MOKK - Budapest University of Technology and Economics