Hungarian Language Corpora and Analyzers

Resources, including corpora and software, for processing the Hungarian language.

Language resources

  • The Hunglish Corpus is a sentence-aligned Hungarian-English parallel corpus published under the Creative Commons Attribution license.
  • The Hungarian Webcorpus is a gigaword corpus of Hungarian gathered from the web.
  • The Hunglish dictionary is a machine-readable English-Hungarian bilingual lexicon.
  • A Hungarian morphological database for use with the hunmorph morphological analyzer.

Software

  • hunpos is an HMM-based open-source part-of-speech tagger.
  • hunmorph is an open-source tool and programming library for spell checking, stemming, and morphological analysis of agglutinative, German, and other languages.
  • hunalign is a language-independent sentence-level aligner for building parallel corpora.

Additional Info

Field Value
Author MOKK - Budapest University of Technology and Economics
Last Updated October 10, 2013, 21:18 (Etc/UTC)
Created February 12, 2011, 14:54 (Etc/UTC)