Annotated Hungarian National Corpus
Authors
Zoltán Alexin, Department of Informatics, University of Szeged
Tibor Gyimóthy, Research Group on Artificial Intelligence at the University of Szeged
Csaba Hatvani, Department of Informatics, University of Szeged
László Tihanyi, MorphoLogic, Budapest
János Csirik, Department of Informatics, University of Szeged
Károly Bibok, Slavic Institute, University of Szeged
Gábor Prószéky, MorphoLogic, Budapest
Similar resources
A Hungarian Sentiment Corpus Manually Annotated at Aspect Level
In this paper we present a Hungarian sentiment corpus manually annotated at aspect level. Our corpus consists of Hungarian opinion texts written about different types of products. The main aim of creating the corpus was to produce an appropriate database providing possibilities for developing text mining software tools. The corpus is a unique Hungarian database: to the best of our knowledge, no...
The Szeged Corpus: A POS Tagged and Syntactically Annotated Hungarian Natural Language Corpus
The Szeged Corpus is a manually annotated natural language corpus currently comprising 1.2 million word entries, 145 thousand different word forms, and an additional 225 thousand punctuation marks. With this, it is the largest manually processed Hungarian textual database that serves as a reference material for research in natural language processing as well as a learning database for machine l...
Light Verb Constructions in the SzegedParalellFX English-Hungarian Parallel Corpus
In this paper, we describe the first English–Hungarian parallel corpus annotated for light verb constructions, which contains 14,261 sentence alignment units. Annotation principles and statistical data on the corpus are also provided, and English and Hungarian data are contrasted. On the basis of corpus data, a database containing pairs of English–Hungarian light verb constructions has been cre...
Manually Annotated Hungarian Corpus
The current paper presents the results of a two-year project during which a consortium of the University of Szeged and MorphoLogic Ltd., Budapest developed a morpho-syntactically parsed and annotated (disambiguated) corpus for Hungarian. For morpho-syntactic encoding, the Hungarian version of MSD (MorphoSyntactic Description) has been used. The corpus contains texts of five different topic areas...
Hungarian Corpus of Light Verb Constructions
The precise identification of light verb constructions is crucial for the successful functioning of several NLP applications. In order to facilitate the development of an algorithm that is capable of recognizing them, a manually annotated corpus of light verb constructions has been built for Hungarian. Basic annotation guidelines and statistical data on the corpus are also presented in the pape...
Morphological annotation of Old and Middle Hungarian corpora
In our paper, we present a computational morphology for Old and Middle Hungarian used in two research projects that aim at creating morphologically annotated corpora of Old and Middle Hungarian. In addition, we present the web-based disambiguation tool used in the semi-automatic disambiguation of the annotations and the structured corpus query tool that has a unique but very useful feature of m...