Parsing the SynTagRus Treebank of Russian
Abstract
We present the first results on parsing the SynTagRus treebank of Russian with a data-driven dependency parser, achieving a labeled attachment score of over 82% and an unlabeled attachment score of 89%. A feature analysis shows that high parsing accuracy depends crucially on the use of both lexical and morphological features. We conjecture that the latter result generalizes to richly inflected languages, provided that sufficient amounts of training data are available.
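For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how unlabeled and labeled attachment scores are commonly computed. It is not the evaluation script used in the paper. The function name `attachment_scores`, the token representation, and the toy labels are assumptions made for illustration, and the sketch scores every token, whereas the abstract does not say whether punctuation was excluded.

```python
from typing import List, Tuple

# Hypothetical token representation: (head_index, dependency_label).
# Head index 0 stands for the artificial root node.
Token = Tuple[int, str]


def attachment_scores(
    gold: List[List[Token]], pred: List[List[Token]]
) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over all tokens in the corpus."""
    total = correct_heads = correct_labeled = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences differ in length")
        for (g_head, g_label), (p_head, p_label) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                # The head is right: counts toward UAS.
                correct_heads += 1
                if g_label == p_label:
                    # Head and label are both right: counts toward LAS.
                    correct_labeled += 1
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * correct_heads / total, 100.0 * correct_labeled / total


# Toy four-token sentence with made-up labels (not SynTagRus relations).
gold = [[(2, "subj"), (0, "root"), (4, "attr"), (2, "obj")]]
pred = [[(2, "subj"), (0, "root"), (2, "attr"), (2, "comp")]]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.1f}%, LAS = {las:.1f}%")  # UAS = 75.0%, LAS = 50.0%
```

In the toy sentence the third token has the wrong head and the fourth has the right head but the wrong label, so three of four heads are correct (UAS) and two of four head-label pairs are correct (LAS). LAS can never exceed UAS, which is consistent with the 82% and 89% figures reported above.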
Similar resources
Building a Dependency Parsing Model for Russian with MaltParser and MyStem Tagset
The paper describes a series of experiments on building a dependency parsing model using MaltParser, the SynTagRus treebank of Russian, and the morphological tagger Mystem. The experiments have two purposes. The first one is to train a model with a reasonable balance of quality and parsing time. The second one is to produce user-friendly software which would be practical for obtaining quick res...
Conversion of SynTagRus (the Russian Dependency Treebank) to Universal Dependencies
This report presents the Universal Dependency (UD) annotated corpus for Russian and a conversion process which was developed to transform SynTagRus, the Russian dependency treebank, into a UD-style annotated corpus. The aim of this work was to create a UD-style annotated corpus for Russian since no such corpus was available prior to UD release 1.3. The conversion rules were based on manually an...
Converting SynTagRus Dependency Treebank into Penn Treebank Style
This paper presents the conversion of SynTagRus dependency structures into Penn Treebank style phrase structures, whose resulting data will be used to train a statistical constituency parser for Russian and create a large-scale constituency-parsed corpus. The implemented conversion includes various innovative features in order to create phrase structure trees that are closest to Penn Treebank s...
Automatic Error Correction in a Syntactic Treebank Using Transformation-Based Machine Learning
A treebank is one of the most useful resources for supervised or semi-supervised learning in many NLP tasks such as speech recognition, spoken language systems, parsing and machine translation. A treebank can be developed in different ways, which can generally be categorized into manual and statistical approaches. While the treebank resulting from each of these methods contains annotation errors,...
Feature Engineering in Persian Dependency Parser
A dependency parser is one of the most important fundamental tools in natural language processing: it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of a phrase-structure parser fo...
Publication date: 2008