Detecting Syntactic Errors in Dependency Treebanks for Morphosyntactically Rich Languages
نویسندگان
چکیده
The paper introduces a new method for detecting and correcting errors in large dependency treebanks with rich morphosyntactic annotation. The technique uses error correction rules automatically extracted from the treebank. The procedure of rule extraction is based on a comparison of similar – but not identical – subgraphs of dependency structures. The outcome of applying the method to a 3-million-sentence dependency treebank of Polish is presented and evaluated. The method achieves satisfactory precision in the task of automatic error correction and relatively high precision in the task of error detection.
منابع مشابه
On Detecting Errors in Dependency Treebanks
Dependency relations between words are increasingly recognized as an important level of linguistic representation that is close to the data and at the same time to the semantic functor-argument structure as a target of syntactic analysis and processing. Correspondingly, dependency structures play an important role in parser evaluation and for the training and evaluation of tools based on depend...
متن کاملClassifying Languages by Dependency Structure. Typologies of Delexicalized Universal Dependency Treebanks
This paper shows how the current Universal Dependency treebanks can be used for clustering structural global linguistic features of the treebanks to reveal a purely structural syntactic typology of languages. Different uniand multi-dimensional data extraction methods are explored and tested in order to assess both the coherence of the underlying syntactic data and the quality of the clustering ...
متن کاملCoordination Structures in Dependency Treebanks
Paratactic syntactic structures are notoriously difficult to represent in dependency formalisms. This has painful consequences such as high frequency of parsing errors related to coordination. In other words, coordination is a pending problem in dependency analysis of natural languages. This paper tries to shed some light on this area by bringing a systematizing view of various formal means dev...
متن کاملDetecting Multiword Expressions by Dependency Parsing
In this poster, we present how different types of MWEs can be identified by dependency parsers in different languages. In our investigations, we focus on English verb-particle constructions (VPCs), Hungarian light verb constructions (LVCs) and German light verb constructions. In our experiments, we exploit the fact that some treebanks contain MWE-aware annotations, i.e. there are MWE-specific m...
متن کاملAn annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies
A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...
متن کامل