Feature Engineering in Persian Dependency Parser

Authors

  • S. Lazemi, Department of Computer Eng., University of Kashan, Kashan, Iran.
Abstract:

A dependency parser is one of the most fundamental tools in natural language processing: it extracts the structure of a sentence and determines the relations between its words according to dependency grammar. Dependency parsing is well suited to free word order languages such as Persian. In this paper, a data-driven dependency parser for Persian is developed with the help of a phrase-structure parser. The feature space defined for a parser is one of the important factors in its success. Our goal is to generate and extract features appropriate for the dependency parsing of Persian sentences. To this end, new semantic and syntactic features are defined and added to MSTParser by a stacking method. The semantic features are obtained with word clustering algorithms based on syntagmatic analysis; the syntactic features are obtained from the Persian phrase-structure parser and are encoded as bit strings. Experiments were carried out on the Persian Dependency Treebank (PerDT) and the Uppsala Persian Dependency Treebank (UPDT). The results indicate that the new features improve the performance of the dependency parser for Persian. The unlabeled attachment scores achieved on PerDT and UPDT are 89.17% and 88.96%, respectively.
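The unlabeled attachment score (UAS) reported above is the percentage of tokens that are assigned the correct head, ignoring the dependency label. The following is a minimal sketch of that computation, not the evaluation script used in the paper: the function name, the list-of-head-indices input format, and the toy data are assumptions made for illustration, and the sketch counts every token, whereas some evaluation setups exclude punctuation.

```python
from typing import Sequence


def unlabeled_attachment_score(
    gold_heads: Sequence[Sequence[int]],
    pred_heads: Sequence[Sequence[int]],
) -> float:
    """Return the percentage of tokens whose predicted head equals the gold head.

    Each sentence is a sequence of head indices, one per token
    (0 stands for the artificial root). This format is an assumption.
    """
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted data differ in sentence count")

    correct = 0
    total = 0
    for gold, pred in zip(gold_heads, pred_heads):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sentence differ in length")
        # Count tokens attached to the same head in both analyses.
        correct += sum(g == p for g, p in zip(gold, pred))
        total += len(gold)

    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * correct / total


# Toy data (hypothetical): two sentences, five tokens, one wrong head.
gold = [[2, 0, 2], [0, 1]]
pred = [[2, 0, 1], [0, 1]]
uas = unlabeled_attachment_score(gold, pred)
print(f"UAS = {uas:.2f}%")
```

A labeled attachment score would additionally require the dependency label of each token to match, which this sketch does not check.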


similar resources

Feature Engineering in Maximum Spanning Tree Dependency Parser

In this paper, we present the results of our experiments with modifications of the feature set used in the Czech version of the Maximum Spanning Tree parser. First, we show how new feature templates improve parsing accuracy; second, we reduce the dimensionality of the feature space to make parsing more efficient without sacrificing accuracy.

ParsPer: A Dependency Parser for Persian

We present a dependency parser for Persian, called ParsPer, developed using the graph-based parser in the Mate Tools. The parser is trained on the entire Uppsala Persian Dependency Treebank with a specific configuration that was selected by MaltParser as the best performing parsing representation. The treebank’s syntactic annotation scheme is based on Stanford Typed Dependencies with extensions...

Memory-Based Re-Engineering of a Knowledge-Based Dependency Parser

The emulation of a knowledge-based dependency parser for Dutch by a fast approximation of a memory-based learning algorithm is described. During the development of the original parser, hand-parsed test sentences were collected to offer stochastic guidance in the parsing process. Training a memory-based parser directly on these collections yields a reasonable but not very accurate emulation....

Dependency Parsers for Persian

We present two dependency parsers for Persian, MaltParser and MSTParser, trained on the Uppsala PErsian Dependency Treebank. The treebank currently consists of 1,000 sentences. Its annotation scheme is based on Stanford Typed Dependencies (STD), extended for Persian with regard to object marking and light verb constructions. The parsers and the treebank are developed simultaneously in a bootstrapping ...

Dependency parser demo

1 Introduction

We are concerned with surface-syntactic parsing of running text. Our main goal is to describe a syntactic analysis of sentences using dependency links that show the head-dependent relations between words. The new dependency parser (Tapanainen and Järvinen, 1997; Järvinen and Tapanainen, 1997) belongs to a continuous effort to apply rule-based methods to natural languages. It ...

A Dependency Parser for Tweets

We describe a new dependency parser for English tweets, TWEEBOPARSER. The parser builds on several contributions: new syntactic annotations for a corpus of tweets (TWEEBANK), with conventions informed by the domain; adaptations to a statistical parsing algorithm; and a new approach to exploiting out-of-domain Penn Treebank data. Our experiments show that the parser achieves over 80% unlabeled a...


Journal title

volume 7, issue 3

pages 467-474

publication date 2019-07-01
