Using Lexical and Dependency Features to Disambiguate Discourse Connectives in Hindi
Abstract
Discourse parsing is a challenging task in NLP and plays a crucial role in discourse analysis. To enable discourse analysis for Hindi, the Hindi Discourse Relations Bank was created on a subset of the Hindi TreeBank. The benefits of a discourse analyzer in the automated discourse analysis, question summarization and question answering domains have motivated us to begin work on a discourse analyzer for Hindi. In this paper, we focus on discourse connective identification for Hindi. We explore the syntactic features available for this task. We also explore the use of the dependency tree parses present in the Hindi TreeBank and study their impact on the performance of the system. We report that the novel dependency features we introduce have a higher impact on precision than the syntactic features previously used for this task. In addition, we report a high accuracy of 96% for this task.
Related resources
Prosody cues for classification of the discourse particle "hã" in hindi
In Hindi, the affirmative particle "hã" carries out a variety of discourse functions. Preliminary investigation has shown that, though it is difficult to disambiguate these different functions from prosody alone, there seems to be a distinct prosodic pattern associated with each of them. In this paper, we present a corpus study of spoken utterances of the Hindi word "hã". We identify these prosodic...
Towards an Annotated Corpus of Discourse Relations in Hindi
We describe our initial efforts towards developing a large-scale corpus of Hindi texts annotated with discourse relations. Adopting the lexically grounded approach of the Penn Discourse Treebank (PDTB), we present a preliminary analysis of discourse connectives in a small corpus. We describe how discourse connectives are represented in the sentence-level dependency annotation in Hindi, and disc...
Using Syntax to Disambiguate Explicit Discourse Connectives in Text
Discourse connectives are words or phrases such as once, since, and on the contrary that explicitly signal the presence of a discourse relation. There are two types of ambiguity that need to be resolved during discourse processing. First, a word can be ambiguous between discourse or non-discourse usage. For example, once can be either a temporal discourse connective or simply a word meaning “...
Acquiring a Disambiguation Model For Discourse Connectives
Discourse connectives can show sense ambiguities, in that they can signal more than one possible rhetorical relation. The aim of this study is to discover how to disambiguate such discourse connectives using a statistical model. Six discourse connectives (after, as soon as, before, once, since and while) which show ambiguities in the SDRT (Segmented Discourse Representation Theory (Asher & Lascari...
Disambiguating Explicit Discourse Connectives without Oracles
Deciding whether a word serves a discourse function in context is a prerequisite for discourse processing, and the performance of this subtask bounds performance on subsequent tasks. Pitler and Nenkova (2009) report 96.29% accuracy (F1 94.19%) relying on features extracted from gold-standard parse trees. This figure is an average over several connectives, some of which are extremely hard to cla...