Towards Cross-Domain PDTB-Style Discourse Parsing
Abstract
Discourse relation parsing is an important task whose goal is to understand text beyond sentence boundaries. With the availability of annotated corpora (the Penn Discourse Treebank, PDTB), statistical discourse parsers were developed. Previous work has shown that the discourse parsing subtasks of discourse connective detection and relation sense classification do not generalize well across domains. The biomedical domain is of particular interest due to the availability of the Biomedical Discourse Relation Bank (BioDRB). In this paper we present a cross-domain evaluation of a PDTB-trained discourse relation parser and evaluate feature-level domain adaptation techniques on the argument span extraction subtask. We demonstrate that this subtask generalizes well across domains.
Similar resources
Inferring Discourse Relations from PDTB-style Discourse Labels for Argumentative Revision Classification
Penn Discourse Treebank (PDTB)-style annotation focuses on labeling local discourse relations between text spans and typically ignores larger discourse contexts. In this paper we propose two approaches to infer discourse relations in a paragraph-level context from annotated PDTB labels. We investigate the utility of inferring such discourse information using the task of revision classification....
Towards Full Text Shallow Discourse Relation Annotation: Experiments with Cross-Paragraph Implicit Relations in the PDTB
Full text discourse parsing relies on texts comprehensively annotated with discourse relations. To this end, we address a significant gap in the inter-sentential discourse relations annotated in the Penn Discourse Treebank (PDTB), namely the class of cross-paragraph implicit relations, which account for 30% of inter-sentential relations in the corpus. We present our annotation study to explore ...
Cross-Domain and Cross-Language Porting of Shallow Parsing
English was the main focus of attention of the Natural Language Processing (NLP) community for years. As a result, there are significantly more annotated linguistic resources in English than in any other language. Consequently, data-driven tools for automatic text or speech processing are developed mainly for English. Developing similar corpora and tools for other languages is an important issu...
PDTB-style Discourse Annotation of Chinese Text
We describe a discourse annotation scheme for Chinese and report on the preliminary results. Our scheme, inspired by the Penn Discourse TreeBank (PDTB), adopts the lexically grounded approach; at the same time, it makes adaptations based on the linguistic and statistical characteristics of Chinese text. Annotation results show that these adaptations work well in practice. Our scheme, taken toge...
Extracting PDTB Discourse Relations from Student Essays
We investigate the manual and automatic annotation of PDTB discourse relations in student essays, a novel domain that is not only learning-based and argumentative, but also noisy with surface errors and deeper coherency issues. We discuss methodological complexities it poses for the task. We present descriptive statistics and compare relation distributions in related corpora. We compare automat...