Constructing a Textual Semantic Relation Corpus Using a Discourse Treebank
Authors
Abstract
In this paper, we present our work on constructing a textual semantic relation corpus by making use of an existing treebank annotated with discourse relations. We extract adjacent text span pairs and group them into six categories according to the different discourse relations between them. After that, we present the details of our annotation scheme, which includes six textual semantic relations: backward entailment, forward entailment, equality, contradiction, overlapping, and independent. We also discuss some ambiguous examples to show the difficulty of such an annotation task, which cannot easily be done by an automatic mapping between discourse relations and semantic relations. We have two annotators, and each of them performs the task twice. The basic statistics on the constructed corpus look promising: we achieve 81.17% agreement on the six-way semantic relation annotation, with a kappa score of .718, and agreement increases to 91.21%, with a kappa score of .775, if we collapse the last two labels.
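As an illustration of how agreement figures of this kind are typically computed, the following is a minimal Python sketch of raw agreement and Cohen's kappa for two annotators, before and after collapsing the last two labels (overlapping and independent). It rests on assumptions: the abstract does not say which kappa variant was used, the function names and the merged label name `other` are hypothetical, and the label lists are invented toy data, not drawn from the corpus.

```python
from collections import Counter


def observed_agreement(a, b):
    """Fraction of items on which the two annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two annotators."""
    n = len(a)
    p_o = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's marginal label frequencies.
    p_e = sum(count_a[l] * count_b[l] for l in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def collapse(labels, merged=("overlapping", "independent"), new_label="other"):
    """Map the merged labels onto a single label (hypothetical name 'other')."""
    return [new_label if l in merged else l for l in labels]


# Invented toy annotations, for illustration only.
ann1 = ["forward entailment", "equality", "overlapping",
        "independent", "contradiction", "independent"]
ann2 = ["forward entailment", "equality", "independent",
        "independent", "contradiction", "overlapping"]

kappa_six_way = cohen_kappa(ann1, ann2)
kappa_collapsed = cohen_kappa(collapse(ann1), collapse(ann2))
print(f"six-way:   agreement={observed_agreement(ann1, ann2):.3f}, kappa={kappa_six_way:.3f}")
print(f"collapsed: agreement={observed_agreement(collapse(ann1), collapse(ann2)):.3f}, "
      f"kappa={kappa_collapsed:.3f}")
```

In the toy data the only disagreements are between overlapping and independent, so collapsing those two labels removes every disagreement; on real annotations the gain would normally be smaller.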
Similar resources
Semi-supervised Discourse Relation Classification with Structural Learning
The corpora available for training discourse relation classifiers are annotated using a general set of discourse relations. However, for certain applications, custom discourse relations are required. Creating a new annotated corpus with a new relation taxonomy is a time-consuming and costly process. We address this problem by proposing a semi-supervised approach to discourse relation classificat...
Signalling Subject Matter and Presentational Coherence Relations in Discourse: a Corpus Study
In this study, we examine how subject matter and presentational coherence relations in Rhetorical Structure Theory (Mann and Thompson 1988) are signalled in written discourse, and whether they differ quantitatively or qualitatively in terms of the signalling devices involved. By signalling we mean textual signals (discourse markers such as although, because and thus, and also signals such as te...
Towards Semi-Supervised Classification of Discourse Relations using Feature Correlations
Two of the main corpora available for training discourse relation classifiers are the RST Discourse Treebank (RST-DT) and the Penn Discourse Treebank (PDTB), which are both based on the Wall Street Journal corpus. Most recent work using discourse relation classifiers has employed fully-supervised methods on these corpora. However, certain discourse relations have little labeled data, causing l...
Sentence Structure and Discourse Structure: Possible Parallels
The present contribution represents the first step in comparing the nature of syntactico-semantic relations present in the sentence structure to their equivalents in the discourse structure. The study is carried out on the basis of manually annotated Czech material collected in the Prague Dependency Treebank (PDT). According to the semantic analysis of the underlying syntactic structure of a ...
A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension
Several recent discourse parsers have employed fully-supervised machine learning approaches. These methods require human annotators to create an extensive training corpus beforehand, which is a time-consuming and costly process. On the other hand, unlabeled data is abundant and cheap to collect. In this paper, we propose a novel semi-supervised method for discourse relation classification based...