Constructing a Textual Semantic Relation Corpus Using a Discourse Treebank

Authors

  • Rui Wang
  • Caroline Sporleder
Abstract

In this paper, we present our work on constructing a textual semantic relation corpus by making use of an existing treebank annotated with discourse relations. We extract adjacent text span pairs and group them into six categories according to the different discourse relations between them. After that, we present the details of our annotation scheme, which includes six textual semantic relations: backward entailment, forward entailment, equality, contradiction, overlapping, and independent. We also discuss some ambiguous examples to show the difficulty of such an annotation task, which cannot easily be done by an automatic mapping between discourse relations and semantic relations. Two annotators each perform the task twice. The basic statistics on the constructed corpus look promising: we achieve 81.17% agreement on the six-way semantic relation annotation, with a kappa score of .718; agreement increases to 91.21%, with a kappa score of .775, if we collapse the last two labels.
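The agreement figures reported in the abstract can be illustrated with a minimal sketch. It assumes the kappa in question is Cohen's kappa between two annotators (the abstract does not name the variant) and that "the last two labels" are overlapping and independent, as the order of the list suggests. The function names, the merged label name `other`, and the toy annotations are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def observed_agreement(a, b):
    """Fraction of items on which the two annotators chose the same label."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), with chance agreement p_e
    estimated from each annotator's own label distribution."""
    n = len(a)
    p_o = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label]
              for label in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def collapse(labels, merged=("overlapping", "independent"), new_label="other"):
    """Merge the assumed 'last two labels' into a single category."""
    return [new_label if label in merged else label for label in labels]


# Toy annotations, invented purely for illustration.
toy_a = ["equality", "overlapping", "independent",
         "contradiction", "forward entailment", "overlapping"]
toy_b = ["equality", "independent", "independent",
         "contradiction", "forward entailment", "overlapping"]

six_way = (observed_agreement(toy_a, toy_b), cohen_kappa(toy_a, toy_b))
five_way = (observed_agreement(collapse(toy_a), collapse(toy_b)),
            cohen_kappa(collapse(toy_a), collapse(toy_b)))
```

In the toy data the only disagreement is between overlapping and independent, so merging those two labels removes it. This mirrors the direction of the change reported in the abstract, not its magnitude.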


Similar resources

Semi-supervised Discourse Relation Classification with Structural Learning

The corpora available for training discourse relation classifiers are annotated using a general set of discourse relations. However, for certain applications, custom discourse relations are required. Creating a new annotated corpus with a new relation taxonomy is a time-consuming and costly process. We address this problem by proposing a semi-supervised approach to discourse relation classificat...


Signalling Subject Matter and Presentational Coherence Relations in Discourse: a Corpus Study

In this study, we examine how subject matter and presentational coherence relations in Rhetorical Structure Theory (Mann and Thompson 1988) are signalled in written discourse, and whether they differ quantitatively or qualitatively in terms of the signalling devices involved. By signalling we mean textual signals (discourse markers such as although, because and thus, and also signals such as te...


Towards Semi-Supervised Classification of Discourse Relations using Feature Correlations

Two of the main corpora available for training discourse relation classifiers are the RST Discourse Treebank (RST-DT) and the Penn Discourse Treebank (PDTB), which are both based on the Wall Street Journal corpus. Most recent work using discourse relation classifiers has employed fully-supervised methods on these corpora. However, certain discourse relations have little labeled data, causing l...


Sentence Structure and Discourse Structure: Possible Parallels

The present contribution represents the first step in comparing the nature of syntactico-semantic relations present in the sentence structure to their equivalents in the discourse structure. The study is carried out on the basis of manually annotated Czech material collected in the Prague Dependency Treebank (PDT). According to the semantic analysis of the underlying syntactic structure of a ...


A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

Several recent discourse parsers have employed fully-supervised machine learning approaches. These methods require human annotators to create an extensive training corpus beforehand, which is a time-consuming and costly process. On the other hand, unlabeled data is abundant and cheap to collect. In this paper, we propose a novel semi-supervised method for discourse relation classification based...



Publication date: 2010