Automatic discourse structure generation using rhetorical structure theory

Author

  • Huong LeThanh
Abstract

This thesis addresses a difficult problem in text processing: creating a system to automatically derive rhetorical structures of text. Although the rhetorical structure has proven to be useful in many fields of text processing such as text summarisation and information extraction, systems that automatically generate rhetorical structures with high accuracy are difficult to find. This is because discourse is one of the biggest and yet least well defined areas in linguistics. An agreement amongst researchers on the best method for analysing the rhetorical structure of text has not been found. This thesis focuses on investigating a method to generate the rhetorical structures of text. By exploiting different cohesive devices, it proposes a method to recognise rhetorical relations between spans by checking for the appearance of these devices. These factors include cue phrases, noun-phrase cues, verb-phrase cues, reference words, time references, substitution words, ellipses, and syntactic information. The discourse analyser is divided into two levels: sentence-level and text-level. The former uses syntactic information and cue phrases to segment sentences into elementary discourse units and to generate a rhetorical structure for each sentence. The latter derives rhetorical relations between large spans and then replaces each sentence by its corresponding rhetorical structure to produce the rhetorical structure of text. The rhetorical structure at the text-level is derived by selecting rhetorical relations to connect adjacent and non-overlapping spans to form a discourse structure that covers the entire text. Constraints of textual organisation and textual adjacency are effectively used in a beam search to reduce the search space in generating such rhetorical structures.
Experiments carried out in this research achieved an F-score of 89.4% for discourse segmentation, 52.4% for the sentence-level discourse analyser and 38.1% for the final output of the system. These results show that the approach performs well in comparison with current research in discourse.
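
The text-level step described in the abstract, connecting adjacent and non-overlapping spans under a beam search, can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `Span`, `beam_search`, `propose` and `toy_propose`, the additive scoring, and the beam width are all assumptions made for illustration, and the toy scorer merely stands in for the cue-phrase and syntactic evidence the abstract mentions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Span:
    """A node in the discourse tree covering units start..end (inclusive)."""
    start: int
    end: int
    relation: Optional[str] = None       # None for an elementary unit
    children: Tuple["Span", ...] = ()


# Hypothetical interface: given two adjacent spans, return candidate
# (relation name, score) pairs. An empty list means "no relation allowed".
Proposer = Callable[[Span, Span], List[Tuple[str, float]]]


def beam_search(n_units: int, propose: Proposer, beam_width: int = 3) -> Optional[Span]:
    """Build one tree over n_units units, keeping the best beam_width partial analyses."""
    if n_units < 1:
        raise ValueError("need at least one unit")
    # Each hypothesis is (score, sequence of adjacent, non-overlapping spans).
    beam = [(0.0, tuple(Span(i, i) for i in range(n_units)))]
    # Every step merges exactly one pair, so all hypotheses have equal length.
    while len(beam[0][1]) > 1:
        candidates: Dict[Tuple[Span, ...], float] = {}
        for score, spans in beam:
            # Adjacency constraint: only neighbouring spans may be connected.
            for i in range(len(spans) - 1):
                left, right = spans[i], spans[i + 1]
                for relation, rel_score in propose(left, right):
                    merged = Span(left.start, right.end, relation, (left, right))
                    new_spans = spans[:i] + (merged,) + spans[i + 2:]
                    new_score = score + rel_score
                    if new_score > candidates.get(new_spans, float("-inf")):
                        candidates[new_spans] = new_score
        if not candidates:
            return None  # no relation could be proposed; no full tree exists
        ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
        beam = [(sc, sp) for sp, sc in ranked[:beam_width]]  # prune the search space
    return beam[0][1][0]


def toy_propose(left: Span, right: Span) -> List[Tuple[str, float]]:
    """Toy scorer (assumption): favours merging short spans first."""
    size = right.end - left.start + 1
    return [("Elaboration", 1.0 / size)]


if __name__ == "__main__":
    tree = beam_search(4, toy_propose)
    print(tree.start, tree.end, tree.relation)
```

Pruning to a fixed beam keeps the number of partial analyses bounded at each step, at the cost of possibly discarding the globally best tree; a real analyser would replace `toy_propose` with relation recognition based on the cohesive devices listed in the abstract.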

Similar resources

The Prosody of Discourse Structure and Content in the Production of Persian EFL Learners

The present research addressed the prosodic realization of global and local text structure and content in the spoken discourse data produced by Persian EFL learners. Two newspaper articles were analyzed using Rhetorical Structure Theory. Based on these analyses, the global structure in terms of hierarchical level, the local structure in terms of the relative importance of text segments and the ...

Query Focused Summary Generation System using Unique Discourse Structure

In this paper, the authors propose a query focussed summary generation system which is constructed on top of a unique language-independent discourse structure. The discourse structure is comprised of three text representation techniques, namely, Universal Networking Language (UNL), Rhetorical Structure Theory (RST) and saṅgatis. The discourse structure is indexed based on a concept called sūtra...

The Rhetorical Parsing, Summarization, and Generation of Natural Language Texts

This thesis is an inquiry into the nature of the high-level, rhetorical structure of unrestricted natural language texts, computational means to enable its derivation, and two applications (in automatic summarization and natural language generation) that follow from the ability to build such structures automatically. The thesis proposes a first-order formalization of the high-level, rhetorical st...

Logic-Based Rhetorical Structuring for Natural Language Generation in Human-Computer Dialogue

Rhetorical structuring is a field approached mostly by research in natural language (pragmatic) interpretation. However, in natural language generation (NLG) the rhetorical structure plays an important part, in monologues and dialogues as well. Hence, several approaches in this direction exist. In most of these, the rhetorical structure is calculated and built in the framework of Rhetorical Struc...

Abstract Generation Based On Rhetorical Structure Extraction

Abstract generation is, like Machine Translation, one of the ultimate goals of Natural Language Processing. However, since conventional word-frequency-based abstract generation systems (e.g. [Kuhn 58]) are lacking in inter-sentential or discourse-structural analysis, they are liable to generate incoherent abstracts. On the other hand, conventional knowledge or script-based abstract generation systems (e.g...

Publication date: 2004