Experiments in Constructing a Corpus of Discourse Trees: Problems, Annotation Choices, Issues
Authors
Abstract
We present a tagging schema and a tagging tool for labeling the rhetorical structure of texts. We focus on presenting the difficulties that we faced in designing a discourse annotation manual and on discussing the choices that we made in order to address these difficulties. We report reliability results concerning our agreement on building the rhetorical structure of 90 texts of three genres: 30 news stories, 30 editorials, and 30 scientific articles.
Similar resources
We present discourse annotation work aimed at constructing a parallel corpus of Rhetorical Structure trees for a collection of Japanese texts and their corresponding English translations. We discuss implications of our empirical findings for the task of text planning in the context of implementing multilingual natural language generation systems.
Semantic Annotation for Generation: Issues in annotating a corpus to develop and evaluate discourse entity realization algorithms
We are annotating a corpus with information relevant to discourse entity realization, and especially the information needed to decide which type of NP to use. The corpus is being used to study correlations between NP type and certain semantic or discourse features, to evaluate hand-coded algorithms, and to train statistical models. We report on the development of our annotation scheme, the prob...
Constructing an Annotated Story Corpus: Some Observations and Issues
This paper discusses our ongoing work on constructing an annotated corpus of children’s stories for further studies on the linguistic, computational, and cognitive aspects of story structure and understanding. Given its semantic nature and the need for extensive common sense and world knowledge, story understanding has been a notoriously difficult topic in natural language processing. In partic...
Examining the Effect of Ideology and Idiosyncrasy on Lexical Choices in Translation Studies within the CDA Framework
Using a critical discourse analytic model of translation criticism, the present study attempts to explore the effect of ideology and idiosyncrasy on the lexical choices in translation studies. The study employed a descriptive approach to answer two research questions: Is there any relationship between ideology and idiosyncratic features of translators' lexical choices? And if yes, can it be ana...
Building a Discourse-Annotated Dutch Text Corpus
We are compiling a corpus of Dutch texts annotated with discourse structure and lexical cohesion, containing initially 80 texts from expository and persuasive genres. We are using this resource for corpus-based studies of discourse relations, discourse markers, cohesion, and genre differences. We are also exploring the possibilities of automatic text segmentation and semi-automatic discourse an...