Multi-Paragraph Segmentation of Expository Text
Author
Abstract
This paper describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully-implemented versions of the algorithm are described and shown to produce segmentation that corresponds well to human judgments of the major subtopic boundaries of thirteen lengthy texts.
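To make the idea of segmenting by lexical distribution concrete, here is a minimal sketch of one way such a method could work: compare the vocabulary on either side of each paragraph gap and treat dips in similarity as candidate subtopic boundaries. This is an illustration under stated assumptions, not either of the two versions the paper describes. The function names (`cosine`, `gap_scores`, `boundaries`), the `block` parameter, the regex tokenizer, and the "local minimum below the mean" rule are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_scores(paragraphs: list[str], block: int = 2) -> list[float]:
    """Lexical similarity across each gap between adjacent paragraphs.

    For every gap, up to `block` paragraphs on the left are compared with
    up to `block` paragraphs on the right (an assumed window size).
    """
    bags = [Counter(re.findall(r"[a-z]+", p.lower())) for p in paragraphs]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - block):gap], Counter())
        right = sum(bags[gap:gap + block], Counter())
        scores.append(cosine(left, right))
    return scores


def boundaries(scores: list[float]) -> list[int]:
    """Indices of paragraphs that start a new segment.

    Assumed rule: a gap is a boundary when its score is a local minimum
    of the similarity curve and lies below the mean score.
    """
    if not scores:
        return []
    mean = sum(scores) / len(scores)
    cuts = []
    for i, s in enumerate(scores):
        left_ok = i == 0 or scores[i - 1] > s
        right_ok = i == len(scores) - 1 or scores[i + 1] > s
        if left_ok and right_ok and s < mean:
            cuts.append(i + 1)  # gap i sits just before paragraph i + 1
    return cuts


if __name__ == "__main__":
    text = [
        "the cat sat on the mat with another cat",
        "a cat likes the mat and the cat sleeps",
        "stock markets fell as investors sold shares",
        "investors watched markets and bought shares",
    ]
    print(boundaries(gap_scores(text)))
```

In this toy input the vocabulary changes between the second and third paragraphs, so the similarity curve dips at that gap. A real system would also need stopword handling, stemming, and a more careful boundary criterion.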
Similar resources
SEGMENTATION OF EXPOSITORY TEXT Marti
This paper describes TextTiling, an algorithm for partitioning expository texts into coherent multi-paragraph discourse units which reflect the subtopic structure of the texts. The algorithm uses domain-independent lexical frequency and distribution information to recognize the interactions of multiple simultaneous themes. Two fully-implemented versions of the algorithm are described and shown to...
Multi-Paragraph Segmentation of Expository Texts Marti
We present a method for partitioning expository texts into coherent multi-paragraph units which reflect the subtopic structure of the texts. Using Chafe's Flow Model of discourse, we observe that subtopics are often expressed by the interaction of multiple simultaneous themes. We describe two fully-implemented algorithms that use only term repetition information to determine the extents of the s...
TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages
TextTiling is a technique for subdividing texts into multi-paragraph units that represent passages, or subtopics. The discourse cues for identifying major subtopic shifts are patterns of lexical co-occurrence and distribution. The algorithm is fully implemented and is shown to produce segmentation that corresponds well to human judgments of the subtopic boundaries of 12 texts. Multi-paragraph s...
Optimal Multi-Paragraph Text Segmentation by Dynamic Programming
There exist several methods of calculating a similarity curve, or a sequence of similarity values, representing the lexical cohesion of successive text constituents, e.g., paragraphs. Methods for deciding the locations of fragment boundaries are, however, scarce. We propose a fragmentation method based on dynamic programming. The method is theoretically sound and guaranteed to provide an optima...
Assessing Reading Comprehension of Expository Text across Different Response Formats
This study investigated whether different response formats (test methods) measure reading comprehension of expository text differently. The study was conducted with 48 semester 6 TESL students at a university in Selangor, Malaysia. These students received an expository passage having descriptive rhetorical structure followed by three response formats, namely, incomplete outline, graphic organizer, a...