Lexical and Semantic Methods in Inner Text Topic Segmentation: A Comparison between C99 and Transeg
Abstract
This paper presents a method for topic text segmentation based on semantic and syntactic distance and compares it to a very well-known text segmentation algorithm, C99. To do so, we ran the two algorithms on a corpus of twenty-two French political discourses and compared their results.
Similar resources
How Text Segmentation Algorithms Gain from Topic Models
This paper introduces a general method to incorporate the LDA Topic Model into text segmentation algorithms. We show that semantic information added by Topic Models significantly improves the performance of two word-based algorithms, namely TextTiling and C99. Additionally, we introduce the new TopicTiling algorithm that is designed to take better advantage of topic information. We show consiste...
Topic Segmentation Algorithms for Text Summarization and Passage Retrieval: An Exhaustive Evaluation
In order to solve problems of reliability of systems based on lexical repetition and problems of adaptability of language-dependent systems, we present a context-based topic segmentation system based on a new informative similarity measure based on word co-occurrence. In particular, our evaluation against the state of the art in the domain, i.e. the C99 and TextTiling algorithms, shows improved r...
Finding Text Boundaries and Finding Topic Boundaries: Two Different Tasks?
The goal of this paper is to demonstrate that the usual evaluation methods for text segmentation are not suited to every task linked to text segmentation. To do so, we differentiated the task of finding text boundaries in a corpus of concatenated texts from the task of finding transitions between topics inside the same text. We worked on a corpus of twenty-two French political discourses trying to...
Topic Segmentation with Hybrid Document Indexing
We present a domain-independent unsupervised topic segmentation approach based on hybrid document indexing. Lexical chains have been successfully employed to evaluate lexical cohesion of text segments and to predict topic boundaries. Our approach is based on the notion of semantic cohesion. It uses spectral embedding to estimate semantic association between content nouns over a span of multiple...
Improving Text Segmentation with Non-systematic Semantic Relation
Text segmentation is a fundamental problem in natural language processing, with applications in information retrieval, question answering, and text summarization. Almost all previous work on unsupervised text segmentation is based on the assumption of lexical cohesion, which is indicated by relations between words in the two units of text. However, they only take into account the reiteration,...