Search results for: text segmentation

Number of results: 227918

2012
Gisela Redeker Ildikó Berzlánovich Nynke van der Vliet Gosse Bouma Markus Egg

We have compiled a corpus of 80 Dutch texts from expository and persuasive genres, which we annotated for rhetorical and genre-specific discourse structure, and lexical cohesion with the goal of creating a gold standard for further research. The annotations are based on a segmentation of the text in elementary discourse units that takes into account cues from syntax and punctuation. During the ...

2014
Abdessalam Bouchekif Géraldine Damnati Delphine Charlet

In this paper, we introduce the notion of speech cohesion for topic segmentation of spoken content. The aim is to integrate speaker information and lexical information within a single cohesion value. Based on a lexical cohesion system, we propose an approach that directly integrates the speaker distribution when processing the cohesion. A potential boundary is effective if the joint distribut...

2008
Alexandre Labadié Violaine Prince

The goal of this paper is to demonstrate that the usual evaluation methods for text segmentation are not suited to every task linked to text segmentation. To do so, we differentiated the task of finding text boundaries in a corpus of concatenated texts from the task of finding transitions between topics inside the same text. We worked on a corpus of twenty-two French political discourses trying to...

2010
Darko Brodic

In this paper, an extended approach to the Gaussian kernel algorithm for text segmentation, reference text line extraction, and skew rate extraction is presented. It assumes the creation of a boundary growing area around the text, based on the Gaussian kernel algorithm extended by an anisotropic approach. Those boundary growing areas form a control image with distinct objects that are a prerequisite for text segmentation. After text ...

2013
SUNANDA DIXIT

In document image analysis, text line segmentation is one of the key components. The segmentation logic presents essential information about skew correction, zone segmentation, and character recognition. The method of document image segmentation into text lines for printed text has seen numerous contributions from fellow research scholars, yet there is scope for tremendous improvement. The key ...

Journal: CoRR 2010
Mihaiela Lupea Doina Tatar Zsuzsana Marian

In this paper, the problems of deriving a taxonomy from a text and of concept-oriented text segmentation are approached. The Formal Concept Analysis (FCA) method is applied to solve both of these linguistic problems. The proposed segmentation method offers a conceptual view for text segmentation, using a context-driven clustering of sentences. The Concept-oriented Clustering Segmentation algorithm (COC...

2011
Syed Saqib Bukhari Faisal Shafait Thomas M. Breuel

Page segmentation into text and non-text components is an essential preprocessing step before OCR. If this is not done properly, an OCR classification engine produces garbage text due to the presence of non-text components. This paper describes improvements to the text/image segmentation algorithm described by Bloomberg, which is also available in his open-source Leptonica library. The...

2013
Robert Daland Kie Zuraw

Computational models of infant word segmentation have not been tested on a wide range of languages. This paper applies a phonotactic segmentation model to Korean. In contrast to the undersegmentation pattern previously found in English and Russian, the model exhibited more oversegmentation errors and more errors overall. Despite the high error rate, analysis suggested that lexical acquisition m...

Journal: Computational Linguistics 2002
Lev Pevzner Marti A. Hearst

The Pk evaluation metric, initially proposed by Beeferman et al. 1997, is becoming the standard measure for assessing text segmentation algorithms. However, a theoretical analysis of the metric finds several problems: the metric penalizes false negatives more heavily than false positives, over-penalizes near-misses, and is affected by variation in segment size distribution. We propose a simple ...
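For readers unfamiliar with the metric discussed in this abstract, the following is a minimal Python sketch of Pk as it is commonly defined: a window of width k slides over the text, and an error is counted whenever the reference and the hypothesis disagree on whether the two ends of the window fall in the same segment. The function name `pk`, the per-unit segment-label input format, and the default of k as half the mean reference segment length are assumptions made for illustration; they are not taken from the abstract.

```python
def pk(reference, hypothesis, k=None):
    """Illustrative Pk score for two segmentations of the same text.

    Each argument is a list with one segment label per unit (for example,
    per sentence); a boundary lies wherever two adjacent labels differ.
    Lower is better; 0.0 means no window-level disagreement.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of units")
    n = len(reference)

    if k is None:
        # Assumed convention: k is half the mean reference segment length.
        num_segments = 1 + sum(
            reference[i] != reference[i - 1] for i in range(1, n)
        )
        k = max(1, round(n / num_segments / 2))
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of units")

    errors = 0
    for i in range(n - k):
        # Do the two ends of the window lie in the same segment?
        same_in_ref = reference[i] == reference[i + k]
        same_in_hyp = hypothesis[i] == hypothesis[i + k]
        if same_in_ref != same_in_hyp:
            errors += 1
    return errors / (n - k)


if __name__ == "__main__":
    ref = [0, 0, 0, 0, 1, 1, 1, 1]       # one boundary after the 4th unit
    missed = [0] * 8                      # hypothesis with no boundary
    near_miss = [0, 0, 0, 1, 1, 1, 1, 1]  # boundary placed one unit early
    print(pk(ref, ref), pk(ref, missed), pk(ref, near_miss))
```

Because the score depends only on window-level agreement, it can be recomputed with different values of `k` to see how sensitive a comparison is to the window width.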

2003
Natsuo Yamamoto Jun Ogata Yasuo Ariki

In this paper, we propose a method for segmenting continuous lecture speech into topics. A lecture includes several topics, but it is difficult to judge their boundaries. To solve this problem, transcriptions obtained by spontaneous speech recognition of a lecture are associated with the textbook used in the lecture. This method showed high performance of the topic segmentation with an av...
