Search results for: text segmentation

Number of results: 227918

2011
Matthew Purver

This chapter discusses the task of topic segmentation: automatically dividing single long recordings or transcripts into shorter, topically coherent segments. First, we look at the task itself, the applications which require it, and some ways to evaluate accuracy. We then explain the most influential approaches – generative and discriminative, supervised and unsupervised – and discuss their app...

2012
Uma Devi

Text line segmentation is an inherent part of a document recognition system and an important preprocessing step for word and character segmentation. The presence of touching or overlapping text lines, short lines, curvilinear or skewed lines, and small or varying gaps between text lines makes segmentation challenging. These variations cause errors in the recognition phase. This paper describes the top...

2008
Doina Tatar Andreea Diana Mihis Gabriela Serban Czibula

The paper proposes a new method of linear text segmentation based on the lexical cohesion of a text. Namely, first a single chain of disambiguated words in a text is established, then the rips of this single chain are considered as boundaries for the segments of the cohesion text structure (Cohesion TextTiling or CTT). The summaries of arbitrary length are obtained by extraction using three diffe...

2016
Jayant Kumar Laurence Likforman-Sulem Syed Saqib Bukhari

Text line segmentation is an inherent part of a document recognition system and an important preprocessing step for word and character segmentation. The presence of touching or overlapping text lines, short lines, curvilinear or skewed lines, and small or varying gaps between text lines makes segmentation challenging. These variations cause errors in the recognition phase. This paper describes the top...

2002
Matthias Zimmermann Horst Bunke

This paper presents an automatic segmentation scheme for cursive handwritten text lines using the transcriptions of the text lines and a hidden Markov model (HMM) based recognition system. The segmentation scheme has been developed and tested on the IAM database that contains offline images of cursively handwritten English text. The original version of this database contains ground truth for co...

2016
Amrith Krishna Bishal Santra Pavankumar Satuluri Sasi Prasanth Bandaru Bhumi Faldu Yajuvendra Singh Pawan Goyal

In Sanskrit, the phonemes at word boundaries undergo changes to form new phonemes through a process called sandhi. A fused sentence can be segmented into multiple possible segmentations. We propose a word segmentation approach that predicts the most semantically valid segmentation for a given sentence. We treat the problem as a query expansion problem and use the path-constrained random ...

2008
Hai Zhao Chunyu Kit

This paper reports our empirical evaluation and comparison of several popular goodness measures for unsupervised segmentation of Chinese texts using Bakeoff-3 data sets with a unified framework. Assuming no prior knowledge about Chinese, this framework relies on a goodness measure to identify word candidates from unlabeled texts and then applies a generalized decoding algorithm to find the opti...

2013
Abdellah Fourtassi Benjamin Börschinger Mark Johnson Emmanuel Dupoux

Cross-linguistic studies on unsupervised word segmentation have consistently shown that English is easier to segment than other languages. In this paper, we propose an explanation of this finding based on the notion of segmentation ambiguity. We show that English has a very low segmentation ambiguity compared to Japanese and that this difference correlates with the segmentation performance in a...

1989
Anne Cutler Sally Butterfield

One of a listener's major tasks in understanding continuous speech is segmenting the speech signal into separate words. When listening conditions are difficult, speakers can help listeners by deliberately speaking more clearly. In three experiments, we examined how word boundaries are produced in deliberately clear speech. We found that speakers do indeed attempt to mark word boundaries; moreov...

2012
P. Galuščáková

Segmentation into topically coherent segments is one of the crucial points in information retrieval (IR). Suitable segmentation may improve the results of an IR system and help users find relevant passages faster. Segmentation is especially important in audiovisual recordings, in which navigation is difficult. We present several methods used for topic segmentation, based on textual, audio a...
