Search results for: text segmentation

Number of results: 227918

2010
Martin Scaiano Diana Inkpen Robert Laganière Adele Reinhartz

To improve information retrieval from films we attempt to segment movies into scenes using the subtitles. Film subtitles differ significantly in nature from other texts; we describe some of the challenges of working with movie subtitles. We test a few modifications to the TextTiling algorithm, in order to get an effective segmentation.
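As a rough illustration of the lexical-cohesion idea behind TextTiling, the sketch below scores each gap between subtitle lines by the cosine similarity of the word counts on either side and marks a boundary at low local minima. This is a simplified variant, not the modified algorithm from the paper: the stopword list, the `window` size, the `threshold` value, and all function names are assumptions made for the example, and the original TextTiling uses smoothed depth scores rather than a fixed threshold.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "at", "is", "of", "to", "and", "in"}


def tokenize(line):
    """Lowercase a subtitle line and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", line.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two Counter bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_scores(lines, window=2):
    """Similarity of the `window` lines before each gap to the `window` lines after it."""
    scores = []
    for gap in range(1, len(lines)):
        left = Counter(w for l in lines[max(0, gap - window):gap] for w in tokenize(l))
        right = Counter(w for l in lines[gap:gap + window] for w in tokenize(l))
        scores.append(cosine(left, right))
    return scores


def boundaries(lines, window=2, threshold=0.1):
    """Return indices of lines that start a new segment.

    A gap counts as a boundary when its score is below `threshold`
    and is not higher than either neighbouring gap score.
    """
    scores = gap_scores(lines, window)
    result = []
    for i, s in enumerate(scores):
        prev_s = scores[i - 1] if i > 0 else math.inf
        next_s = scores[i + 1] if i + 1 < len(scores) else math.inf
        if s < threshold and s <= prev_s and s <= next_s:
            result.append(i + 1)  # gap i sits just before line i + 1
    return result
```

Calling `boundaries(subtitle_lines)` on a list of subtitle strings returns the line indices where a new scene is hypothesised to begin. Real subtitles are short and sparse, so a practical system would likely need larger windows and smoothing.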

2015
Huidan Liu Congjun Long Minghua Nuo Jian Wu

When the Tibetan word segmentation task is treated as a sequence labelling problem, machine learning models such as ME and CRFs can be used to train the segmenter. The performance of the segmenter depends on many factors. In the paper, three factors, namely the strategy for abbreviated syllables, the tag set, and the syllable’s Part-Of-Speech property, are compared. Experiment data show that: first, if eac...
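To make the "sequence labelling" framing concrete, the sketch below converts a segmented sentence into per-syllable tags and back. It assumes the common four-tag B/M/E/S scheme (begin, middle, end, single); the abstract does not say which tag sets the paper compares, so this scheme and the function names are illustrative assumptions only.

```python
def words_to_tags(words):
    """Map a list of words (each a list of syllables) to one B/M/E/S tag per syllable."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-syllable word
        else:
            # first syllable B, last syllable E, everything in between M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(syllables, tags):
    """Rebuild words from a flat syllable sequence and its predicted tags."""
    words, current = [], []
    for syllable, tag in zip(syllables, tags):
        current.append(syllable)
        if tag in ("E", "S"):  # a word ends here
            words.append(current)
            current = []
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words
```

A trained model such as a CRF would predict the tag sequence from syllable features; `tags_to_words` then turns those predictions into a segmentation.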

2008
Rui Amaral Isabel Trancoso

The goal of this paper is the description of our current work in terms of topic segmentation and indexation and the comparison of their performance with the story boundaries and topics manually chosen by a professional media watch company. The segmentation module explores the typical structure of a broadcast news show, namely by cues provided by the audio pre-processing module, but an improved ...

2011
Elliott Moreton Joe Pater Anne Pycha Jen Smith Colin Wilson

(3) An example of simplicity bias in learning Saffran and Thiessen’s (2003) phonotactic learning/word segmentation study: a. Training Phase 1. 9-month-olds exposed to isolated words of shape C1V C2.C1V C2, where C1 and C2 are each limited to a set of three consonants. b. Training Phase 2. Exposed to 4 new words in a continuous stream, with only two fitting the pattern from Phase 1. c. Test Phas...

2001
M. Oguzhan Külekci Mehmed Özkan

This paper describes an algorithm that uses a morphological analyzer to segment an input Turkish string without any spaces, such as the output of a speech-to-text application, into words. The algorithm could also be used for other languages that have a morphological analysis component. Turkish morphological analyzer is designed and implemented as the linguistic engine of the ...

2006
Jaime Arguello Carolyn Penstein Rosé

We introduce a novel topic segmentation approach that combines evidence of topic shifts from lexical cohesion with linguistic evidence such as syntactically distinct features of segment initial and final contributions. Our evaluation shows that this hybrid approach outperforms state-of-the-art algorithms even when applied to loosely structured, spontaneous dialogue. Further analysis reveals tha...

Journal: J. UCS 2008
Sylvain Lamprier Tassadit Amghar Bernard Levrat Frédéric Saubion

The thematic text segmentation task consists of identifying the most important thematic breaks in a document in order to cut it into homogeneous passages. In this paper we propose an algorithm for linear text segmentation on general corpora. It relies on an initial clustering of the sentences of the text. This preliminary partitioning provides a global view on the sentence relations existing ...

Journal: CoRR 2014
Valery D. Solovyev Vladimir V. Bochkarev

The article describes an original method of creating a dictionary of abbreviations based on the Google Books Ngram Corpus. The dictionary of abbreviations is designed for Russian, yet because its methodology is universal it can be applied to any language. The dictionary can be used to determine the function of the period during text segmentation in various applied text processing systems. The artic...

2008
Maria Georgescul Alexander Clark Susan Armstrong

In this article we address the task of automatic text structuring into linear and nonoverlapping thematic episodes at a coarse level of granularity. In particular, we deal with topic segmentation on multi-party meeting recording transcripts, which pose specific challenges for topic segmentation models. We present a comparative study of two probabilistic mixture models. Based on lexical features...

2005
Dmitriy Genzel

We propose and motivate a novel task: paragraph segmentation. We discuss and compare this task with text segmentation and discourse parsing. We present a system that performs the task with high accuracy. A variety of features is proposed and examined in detail. The best models turn out to include lexical, coherence, and structural features.
