Search results for: text segmentation

Number of results: 227918

2012
Julian Brooke Adam Hammond Graeme Hirst

The identification of stylistic inconsistency is a challenging task relevant to a number of genres, including literature. In this work, we carry out stylistic segmentation of a well-known poem, The Waste Land by T.S. Eliot, which is traditionally analyzed in terms of numerous voices which appear throughout the text. Our method, adapted from work in topic segmentation and plagiarism detection, p...

2010
Le Sun Keh-Jiann Chen Qun Liu

The authors propose that we need some change for the current technology in Chinese word segmentation. We should have separate and different phases in the so-called segmentation. First of all, we need to limit segmentation only to the segmentation of Chinese characters instead of the so-called Chinese words. In character segmentation, we will extract all the informat...

1998
Julia Hirschberg Christine H. Nakatani

The segmentation of text and speech into topics and subtopics is an important step in document interpretation. For text, formatting information, such as headings and paragraphing, is available to aid in this endeavor, although this information is by no means sufficient. For speech, the task is even more difficult. We present results of the application of machine learning techniques to the automatic...

2016
Takashi Yamamura Kazutaka Shimada Shintaro Kawahara

Summarization of multi-party conversation is one of the important tasks in natural language processing. In this paper, we explain a Japanese corpus and a topic segmentation task. To the best of our knowledge, the corpus is the first Japanese corpus annotated for summarization tasks and freely available to anyone. We call it “the Kyutech corpus.” The task of the corpus is a decision-making task ...

2006
Aaron J. Jacobs Yuk Wah Wong

We extended the work of Low, Ng, and Guo (2005) to create a Chinese word segmentation system based upon a maximum entropy statistical model. This system was entered into the Third International Chinese Language Processing Bakeoff and evaluated on all four corpora in their respective open tracks. Our system achieved the highest F-score for the UPUC corpus, and the second, third, and seventh high...

2002
Olivier Ferret

We present in this paper a method for achieving in an integrated way two tasks of topic analysis: segmentation and link detection. This method combines word repetition and the lexical cohesion stated by a collocation network to compensate for the respective weaknesses of the two approaches. We report an evaluation of our method for segmentation on two corpora, one in French and one in English, ...

2005
Toshiaki Nakazawa Daisuke Kawahara Sadao Kurohashi

Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain Katakana word dictionary by hand. This paper proposes an automatic segmentation method of Japanese Katakana compounds, which makes it possible to cons...

1999
Stefan Kaufmann

Collocational word similarity is considered a source of text cohesion that is hard to measure and quantify. The work presented here explores the use of information from a training corpus in measuring word similarity and evaluates the method in the text segmentation task. An implementation, the VecTile system, produces similarity curves over texts using pre-compiled vector representations of the...

2005
Jane S. Tsay

Taiwan Child Language Corpus contains scripts transcribed from about 330 hours of recordings of fourteen young children from Southern Min Chinese speaking families in Taiwan. The format of the corpus adopts the Child Language Data Exchange System (CHILDES). The size of the corpus is about 1.6 million words. In this paper, we describe data collection, transcription, word segmentation, and part-o...

Journal: Computational Linguistics 2009
Zhongguo Li Maosong Sun

Paragraphs are composed of sentences. Hence when a paragraph begins, a sentence must begin, and as a paragraph closes, some sentence must finish. This observation is the basis of the sentence boundary detection method proposed by Riley (1989). Similarly, sentences consist of words. As a sentence begins or ends there must be word boundaries. Inspired by this notion, we invent a method to learn a...
