Search results for: text segmentation

Number of results: 227918

2003
Dimosthenis A. Karatzas

The research presented in this thesis addresses the problem of Text Segmentation in Web images. Text is routinely created in image form (headers, banners etc.) on Web pages, as an attempt to overcome the stylistic limitations of HTML. This text, however, has a potentially high semantic value in terms of indexing and searching for the corresponding Web pages. As current search engine technology d...

2004
Jean-François Pessiot Marc Caillet Massih-Reza Amini Patrick Gallinari

In this paper we introduce a machine learning approach for automatic text segmentation. Our text segmenter clusters text-segments containing similar concepts. It first discovers the different concepts present in a text, each concept being defined as a set of representative terms. After that the text is partitioned into coherent paragraphs using a hard clustering technique based on the Classific...

2015
Okko Johannes Räsänen Heikki Rasilo

Existing models of infant word learning have mainly assumed that the learner is capable of segmenting words from speech before grounding them to their referential meaning, while segmentation itself has been treated relatively independently of meaning acquisition. In this paper, we argue that situated cues such as visually perceived concrete objects or actions are not just important for word-to-...

2017
Zuyi Bao Si Li Sheng Gao Weiran Xu

A large-scale annotated newswire corpus exists for Chinese word segmentation. However, some research shows that a segmenter's performance decreases significantly when a model trained on newswire is applied to other domains, such as patents and literature. The same character appearing in different words may occupy different positions and carry different meanings. In this paper, we introdu...

2005
Chunyu Kit Xiaoyue Liu

This paper reports on the example-based segmentation system for our participation in the second Chinese Word Segmentation Bakeoff (CWSB-2), presenting its basic ideas, technical details and evaluation. It is a preliminary implementation. The CWSB-2 evaluation shows that it performs very well in identifying known words. Its unknown word detection module also shows great potential. However, proper ...

Journal: AI Commun. 2004
Nicola Stokes Joe Carthy Alan F. Smeaton

In this paper we compare the performance of three distinct approaches to lexical-cohesion-based text segmentation. Most work in this area has focused on the discovery of textual units that discuss subtopic structure within documents. In contrast, our segmentation task requires the discovery of topical units of text, i.e., distinct news stories from broadcast news programmes. Our approach to news s...

Journal: Journal of Chinese Language and Computing 2005
Gulila Altenbek

Named entity recognition for Xinjiang's multiple nationalities is an important part of multi-language processing. In this paper, we analyze the patterns of Uighur and Kazak person names, and perform name recognition using a rule-based approach. We also propose and implement rules for Uighur and Kazak word segmentation.

2008
Jia-Lin Tsai

This paper describes a Chinese word segmentation system based on a word boundary token model and a triple template matching model for extracting unknown words, and a word support model for resolving segmentation ambiguity.

1996
Carl de Marcken

This paper discusses the problem of learning language from unprocessed text and speech signals, concentrating on the problem of learning a lexicon. In particular, it argues for a representation of language in which linguistic parameters like words are built by perturbing a composition of existing parameters. The power of this representation is demonstrated by several examples in text segmentati...

1995
Fumitaka Kimura Yasuji Miyake Malayappan Shridhar

This paper describes a new approach to ZIP code recognition using a word recognition algorithm, where a numeral string is recognized as a word. This paper also describes an end-to-end ZIP code recognition system consisting of tilt/slant correction, line segmentation, word segmentation, ZIP code location, as well as ZIP code recognition. Evaluation tests are performed using address block ima...
