Search results for: text segmentation

Number of results: 227918  

2013
Chuan-Jie Lin Wei-Cheng Chu

This paper describes details of NTOU Chinese spelling check system participating in SIGHAN-7 Bakeoff. The modules in our system include word segmentation, N-gram model probability estimation, similar character replacement, and filtering rules. Three dry runs and three formal runs were submitted, and the best one was created by bigram probability comparison without applying preference and filter...

2017
Yen-Hsuan Lee Han-Yun Yeh Yih-Ru Wang Yuan-Fu Liao

In this paper, a deep phrase embedding approach using bi-directional long short-term memory (Bi-LSTM) neural networks is proposed to predict the valence-arousal ratings of Chinese phrases. It adopts a Chinese word segmentation front end, local order-aware word-embedding and global phrase-embedding representations, and a deep regression neural network (DRNN) model. The performance of the proposed method wa...

2013
Christopher Fournier

This thesis investigates the evaluation of automatic and manual text segmentation. Text segmentation is the process of placing boundaries within text to create segments according to some task-dependent criterion. An example of text segmentation is topical segmentation, which aims to segment a text according to the subjective definition of what constitutes a topic. A number of automatic segmente...
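As a rough illustration of what comparing an automatic segmentation against a manual one can look like, the sketch below computes a window-based error rate in the style of WindowDiff. This metric is not named in the abstract above and is shown only as one common option, not as the thesis's own method. The function name `window_diff`, the 0/1 gap encoding, and the default window size are assumptions made for this example.

```python
def window_diff(reference, hypothesis, k=None):
    """Window-based segmentation error (WindowDiff-style sketch).

    `reference` and `hypothesis` are equal-length lists of 0/1 flags, one per
    gap between adjacent text units (1 = a boundary is placed in that gap).
    A window covering k gaps slides over the text; a window counts as an
    error when the two segmentations disagree on how many boundaries it holds.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same number of gaps")
    n_units = len(reference) + 1
    if k is None:
        # Assumed default: half the average reference segment length.
        n_segments = sum(reference) + 1
        k = max(1, round(n_units / n_segments / 2))
    n_windows = n_units - k
    if n_windows <= 0:
        raise ValueError("window size k must be smaller than the text length")
    errors = 0
    for i in range(n_windows):
        if sum(reference[i:i + k]) != sum(hypothesis[i:i + k]):
            errors += 1
    return errors / n_windows


# Six units with one reference boundary after the third unit.
reference = [0, 0, 1, 0, 0]
missed = [0, 0, 0, 0, 0]  # a hypothesis that places no boundary at all
print(window_diff(reference, reference, k=2))
print(window_diff(reference, missed, k=2))
```

A score of 0 means the two segmentations agree in every window; larger values mean more windows disagree.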

Journal: Pattern Recognition 2009
Xiaojun Du Wumo Pan Tien D. Bui

Text line segmentation in handwritten documents is an important step in document processing. We present a new text line segmentation method based on the Mumford-Shah model. The algorithm is script independent. In addition, we use morphing to remove overlaps between neighboring text lines and connect broken ones. Experimental results show the validity of our method.

2003
Stefan Agne Andreas Dengel Bertin Klein

The decomposition of a document into segments such as text regions and graphics is a significant part of the document analysis process. The basic requirement for rating and improvement of page segmentation algorithms is systematic evaluation. The approaches known from the literature have the disadvantage that manually generated reference data (zoning ground truth) are needed for the evaluation ...

2003
Datong CHEN

Text characters embedded in images and video sequences represent a rich source of information for content-based indexing and retrieval applications. However, these text characters are difficult to detect and recognize due to their various sizes, grayscale values and complex backgrounds. This thesis investigates methods for building an efficient application system for detecting and recogn...

1997
Jay M. Ponte W. Bruce Croft

We investigate the problem of text segmentation by topic. Applications for this task include topic tracking of broadcast speech data and topic identification in full-text databases. Researchers have tackled similar problems before but with different goals. This study focuses on data with relatively small segment sizes and for which within-segment sentences have relatively few words in common maki...

2001
Masao Utiyama Hitoshi Isahara

We propose a statistical method that finds the maximum-probability segmentation of a given text. This method does not require training data because it estimates probabilities from the given text. Therefore, it can be applied to any text in any domain. An experiment showed that the method is more accurate than or at least as accurate as a state-of-the-art text segmentation system.
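The idea of choosing the most probable segmentation while estimating probabilities from the text itself can be sketched with dynamic programming. The code below is a simplified illustration, not the authors' exact formulation: it assumes a Laplace-smoothed unigram model per segment and a fixed per-segment penalty of log(total word count). The names `segment_cost` and `best_segmentation` are hypothetical.

```python
import math
from collections import Counter


def segment_cost(sentences, vocab_size, penalty):
    """Cost of treating `sentences` as one segment.

    The cost is the negative log-likelihood of the segment's words under a
    Laplace-smoothed unigram model estimated from the segment itself, plus a
    fixed penalty that discourages creating too many segments (assumption).
    """
    counts = Counter(word for sentence in sentences for word in sentence)
    total = sum(counts.values())
    nll = -sum(
        c * math.log((c + 1) / (total + vocab_size)) for c in counts.values()
    )
    return nll + penalty


def best_segmentation(sentences):
    """Return the segment end indices (exclusive) of the minimum-cost split."""
    n = len(sentences)
    vocab_size = len({word for sentence in sentences for word in sentence})
    total_words = sum(len(sentence) for sentence in sentences)
    penalty = math.log(total_words)  # assumed prior cost per segment

    best = [0.0] + [math.inf] * n  # best[i]: cheapest way to split sentences[:i]
    back = [0] * (n + 1)           # back[i]: start index of the last segment
    for end in range(1, n + 1):
        for start in range(end):
            cost = best[start] + segment_cost(
                sentences[start:end], vocab_size, penalty
            )
            if cost < best[end]:
                best[end] = cost
                back[end] = start

    # Walk the back-pointers to recover the segment boundaries.
    bounds = []
    pos = n
    while pos > 0:
        bounds.append(pos)
        pos = back[pos]
    return sorted(bounds)


# Toy input: three sentences about one topic followed by three about another.
toy = [["cat", "cat", "cat"]] * 3 + [["dog", "dog", "dog"]] * 3
print(best_segmentation(toy))
```

On this toy input the cheapest split places a single boundary after the third sentence, so the function returns `[3, 6]`. No training data is involved: every probability is estimated from the input text.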

2017
Yamen Ajjour Wei-Fan Chen Johannes Kiesel Henning Wachsmuth Benno Stein

The segmentation of an argumentative text into argument units and their nonargumentative counterparts is the first step in identifying the argumentative structure of the text. Despite its importance for argument mining, unit segmentation has been approached only sporadically so far. This paper studies the major parameters of unit segmentation systematically. We explore the effectiveness of vari...

2005
Huipeng Zhang Ting Liu Jinshan Ma Xiantao Liao

This paper presents the results of the system IRLAS from HIT-IRLab in the Second International Chinese Word Segmentation Bakeoff. IRLAS consists of several basic components and multiple postprocessors. The basic components include basic segmentation, factoid recognition, and named entity recognition. These components maintain a segment graph together. The postprocessors include merging of adjoi...
