Text segmentation

NTOU Chinese Spelling Check System in SIGHAN Bake-off 2013

2013

Chuan-Jie Lin Wei-Cheng Chu

This paper describes the details of the NTOU Chinese spelling check system that participated in the SIGHAN-7 Bakeoff. The modules in our system include word segmentation, N-gram model probability estimation, similar character replacement, and filtering rules. Three dry runs and three formal runs were submitted, and the best one was created by bigram probability comparison without applying preference and filter...
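
The abstract reports that the best run came from bigram probability comparison over similar-character replacements. The Python sketch below illustrates only that comparison idea, under stated assumptions: it uses a character-level bigram model with add-one smoothing, a tiny hypothetical training corpus, and a hypothetical confusion set mapping a character to similar characters. The real system also used word segmentation and filtering rules, and its actual N-gram units and similar-character sets are not reproduced here. The names `train_bigrams`, `log_prob`, and `check` are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

BOS, EOS = "<s>", "</s>"
Model = Tuple[Counter, Counter, int]


def train_bigrams(corpus: List[str]) -> Model:
    """Count character bigrams and their histories (assumed model)."""
    bigrams, histories, vocab = Counter(), Counter(), set()
    for sent in corpus:
        tokens = [BOS] + list(sent) + [EOS]
        vocab.update(tokens)
        for a, b in zip(tokens, tokens[1:]):
            bigrams[(a, b)] += 1
            histories[a] += 1
    return bigrams, histories, len(vocab)


def log_prob(sent: str, bigrams: Counter, histories: Counter, v: int) -> float:
    """Sentence log-probability with add-one smoothed bigrams."""
    tokens = [BOS] + list(sent) + [EOS]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (histories[a] + v))
        for a, b in zip(tokens, tokens[1:])
    )


def check(sent: str, confusion: Dict[str, List[str]],
          model: Model) -> Tuple[str, List[Tuple[int, str, str]]]:
    """Replace a character by a similar one only if the sentence
    probability increases; return the new sentence and the fixes."""
    chars = list(sent)
    fixes = []
    for i, ch in enumerate(sent):
        best_score = log_prob("".join(chars), *model)
        best_char = ch
        for cand in confusion.get(ch, []):
            trial = chars[:i] + [cand] + chars[i + 1:]
            score = log_prob("".join(trial), *model)
            if score > best_score:
                best_score, best_char = score, cand
        if best_char != ch:
            chars[i] = best_char
            fixes.append((i, ch, best_char))
    return "".join(chars), fixes
```

With a toy corpus such as `["我在學校", "他在學校", "我在家"]` and the hypothetical confusion set `{"再": ["在"], "在": ["再"]}`, `check("我再學校", ...)` replaces 再 at position 1 with 在, because the bigrams 我在 and 在學 occur in the corpus while 我再 and 再學 do not.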

NCTU-NTUT at IJCNLP-2017 Task 2: Deep Phrase Embedding using bi-LSTMs for Valence-Arousal Ratings Prediction of Chinese Phrases

2017

Yen-Hsuan Lee Han-Yun Yeh Yih-Ru Wang Yuan-Fu Liao

In this paper, a deep phrase embedding approach using bi-directional long short-term memory (Bi-LSTM) neural networks is proposed to predict the valence-arousal ratings of Chinese phrases. It adopts a Chinese word segmentation frontend, local order-aware word- and global phrase-embedding representations, and a deep regression neural network (DRNN) model. The performance of the proposed method wa...

Evaluating Text Segmentation

2013

Christopher Fournier

This thesis investigates the evaluation of automatic and manual text segmentation. Text segmentation is the process of placing boundaries within text to create segments according to some task-dependent criterion. An example of text segmentation is topical segmentation, which aims to segment a text according to the subjective definition of what constitutes a topic. A number of automatic segmente...
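
Evaluating a segmentation means comparing where a hypothesis places boundaries against a reference. The truncated abstract does not name a metric, so the Python sketch below shows WindowDiff (Pevzner and Hearst, 2002) only as a widely used window-based baseline, not as the method this thesis proposes. Segmentations are given as segment sizes (masses) over the same number of units; the helper names `boundary_vector` and `window_diff` are illustrative.

```python
from typing import List


def boundary_vector(masses: List[int]) -> List[int]:
    """Convert segment sizes (e.g. [3, 3]) into a 0/1 vector over the
    N - 1 gaps between N units; 1 marks a segment boundary."""
    gaps = [0] * (sum(masses) - 1)
    pos = 0
    for m in masses[:-1]:
        pos += m
        gaps[pos - 1] = 1
    return gaps


def window_diff(ref: List[int], hyp: List[int], k: int) -> float:
    """WindowDiff: fraction of windows spanning k gaps whose boundary
    counts differ between reference and hypothesis (0.0 is perfect)."""
    n = sum(ref)
    if sum(hyp) != n:
        raise ValueError("segmentations must cover the same number of units")
    if not 0 < k < n:
        raise ValueError("window size k must satisfy 0 < k < N")
    r, h = boundary_vector(ref), boundary_vector(hyp)
    windows = n - k
    errors = sum(
        1 for i in range(windows) if sum(r[i:i + k]) != sum(h[i:i + k])
    )
    return errors / windows
```

A common convention sets `k` to about half the mean reference segment length. Because every window is scored equally, a boundary that is off by one unit is penalized in several windows, which is one reason window-based metrics are debated.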

Text line segmentation in handwritten documents using Mumford-Shah model

Journal: Pattern Recognition, 2009

Xiaojun Du Wumo Pan Tien D. Bui

Text line segmentation in handwritten documents is an important step in document processing. We present a new text line segmentation method based on the Mumford-Shah model. The algorithm is script independent. In addition, we use morphing to remove overlaps between neighboring text lines and connect broken ones. Experimental results show the validity of our method.

Evaluating SEE - A Benchmarking System for Document Page Segmentation

2003

Stefan Agne Andreas Dengel Bertin Klein

The decomposition of a document into segments such as text regions and graphics is a significant part of the document analysis process. The basic requirement for rating and improvement of page segmentation algorithms is systematic evaluation. The approaches known from the literature have the disadvantage that manually generated reference data (zoning ground truth) are needed for the evaluation ...

Text Detection and Recognition in Images and Video Sequences

2003

Datong Chen

Text characters embedded in images and video sequences represent a rich source of information for content-based indexing and retrieval applications. However, these characters are difficult to detect and recognize because of their varying sizes, grayscale values, and complex backgrounds. This thesis investigates methods for building an efficient application system for detecting and recogn...

Text Segmentation by Topic

1997

Jay M. Ponte W. Bruce Croft

We investigate the problem of text segmentation by topic. Applications for this task include topic tracking of broadcast speech data and topic identification in full-text databases. Researchers have tackled similar problems before but with different goals. This study focuses on data with relatively small segment sizes and for which within-segment sentences have relatively few words in common maki...

A Statistical Model for Domain-Independent Text Segmentation

2001

Masao Utiyama Hitoshi Isahara

We propose a statistical method that finds the maximum-probability segmentation of a given text. This method does not require training data because it estimates probabilities from the given text. Therefore, it can be applied to any text in any domain. An experiment showed that the method is more accurate than or at least as accurate as a state-of-the-art text segmentation system.
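
The idea of choosing the maximum-probability segmentation with probabilities estimated from the text itself can be sketched in Python as a dynamic program over sentence boundaries. The modelling details here are assumptions for illustration, not a transcription of the paper: each candidate segment is scored with a Laplace-smoothed unigram model estimated only from that segment's words (smoothed over the k distinct words of the whole text), and each segment adds a penalty of log n, where n is the number of words in the text. Cost is a negative log-probability, so the lowest-cost segmentation is the most probable one. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def segment_cost(words: List[str], vocab_size: int, total_words: int) -> float:
    """Negative log-probability of one segment under an assumed
    Laplace-smoothed unigram model, plus a log(n) per-segment penalty."""
    counts = Counter(words)
    n_i = len(words)
    cost = sum(math.log((n_i + vocab_size) / (counts[w] + 1)) for w in words)
    # Assumed prior P(S) = n^(-m): each of the m segments adds log(n).
    return cost + math.log(total_words)


def segment(sentences: List[List[str]]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) sentence ranges of minimum total cost."""
    all_words = [w for s in sentences for w in s]
    total = len(all_words)
    vocab = len(set(all_words))
    num = len(sentences)
    best = [0.0] + [math.inf] * num  # best[j]: min cost of sentences[0:j]
    back = [0] * (num + 1)           # back[j]: start of the last segment
    for j in range(1, num + 1):
        for i in range(j):
            words = [w for s in sentences[i:j] for w in s]
            c = best[i] + segment_cost(words, vocab, total)
            if c < best[j]:
                best[j], back[j] = c, i
    # Follow back-pointers from the end to recover the segments.
    segments = []
    j = num
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    return segments[::-1]
```

On six toy sentences where the first three share pet-related words and the last three share market-related words, this sketch places a single boundary between the two groups, since any segment mixing them spreads its probability mass over more distinct words.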

Unit Segmentation of Argumentative Texts

2017

Yamen Ajjour Wei-Fan Chen Johannes Kiesel Henning Wachsmuth Benno Stein

The segmentation of an argumentative text into argument units and their nonargumentative counterparts is the first step in identifying the argumentative structure of the text. Despite its importance for argument mining, unit segmentation has been approached only sporadically so far. This paper studies the major parameters of unit segmentation systematically. We explore the effectiveness of vari...
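
Unit segmentation is often framed as token-level sequence labelling. Assuming a BIO encoding (B begins an argument unit, I continues it, O marks a non-argumentative token; the truncated abstract does not state which encoding the paper uses), a predicted tag sequence can be decoded into argument-unit spans as in this Python sketch. The function name `bio_to_spans` is hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Decode BIO tags into half-open (start, end) token spans."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None  # start of the unit currently open, if any
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new unit begins; close any unit that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # Lenient decoding: an I with no open unit starts one.
            if start is None:
                start = i
        else:  # "O": a non-argumentative token closes any open unit
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans
```

The lenient handling of an I tag without a preceding B is a design choice; a strict decoder could instead reject such sequences as invalid.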

Chinese Word Segmentation with Multiple Postprocessors in HIT-IRLab

2005

Huipeng Zhang Ting Liu Jinshan Ma Xiantao Liao

This paper presents the results of the system IRLAS from HIT-IRLab in the Second International Chinese Word Segmentation Bakeoff. IRLAS consists of several basic components and multiple postprocessors. The basic components include basic segmentation, factoid recognition, and named entity recognition; together, these components maintain a segment graph. The postprocessors include merging of adjoi...