نتایج جستجو برای: text segmentation
تعداد نتایج: 227918 فیلتر نتایج به سال:
Automatic term extraction is the first step towards automatic or semi-automatic update of existing domain knowledge base. Most of the researches applied word segmentation as a preprocessing step to Chinese term extraction. However, segmentation ambiguity is unavoidable, especially in identifying unknown words for Chinese. In this paper, we discuss the effect and limitations of segmentation to C...
Word segmentation has been shown helpful for Chinese-toEnglish machine translation (MT), yet the way different segmentation strategies affect MT is poorly understood. In this paper, we focus on comparing different segmentation strategies in terms of machine translation quality. Our empirical study covers both English-to-Chinese and Chinese-to-English translation for the first time. Our results ...
Gurumukhi script is a two dimensional composition of symbols with connected and disconnected diacritics. Handwritten Gurumukhi script has some complexities like connected, overlapped text lines. It is one of the major reasons for errors during the recognition process. Text line segmentation is a challenging job in unconstrained writer independent handwritten document image processing. There is ...
Term Contributed Boundary Feature using Conditional Random Fields for Chinese Word Segmentation Task
This paper proposes a novel feature for conditional random field (CRF) model in Chinese word segmentation system. The system uses a conditional random field as machine learning model with one simple feature called term contributed boundaries (TCB) in addition to the “BIEO” character-based label scheme. TCB can be extracted from unlabeled corpora automatically, and segmentation variations of dif...
Given a text, can we segment it into semantically coherent sections in an automatic way? Can detect the semantic boundaries, if know how many they are? determine distinct are text? These questions address this paper. To respond, use Bidirectional Encoder Representation from Transformer (BERT) to analyze text and evaluate function that call local incoherence, which expect show maxima at points w...
Text is commonly printed on a complex background. Segmenting text is an important part in document analysis. In the past some methods have been shown for the segmentation of texts with images. However, previous studies have not sufficiently addressed complex compound documents. This investigation presents an algorithm for the segmentation of text in various document images. The proposed segment...
This paper describes a method of on-line handwritten Japanese text recognition by improving segmentation quality. The method produces hypothetical segmentation points according to features such as distance and overlap between adjacent strokes. Moreover, it extracts multidimensional features from these hypothetical segmentation points and applies an SVM to the extracted features to produces segm...
Optical Character Recognition (OCR) systems have been effectively developed for the recognition of printed script. The accuracy of OCR system mainly depends on the text preprocessing and segmentation algorithm being used. When the document is scanned it can be placed in any arbitrary angle which would appear on the computer monitor at the same angle. This paper addresses the algorithm for corre...
This paper describes several text independent speech segmentation methods. State-of-the-art applications and the prospected use of automatic speech segmentation techniques are presented, including the direct applicability of automatic segmentation in recognition, coding and speech corpora annotation, which is a central issue in todays speech technology. Moreover, a novel parametric segmentatio...
نمودار تعداد نتایج جستجو در هر سال
با کلیک روی نمودار نتایج را به سال انتشار فیلتر کنید