Segmenting Chinese Microtext: Joint Informal-Word Detection and Segmentation with Neural Networks

Authors

  • Meishan Zhang
  • Guohong Fu
  • Nan Yu
Abstract

State-of-the-art Chinese word segmentation systems typically rely on supervised models trained on a standard, manually annotated corpus, and they achieve performance above 95% on a comparable standard test corpus. However, performance may drop significantly when the same models are applied to Chinese microtext. One major challenge is the presence of informal words in microtext. Previous studies show that informal word detection can help microtext processing. In this work, we investigate informal word detection in the neural setting by proposing a joint segmentation model that detects informal words at the same time as it segments. In addition, we automatically generate a training corpus for the joint model from existing corpus data. Experimental results show that the proposed model is highly effective for segmenting Chinese microtext.
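
To make the idea of a joint output concrete, the sketch below shows one common way to encode segmentation and informal-word detection in a single label sequence: each character gets a position tag (B, M, E, S) combined with a formal/informal marker (F, I). This encoding is an illustrative assumption and is not necessarily the formulation used in the paper; the function names and the example sentence are hypothetical.

```python
from typing import List, Tuple


def encode_joint_tags(words: List[Tuple[str, bool]]) -> List[Tuple[str, str]]:
    """Map (word, is_informal) pairs to per-character joint tags.

    Position tags follow the usual B/M/E/S scheme; the suffix marks the
    word as informal (I) or formal (F). Assumed scheme, for illustration.
    """
    tagged = []
    for word, is_informal in words:
        kind = "I" if is_informal else "F"
        if len(word) == 1:
            positions = ["S"]
        else:
            positions = ["B"] + ["M"] * (len(word) - 2) + ["E"]
        for ch, pos in zip(word, positions):
            tagged.append((ch, f"{pos}-{kind}"))
    return tagged


def decode_joint_tags(tagged: List[Tuple[str, str]]) -> List[Tuple[str, bool]]:
    """Recover (word, is_informal) pairs from per-character joint tags."""
    words, buf, informal = [], "", False
    for ch, tag in tagged:
        pos, kind = tag.split("-")
        buf += ch
        informal = kind == "I"
        if pos in ("E", "S"):  # a word ends at E or S
            words.append((buf, informal))
            buf = ""
    if buf:  # tolerate a tag sequence that ends mid-word
        words.append((buf, informal))
    return words


# Hypothetical example: "木有" is used here as an informal variant of "没有".
sentence = [("我", False), ("木有", True), ("钱", False)]
tags = encode_joint_tags(sentence)
restored = decode_joint_tags(tags)
```

With this encoding, a single sequence labeler predicts both the word boundaries and the informal words in one pass, which is the sense in which the two tasks are handled jointly in this sketch.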

Similar resources

Mining Informal Language from Chinese Microtext: Joint Word Recognition and Segmentation

We address the problem of informal word recognition in Chinese microblogs. A key problem is the lack of word delimiters in Chinese. We exploit this reliance as an opportunity: recognizing the relation between informal word recognition and Chinese word segmentation, we propose to model the two tasks jointly. Our joint inference method significantly outperforms baseline systems that conduct the t...

Feature-based Neural Language Model and Chinese Word Segmentation

In this paper we introduce a feature-based neural language model, which is trained to estimate the probability of an element given its previous context features. In this way our feature-based language model can learn representation for more sophisticated features. We introduced the deep neural architecture into the Chinese Word Segmentation task. We got a significant improvement on segmenting p...

Chinese Informal Word Normalization: an Experimental Study

We study the linguistic phenomenon of informal words in the domain of Chinese microtext and present a novel method for normalizing Chinese informal words to their formal equivalents. We formalize the task as a classification problem and propose rule-based and statistical features to model three plausible channels that explain the connection between formal and informal pairs. Our two-stage selec...

A Gap-Based Framework for Chinese Word Segmentation via Very Deep Convolutional Networks

Most previous approaches to Chinese word segmentation can be roughly classified into character-based and word-based methods. The former regards this task as a sequence-labeling problem, while the latter directly segments character sequence into words. However, if we consider segmenting a given sentence, the most intuitive idea is to predict whether to segment for each gap between two consecutiv...

An Automated MR Image Segmentation System Using Multi-layer Perceptron Neural Network

Background: Brain tissue segmentation for delineation of 3D anatomical structures from magnetic resonance (MR) images can be used for neuro-degenerative disorders, characterizing morphological differences between subjects based on volumetric analysis of gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF), but only if the obtained segmentation results are correct. Due to image arti...

Publication date: 2017