Chinese Word Spelling Correction Based on N-gram Ranked Inverted Index List

Authors

  • Jui-Feng Yeh
  • Sheng-Feng Li
  • Mei-Rong Wu
  • Wen-Yi Chen
  • Mao-Chuan Su

Abstract

Spelling correction helps people enter written text into a machine so that they can obtain relevant information efficiently and effectively. Applications such as web search, writing systems, recommender systems, document mining, and typo checking before printing are closely related to spelling correction. Users can enter text, keywords, or sentences and interact with an intelligent system guided by its spelling-correction recommendations. This work proposes a novel spelling error detection and correction method based on an N-gram ranked inverted index. A dictionary built from pronunciation and shape similarity patterns helps detect possible spelling errors. The inverted index maps each potentially misspelled character to its possible corresponding characters at either the character level or the word level. The candidates in each list are ranked by their N-gram scores. E-HowNet is used as the knowledge representation of Traditional Chinese words. The data sets provided by the SIGHAN 7 bake-off are used to evaluate the proposed method. Experimental results show that the proposed method achieves acceptable performance in subtask one and outperforms other approaches in subtask two.
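
The pipeline described in the abstract (similarity dictionary, inverted index, N-gram ranking) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the confusion data, the bigram counts, the add-one log scoring, and all function names (`build_inverted_index`, `bigram_score`, `rank_candidates`, `correct`) are assumptions made up for illustration, and only the character level is shown.

```python
import math
from collections import defaultdict

# Toy similarity dictionary (assumption, not from the paper): each intended
# character maps to characters that may be typed in its place because they
# sound or look alike. 再 and 在 share the pronunciation "zai".
SIMILAR = {
    "再": ["在"],
    "在": ["再"],
}

# Toy bigram counts standing in for a trained N-gram model (assumption).
BIGRAM_COUNTS = {
    ("再", "見"): 50,
    ("在", "見"): 1,
    ("在", "家"): 40,
    ("再", "家"): 1,
}


def build_inverted_index(similar):
    """Invert the dictionary: typed character -> possible intended characters."""
    index = defaultdict(list)
    for intended, confusable in similar.items():
        for typed in confusable:
            if intended not in index[typed]:
                index[typed].append(intended)
    return dict(index)


def bigram_score(chars, i, candidate, counts):
    """Score a candidate at position i by its left and right bigrams.

    Uses log(count + 1) so that unseen bigrams contribute zero.
    """
    score = 0.0
    if i > 0:
        score += math.log(counts.get((chars[i - 1], candidate), 0) + 1)
    if i + 1 < len(chars):
        score += math.log(counts.get((candidate, chars[i + 1]), 0) + 1)
    return score


def rank_candidates(chars, i, index, counts):
    """Return the typed character and its alternatives, best score first.

    The typed character is listed first, so it wins ties (sorted is stable).
    """
    candidates = [chars[i]] + index.get(chars[i], [])
    return sorted(
        candidates,
        key=lambda c: bigram_score(chars, i, c, counts),
        reverse=True,
    )


def correct(sentence, index, counts):
    """Replace each character with its top-ranked candidate.

    Context is always taken from the original sentence, a simplification.
    """
    chars = list(sentence)
    return "".join(
        rank_candidates(chars, i, index, counts)[0] for i in range(len(chars))
    )


INDEX = build_inverted_index(SIMILAR)
print(correct("明天在見", INDEX, BIGRAM_COUNTS))  # prints 明天再見
print(correct("我在家", INDEX, BIGRAM_COUNTS))    # prints 我在家
```

In the first sentence the typed 在 is replaced because the bigram 再見 scores higher than 在見. In the second sentence 在 is kept because 在家 outscores 再家.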

Similar resources

From Spelling Correction to Text Cleaning - Using Context Information

Spelling correction is the task of correcting words in texts. Most of the available spelling correction tools only work on isolated words and compute a list of spelling suggestions ranked by edit-distance, letter-n-gram similarity or comparable measures. Although the probability of the best ranked suggestion being correct in the current context is high, user intervention is usually necessary to...

Chinese Spelling Error Detection and Correction Based on Language Model, Pronunciation, and Shape

Spelling checking is an important preprocessing task when dealing with user-generated texts such as tweets and product comments. Compared with some Western languages such as English, Chinese spelling checking is more complex because there is no word delimiter in written Chinese and misspelled characters can only be identified at the word level. Our system works as follows. First, we use character-l...

Introduction to BIT Chinese Spelling Correction System at CLP 2014 Bake-off

This paper describes the Chinese spelling correction system submitted by BIT for task 2 of the CLP 2014 Bake-off. The system has two main parts: 1) An N-gram model is adopted to retrieve the non-words that are wrongly separated by word segmentation. The non-words are then corrected in terms of word frequency, pronunciation similarity, shape similarity and POS (part of speech) tag. 2) For wrong wor...

GWU-HASP: Hybrid Arabic Spelling and Punctuation Corrector

In this paper, we describe our Hybrid Arabic Spelling and Punctuation Corrector (HASP). HASP was one of the systems participating in the QALB-2014 Shared Task on Arabic Error Correction. The system uses a CRF (Conditional Random Fields) classifier for correcting punctuation errors, an open-source dictionary (or word list) for detecting errors and generating and filtering candidates, an n-gram l...

Design and implementation of Persian spelling detection and correction system based on Semantic

The Persian language has special features (graphemes, homophones, and multi-shape clinging characters) on electronic devices. Furthermore, designing and implementing NLP tools for Persian is more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...

Publication date: 2013