A Simple Approach to Unknown Word Processing in Japanese Morphological Analysis

Authors

  • Ryohei Sasano
  • Sadao Kurohashi
  • Manabu Okumura
Abstract

This paper presents a simple but effective approach to unknown word processing in Japanese morphological analysis, which handles 1) unknown words that are derived from words in a pre-defined lexicon and 2) unknown onomatopoeias. Our approach leverages derivation rules and onomatopoeia patterns, and correctly recognizes certain types of unknown words. Experiments revealed that our approach recognized about 4,500 unknown words in 100,000 Web sentences with only 80 harmful side effects and a 6% loss in speed.
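The two categories named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy lexicon, the single derivation rule (dropping the prolonged sound mark "ー"), the reduplication pattern for onomatopoeia, and the function name `classify_unknown` are all hypothetical. The authors' actual rules and patterns are not described in this abstract.

```python
import re

# Hypothetical toy lexicon; a real analyzer would consult a full dictionary.
LEXICON = {"すごい", "つめたい"}

# Assumed onomatopoeia pattern: a unit of 2 to 4 kana repeated twice,
# e.g. "ぐにょ" + "ぐにょ". This is an illustrative pattern only.
REDUPLICATION = re.compile(r"([ぁ-んァ-ヶ]{2,4})\1")


def classify_unknown(token):
    """Classify a token as known, derived, onomatopoeia, or unknown."""
    if token in LEXICON:
        return ("known", token)

    # Assumed derivation rule: an inserted prolonged sound mark is removed
    # and the result is looked up in the lexicon.
    normalized = token.replace("ー", "")
    if normalized != token and normalized in LEXICON:
        return ("derived", normalized)

    # Assumed onomatopoeia check: the whole token is a reduplicated kana unit.
    if REDUPLICATION.fullmatch(token):
        return ("onomatopoeia", token)

    return ("unknown", token)


if __name__ == "__main__":
    for word in ["すごい", "すごーい", "ぐにょぐにょ", "xyz"]:
        print(word, classify_unknown(word))
```

In this sketch the lexicon lookup runs first, so rule-based recovery only applies to tokens the dictionary does not already cover.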


Similar resources

Corpus-based Japanese morphological analysis

The goal of this study is to improve corpus-based Japanese morphological analysis, which is composed of word segmentation and part-of-speech (hereafter POS) tagging. We divide the problem of Japanese morphological analysis into three subproblems: models for known words, models for unknown words, and a corpus maintenance schema. First, we discuss Markov model-based approaches for known word processing. ...


Automatic Semantic Sequence Extraction from Unrestricted Non-Tagged Texts

Morphological processing, syntactic parsing and other useful tools have been proposed in the field of natural language processing (NLP). Many of those NLP tools take dictionary-based approaches. Thus these tools are often not very efficient with texts written in casual wordings or texts which contain many domain-specific terms, because of the lack of vocabulary. In this paper we propose a simpl...


The Unknown Word Problem: a Morphological Analysis of Japanese Using Maximum Entropy Aided by a Dictionary

In this paper we describe a morphological analysis method based on a maximum entropy model. This method uses a model that can not only consult a dictionary with a large amount of lexical information but can also identify unknown words by learning certain characteristics. The model has the potential to overcome the unknown word problem.


Chart-driven Connectionist Categorial Parsing of Spoken Korean

While most of the speech and natural language systems which were developed for English and other Indo-European languages neglect the morphological processing and integrate speech and natural language at the word level, for the agglutinative languages such as Korean and Japanese, the morphological processing plays a major role in the language processing since these languages have very complex m...


Iranian EFL Learners' Processing of English Derived Words

An interesting area of psycholinguistic inquiry is to discover the way morphological structures are stored in the human mind and how they are retrieved during comprehension or production of language. The current study probed into what goes on in the mind of EFL learners when processing derivational morphology and how English and Persian derivational suffixes are processed. 60 Iranian EFL learne...




Publication date: 2013