Incorporating second-order information into two-step major phrase break prediction for Korean

نویسندگان

Seungwon Kim

Jinsik Lee

Byeongchang Kim

Gary Geunbae Lee

چکیده

In this paper, we present a new phrase break prediction method that integrates second-order information into general maximum entropy model. The phrase break prediction problem was mapped into a classification problem in our research. The features we used for the prediction of phrase breaks are of several layers such as local features (part-of-speech (POS) tags, a lexicon, lengths of eojeols and location of juncture in the sentence), global features (chunk label derived from a eojeol parse tree) and second-order features (distance probability of previous and next phrase break). These three features were combined and used in the experiments, and we were able to generate good performance especially in the major phrase break prediction.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

TODO: This is a placeholder. Final title will be filled later

متن کامل

Unlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS

This paper describes a grapheme-to-phoneme conversion method using phoneme connectivity and CCV conversion rules. The method consists of mainly four modules including morpheme normalization, phrase-break detection, morpheme to phoneme conversion and phoneme connectivity check. The morpheme normalization is to replace non-Korean symbols into standard Korean graphemes. The phrase-break detector a...

متن کامل

1 0 Ju n 19 98 Unlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS

متن کامل

Decision-Tree based Error Correction for Statistical Phrase Break Prediction in Korean

In this paper, we present a new phrase break prediction architecture that integrates probabilistic approach with decision-tree based error correction. The probabilistic method alone usually su ers from performance degradation due to inherent data sparseness problems and it only covers a limited range of contextual information. Moreover, the module can not utilize the selective morpheme tag and ...

متن کامل

Unlimited Vocabulary Grapheme to Phoneme Conversion forKorean

This paper describes a grapheme-to-phoneme conversion method using phoneme connectivity and CCV conversion rules. The method consists of mainly four modules including morpheme normalization, phrase-break detection , morpheme to phoneme conversion and phoneme connectivity check. The morpheme normalization is to replace non-Korean symbols into standard Korean graphemes. The phrase-break detector ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2006

Incorporating second-order information into two-step major phrase break prediction for Korean

نویسندگان

چکیده

منابع مشابه

TODO: This is a placeholder. Final title will be filled later

Unlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS

1 0 Ju n 19 98 Unlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS

Decision-Tree based Error Correction for Statistical Phrase Break Prediction in Korean

Unlimited Vocabulary Grapheme to Phoneme Conversion forKorean

عنوان ژورنال:

اشتراک گذاری