Incorporating second-order information into two-step major phrase break prediction for Korean

نویسندگان

  • Seungwon Kim
  • Jinsik Lee
  • Byeongchang Kim
  • Gary Geunbae Lee
چکیده

In this paper, we present a new phrase break prediction method that integrates second-order information into general maximum entropy model. The phrase break prediction problem was mapped into a classification problem in our research. The features we used for the prediction of phrase breaks are of several layers such as local features (part-of-speech (POS) tags, a lexicon, lengths of eojeols and location of juncture in the sentence), global features (chunk label derived from a eojeol parse tree) and second-order features (distance probability of previous and next phrase break). These three features were combined and used in the experiments, and we were able to generate good performance especially in the major phrase break prediction.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

TODO: This is a placeholder. Final title will be filled later

In this paper, we present a new phrase break prediction method that integrates second-order information into general maximum entropy model. The phrase break prediction problem was mapped into a classification problem in our research. The features we used for the prediction of phrase breaks are of several layers such as local features (part-of-speech (POS) tags, a lexicon, lengths of eojeols and...

متن کامل

Unlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS

This paper describes a grapheme-to-phoneme conversion method using phoneme connectivity and CCV conversion rules. The method consists of mainly four modules including morpheme normalization, phrase-break detection, morpheme to phoneme conversion and phoneme connectivity check. The morpheme normalization is to replace non-Korean symbols into standard Korean graphemes. The phrase-break detector a...

متن کامل

1 0 Ju n 19 98 Unlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS

This paper describes a grapheme-to-phoneme conversion method using phoneme connectivity and CCV conversion rules. The method consists of mainly four modules including morpheme normalization, phrase-break detection, morpheme to phoneme conversion and phoneme connectivity check. The morpheme normalization is to replace non-Korean symbols into standard Korean graphemes. The phrase-break detector a...

متن کامل

Decision-Tree based Error Correction for Statistical Phrase Break Prediction in Korean

In this paper, we present a new phrase break prediction architecture that integrates probabilistic approach with decision-tree based error correction. The probabilistic method alone usually su ers from performance degradation due to inherent data sparseness problems and it only covers a limited range of contextual information. Moreover, the module can not utilize the selective morpheme tag and ...

متن کامل

Unlimited Vocabulary Grapheme to Phoneme Conversion forKorean

This paper describes a grapheme-to-phoneme conversion method using phoneme connectivity and CCV conversion rules. The method consists of mainly four modules including morpheme normalization, phrase-break detection , morpheme to phoneme conversion and phoneme connectivity check. The morpheme normalization is to replace non-Korean symbols into standard Korean graphemes. The phrase-break detector ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006