Incorporating second-order information into two-step major phrase break prediction for Korean
نویسندگان
چکیده
In this paper, we present a new phrase break prediction method that integrates second-order information into general maximum entropy model. The phrase break prediction problem was mapped into a classification problem in our research. The features we used for the prediction of phrase breaks are of several layers such as local features (part-of-speech (POS) tags, a lexicon, lengths of eojeols and location of juncture in the sentence), global features (chunk label derived from a eojeol parse tree) and second-order features (distance probability of previous and next phrase break). These three features were combined and used in the experiments, and we were able to generate good performance especially in the major phrase break prediction.
منابع مشابه
TODO: This is a placeholder. Final title will be filled later
In this paper, we present a new phrase break prediction method that integrates second-order information into general maximum entropy model. The phrase break prediction problem was mapped into a classification problem in our research. The features we used for the prediction of phrase breaks are of several layers such as local features (part-of-speech (POS) tags, a lexicon, lengths of eojeols and...
متن کاملUnlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS
This paper describes a grapheme-to-phoneme conversion method using phoneme connectivity and CCV conversion rules. The method consists of mainly four modules including morpheme normalization, phrase-break detection, morpheme to phoneme conversion and phoneme connectivity check. The morpheme normalization is to replace non-Korean symbols into standard Korean graphemes. The phrase-break detector a...
متن کامل1 0 Ju n 19 98 Unlimited Vocabulary Grapheme to Phoneme Conversion for Korean TTS
This paper describes a grapheme-to-phoneme conversion method using phoneme connectivity and CCV conversion rules. The method consists of mainly four modules including morpheme normalization, phrase-break detection, morpheme to phoneme conversion and phoneme connectivity check. The morpheme normalization is to replace non-Korean symbols into standard Korean graphemes. The phrase-break detector a...
متن کاملDecision-Tree based Error Correction for Statistical Phrase Break Prediction in Korean
In this paper, we present a new phrase break prediction architecture that integrates probabilistic approach with decision-tree based error correction. The probabilistic method alone usually su ers from performance degradation due to inherent data sparseness problems and it only covers a limited range of contextual information. Moreover, the module can not utilize the selective morpheme tag and ...
متن کاملUnlimited Vocabulary Grapheme to Phoneme Conversion forKorean
This paper describes a grapheme-to-phoneme conversion method using phoneme connectivity and CCV conversion rules. The method consists of mainly four modules including morpheme normalization, phrase-break detection , morpheme to phoneme conversion and phoneme connectivity check. The morpheme normalization is to replace non-Korean symbols into standard Korean graphemes. The phrase-break detector ...
متن کامل