Intonational phrase break prediction using decision tree and n-gram model
Authors
Abstract
In this study, we propose and evaluate a new method for automatic intonational phrase break prediction based on sequences of parts of speech and word junctures. The method uses decision trees to estimate the probability of a word juncture type (break or non-break) given a finite-length window of part-of-speech values, and an n-gram to model the word juncture sequence. Trained on an 8,000-word database, the algorithm predicted breaks with F=77% and non-breaks with F=93%, a significant improvement over the commonly used approach of decision trees alone.
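One way to picture how a per-juncture classifier and a juncture n-gram can work together is a Viterbi search over the break/non-break sequence. The sketch below is illustrative only and is not taken from the paper: it assumes a bigram juncture model, assumes the decision-tree posterior is divided by the label prior to act as a scaled likelihood, and uses hypothetical names (`viterbi_breaks`, `posteriors`, `prior`, `bigram`, `start`). The abstract does not state the n-gram order or the exact combination rule.

```python
import math

LABELS = ("B", "N")  # B = break, N = non-break


def viterbi_breaks(posteriors, prior, bigram, start):
    """Return the most probable juncture label sequence.

    posteriors: list of {label: P(label | POS window)}, one dict per juncture,
                standing in for decision-tree outputs (assumed, not from the paper)
    prior:      {label: P(label)}
    bigram:     {(prev, cur): P(cur | prev)}  (assumed bigram juncture model)
    start:      {label: P(label)} for the first juncture
    All probabilities are assumed to be strictly positive.
    """
    # Log-score of the best path ending in each label at the first juncture.
    score = {
        lab: math.log(start[lab]) + math.log(posteriors[0][lab] / prior[lab])
        for lab in LABELS
    }
    backpointers = []
    for post in posteriors[1:]:
        new_score, pointers = {}, {}
        for cur in LABELS:
            # Pick the predecessor label that maximises the path score.
            best_prev = max(
                LABELS, key=lambda p: score[p] + math.log(bigram[(p, cur)])
            )
            new_score[cur] = (
                score[best_prev]
                + math.log(bigram[(best_prev, cur)])
                + math.log(post[cur] / prior[cur])
            )
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(LABELS, key=lambda lab: score[lab])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

The sequence model matters when the classifier is uncertain: a low `bigram[("B", "B")]` discourages two adjacent breaks even if both junctures look like breaks in isolation.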
Similar resources
Chinese prosody phrase break prediction based on maximum entropy model
A maximum entropy based model for prosody phrase break prediction was proposed in this paper, and a comparison was conducted on large corpora between the new model and the decision tree based model which was the mainstream method for prosody phrase break prediction. The contribution of lexical information and influences of different cutoff values were also investigated. It was demonstrated that...
Decision-Tree based Error Correction for Statistical Phrase Break Prediction in Korean
In this paper, we present a new phrase break prediction architecture that integrates a probabilistic approach with decision-tree based error correction. The probabilistic method alone usually suffers from performance degradation due to inherent data sparseness problems and it only covers a limited range of contextual information. Moreover, the module cannot utilize the selective morpheme tag and ...
Automatic Classification of Intonational Phrase Boundaries
The relationship between the intonational characteristics of an utterance and other features inferable from its text represents an important source of information both for speech recognition, to constrain the set of allowable hypotheses, and for speech synthesis, to assign intonational features appropriately from text. This work investigates the usefulness of a number of textual features and ad...
Learning methods and features for corpus-based phrase break prediction on Thai
This paper applies five well-known learning methods to Thai phrase break prediction. Phrase break prediction is particularly important for our Thai text-to-speech synthesizer (TTS), where input Thai text has no word or sentence boundaries. The learning methods include a POS sequence model, CART, RIPPER, SLIPPER and a neural network. Features proposed for the learning machines can be ...
Predicting Intonational Phrasing from Text
Determining the relationship between the intonational characteristics of an utterance and other features inferable from its text is important both for speech recognition and for speech synthesis. This work investigates the use of text analysis in predicting the location of intonational phrase boundaries in natural speech, through analyzing 298 utterances from the DARPA Air Travel Information Se...