Improving the Accuracy of the Speech Synthesis Based Phonetic Alignment Using Multiple Acoustic Features
نویسندگان
چکیده
The phonetic alignment of the spoken utterances for speech research are commonly performed by HMM-based speech recognizers, in forced alignment mode, but the training of the phonetic segment models requires considerable amounts of annotated data. When no such material is available, a possible solution is to synthesize the same phonetic sequence and align the resulting speech signal with the spoken utterances. However, without a careful choice of acoustic features used in this procedure, it can perform poorly when applied to continuous speech utterances. In this paper we propose a new method to select the best features to use in the alignment procedure for each pair of phonetic segment classes. The results show that this selection considerably reduces the segment boundary location errors.
منابع مشابه
طراحی الگوریتم بازشناسی واجها با به کارگیری همبسته های آکوستیکی مشخصه های واجی
In the present paper, the phonological feature geometry of the Persian phonemes is analyzed in the form of articulate-free and articulate-bound features based on the articulator model of the nonlinear phonology. Then, the reference phonetic pattern of each feature that consists of one or a set of acoustic correlates, characterized by the quantitative or qualitative values in its phonological re...
متن کاملDTW-based phonetic alignment using multiple acoustic features
This paper presents the results of our effort in improving the accuracy of a DTW-based automatic phonetic aligner. The adopted model assumes that the phonetic segment sequence is already known and so the goal is only to align the spoken utterance with a reference synthetic signal produced by waveform concatenation without prosodic modifications. Instead of using a single acoustic measure to com...
متن کاملHighly accurate phonetic segmentation using boundary correction models and system fusion
Accurate phone-level segmentation of speech remains an important task for many subfields of speech research. We investigate techniques for boosting the accuracy of automatic phonetic segmentation based on HMM acoustic-phonetic models. In prior work [25] we were able to improve on state-of-the-art alignment accuracy by employing special phone boundary HMM models, trained on phonetically segmente...
متن کاملAreal and Phylogenetic Features for Multilingual Speech Synthesis
We introduce phylogenetic and areal language features to the domain of multilingual text-to-speech synthesis. Intuitively, enriching the existing universal phonetic features with crosslingual shared representations should benefit the multilingual acoustic models and help to address issues like data scarcity for low-resource languages. We investigate these representations using the acoustic mode...
متن کاملAcoustic measures vs. phonetic features as predictors of audible discontinuity in concatenative speech synthesis
Most concatenative speech synthesizers employ both acoustic measures and phonetic features to predict the perceptual damage caused by concatenating two waveform segments because no reliable acoustic measure has been found so far. This paper compares the predicting ability of the two kinds of predictor variables. We first conduct a perceptual experiment to measure the naturalness degradation due...
متن کامل