A Pointwise Approach to Pronunciation Estimation for a TTS Front-End
Abstract
In this paper, we propose a pointwise approach to the Japanese TTS front-end. In this approach, estimating the phoneme sequence of a sentence is decomposed into two tasks: word segmentation of the input sentence and phoneme estimation for each word. Both tasks are solved by pointwise classifiers that do not refer to neighboring classification results. In contrast to existing sequence-based methods, such as an n-gram model over sequences of word-phoneme pairs, this framework can use a variety of language resources, including sentences in which only a few words are annotated and unsegmented lists of compound words. In the experiments, we compared a joint trigram model with the combination of a pointwise word segmenter and a pointwise phoneme sequence estimator. The results showed that our framework successfully enables a TTS front-end to refer to a partially annotated corpus and/or a word sequence list annotated with phoneme sequences, and thereby to realize a far larger improvement in accuracy.
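As a rough illustration of the pointwise idea described in the abstract, the sketch below treats each gap between two characters as an independent binary decision (word boundary or not). It uses only the surrounding characters as features, never the neighboring decisions. Because every decision point is its own training example, gaps left unannotated (marked `None`) are simply skipped, which is how a partially annotated sentence can still contribute to training. The feature set, the perceptron learner, and all function names here are assumptions chosen for brevity, not the classifier or features used in the paper. Phoneme estimation for each word would follow the same pattern with a per-word classifier and is not shown.

```python
from collections import defaultdict


def boundary_features(chars, i, window=2):
    """Features for the gap between chars[i] and chars[i + 1].

    Only surface characters around the gap are used; no neighboring
    boundary decision is consulted (this is what makes it pointwise).
    """
    padded = ["<s>"] * window + list(chars) + ["</s>"] * window
    c = i + window  # position of chars[i] inside the padded list
    feats = [f"uni[{off}]={padded[c + off]}"
             for off in range(-window + 1, window + 1)]
    feats.append(f"bi[0]={padded[c]}{padded[c + 1]}")
    return feats


def training_points(chars, labels):
    """Turn one (possibly partially annotated) sentence into examples.

    labels[i] describes the gap after chars[i]:
    1 = boundary, 0 = no boundary, None = not annotated (skipped).
    """
    return [(boundary_features(chars, i), y)
            for i, y in enumerate(labels) if y is not None]


def train_perceptron(points, epochs=10):
    """Simple binary perceptron; a stand-in for any pointwise classifier."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, y in points:
            pred = 1 if sum(weights[f] for f in feats) > 0 else 0
            if pred != y:
                delta = 1.0 if y == 1 else -1.0
                for f in feats:
                    weights[f] += delta
    return weights


def segment(chars, weights):
    """Classify every gap independently and cut the string accordingly."""
    words, start = [], 0
    for i in range(len(chars) - 1):
        score = sum(weights.get(f, 0.0) for f in boundary_features(chars, i))
        if score > 0:
            words.append(chars[start:i + 1])
            start = i + 1
    words.append(chars[start:])
    return words


if __name__ == "__main__":
    # Toy sentence "abab"; only two of its three gaps are annotated.
    points = training_points("abab", [None, 1, 0])
    model = train_perceptron(points)
    print(segment("abab", model))
```

A sequence model would need a fully annotated sentence to score a whole path; here the unannotated first gap costs nothing, since the two labeled gaps are used as ordinary training examples.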
Similar resources
A general approach to TTS reading of mixed-language texts
The paper presents the Loquendo TTS approach to mixed-language speech synthesis, offering a range of options to face the various situations where texts may occur in different languages or embed foreign phrases. The most challenging target is to make a monolingual TTS voice read a foreign language text. The adopted Foreign Pronunciation Strategy discussed here allows mixing phonetic transcrip...
Improving TTS by higher agreement between predicted versus observed pronunciations
This paper looks at improving unit selection text-to-speech (TTS) quality by optimizing the agreement between frontend and speech database. We focused, in particular, on two classes of problems causing degradation in synthesis quality: 1) realization of /d/ and /t/ sounds and 2) confusions of unstressed vowels, especially with schwas. We investigated two approaches to tackling these problems. ...
Improving the accuracy of pronunciation lexicon using Naive Bayes classifier with character n-gram as feature: for language classified pronunciation lexicon generation
This paper looks at improving the accuracy of the pronunciation lexicon for Malayalam by improving the quality of front-end processing. A pronunciation lexicon is an inevitable component in speech research and speech applications like TTS and ASR. This paper details the work done to improve the accuracy of an automatic pronunciation lexicon generator (APLG) with a Naive Bayes classifier using character n...
Extracting word-pronunciation pairs from comparable set of text and speech
One of the problems in text-to-speech (TTS) systems and speech-to-text (STT) systems is pronunciation estimation of unknown words. In this paper, we propose a method for extracting unknown words and their pronunciations from similar sets of Japanese text data and speech data. Out-of-vocabulary words are extracted from text with a stochastic model and pronunciation hypotheses are generated. The...
The Polysemy Problem, an Important Issue in a Chinese to Taiwanese TTS System
This paper brings up an important issue, polysemy problems, in a Chinese to Taiwanese TTS (text-to-speech) system. Polysemy means there are words with more than one meaning or pronunciation, such as “我們” (we), “不” (no), “你” (you), “我” (I), and “要” (want). We first will show the importance of the polysemy problem in a Chinese to Taiwanese (C2T) TTS system. Then, we will propose some approaches t...
Publication date: 2011