Corpus-Based Unit Selection TTS for Hungarian
Abstract
This paper gives an overview of the design and development of an experimental restricted-domain, corpus-based unit selection text-to-speech (TTS) system for Hungarian. The experimental system generates weather forecasts in Hungarian. A total of 5260 sentences were recorded, creating a speech corpus containing 11 hours of continuous speech. A Hungarian speech recognizer was applied to label speech sound boundaries. Word boundaries were also marked automatically. The unit selection follows a top-down hierarchical scheme using words and speech sounds as units. A simple prosody model is used, based on the relative position of words within a prosodic phrase. The quality of the system was compared to two earlier Hungarian TTS systems. A subjective listening test was performed by 221 listeners. The experimental system scored 3.92 on a five-point mean opinion score (MOS) scale. The earlier unit concatenation TTS system scored 2.63, the formant synthesizer scored 1.24, and natural...
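The top-down hierarchical selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption: the function name `select_units`, the set-based inventories, and the small pronunciation lexicon are hypothetical. The sketch only shows the fallback order (whole word first, speech sounds otherwise) and omits any ranking of candidate units and the prosody model.

```python
from typing import Dict, List, Set, Tuple

Unit = Tuple[str, str]  # (unit level, unit label), e.g. ("word", "holnap")


def select_units(
    words: List[str],
    word_inventory: Set[str],
    sound_inventory: Set[str],
    lexicon: Dict[str, List[str]],
) -> List[Unit]:
    """Pick units top-down: prefer a whole recorded word, else its speech sounds.

    All arguments are hypothetical stand-ins for a real corpus index.
    """
    selected: List[Unit] = []
    for word in words:
        if word in word_inventory:
            # Highest level: the word itself exists in the corpus.
            selected.append(("word", word))
            continue
        # Fallback level: build the word from individual speech sounds.
        for sound in lexicon[word]:
            if sound not in sound_inventory:
                raise KeyError(f"no unit available for sound {sound!r}")
            selected.append(("sound", sound))
    return selected


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    units = select_units(
        words=["holnap", "szél"],
        word_inventory={"holnap"},
        sound_inventory={"s", "e:", "l"},
        lexicon={"szél": ["s", "e:", "l"]},
    )
    print(units)
```

In this toy run the first word is covered by a word-level unit, while the second is assembled from three sound-level units. A fuller version would presumably choose among several recorded instances of each unit rather than treating the inventories as plain sets.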
Similar resources
Efficient and Scalable Met Generation in Corpus-b
This paper proposes performance indices and search criteria for text script generation in the design of corpus-based TTS systems. Based on these criteria, a new search method is presented to solve the text selection problem more systematically and efficiently. Experimental results show that, at the same hit rate of unit types, the new method can reduce the text script size by up to 40% in some...
Creating German unit selection voices for the MARY TTS platform from the BITS corpora
The present paper reports on the creation of German unit selection voices from corpora which had been recorded and annotated previously in the BITS project. We describe the unit selection mechanism of our MARY TTS platform, as well as the tools for creating a synthesis voice from a speech corpus, and their application to the creation of German unit selection voices from the BITS corpora. Becaus...
On the Suitability of Vocalic Sandwiches in a Corpus-Based TTS Engine
Unit selection speech synthesis systems generally rely on target and concatenation costs for selecting the best unit sequence. The role of the concatenation cost is to ensure that joining two voice segments will not cause any acoustic artefact to appear. For this task, acoustic distances (MFCC, F0) are typically used, but in many cases this is not enough to prevent concatenation artefacts. Amon...
Hansori 2001 - corpus-based implementation of the Korean hansori text-to-speech synthesizer
Improving the speech quality and naturalness of Text-to-Speech (TTS) synthesizers is a continuous concern of researchers worldwide. The present paper gives a brief introduction to several previous Hansori TTS systems and introduces our approach to experimenting with, adopting, and implementing corpus-based techniques for the system. We are focusing on corpus selection, on the optimal unit sear...
High-Quality and Flexible Speech Synthesis with Segment Selection and Voice Conversion
Text-to-Speech (TTS) is a useful technology that converts any text into a speech signal. It can be utilized for various purposes, e.g. car navigation, announcements in railway stations, response services in telecommunications, and e-mail reading. Corpus-based TTS makes it possible to dramatically improve the naturalness of synthetic speech compared with the early TTS. However, no general-purpos...