Myanmar Number Normalization for Text-to-Speech

Authors

  • Aye Mya Hlaing
  • Win Pa Pa
  • Ye Kyaw Thu
Abstract

Text normalization is an essential module of a Text-to-Speech (TTS) system, because TTS systems must work on real text. This paper describes Myanmar number normalization designed for a Myanmar Text-to-Speech system. Semiotic classes for the Myanmar language are identified through a study of a Myanmar text corpus, and Myanmar number normalization based on Weighted Finite State Transducers (WFST) is implemented. Number suffixes and prefixes are also applied for token classification, and finally post-processing is performed for tokens that cannot be classified. This approach achieves an average tag accuracy of 93.5% in the classification phase and an average Word Error Rate (WER) of 0.95% for overall performance, which is 5.65% lower than that of a rule-based system. The results show that this approach can be used in a Myanmar TTS system and, to our knowledge, this is the first published work on a Myanmar number normalization system designed for a Myanmar TTS system.

Keywords: Myanmar Number Normalization; Text Normalization; Weighted Finite State Transducer; Myanmar Text-to-Speech; Myanmar
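The abstract reports overall performance as Word Error Rate but does not say how it was computed. The sketch below assumes the standard definition: the word-level edit distance (substitutions, deletions, insertions) between the reference normalization and the system output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not details from the paper; Myanmar text in particular would need its own word or syllable segmentation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length.

    Assumption: both strings are already segmented into words separated
    by whitespace. This is not the paper's evaluation script.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("one hundred twenty three", "one hundred thirty three"))
```

Under this definition, a WER of 0.95% would correspond to roughly one word error per hundred reference words, averaged over the test data.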

Similar resources

Multilingual number transcription for text-to-speech conversion

This paper describes the text normalization module of a text to speech fully-trainable conversion system and its application to number transcription. The main target is to generate a language independent text normalization module, based on data instead of on expert rules. This paper proposes a general architecture based on statistical machine translation techniques. This proposal is composed of...

Identification of Adopted Pali Words in Myanmar Text

Myanmar language has been significantly influenced by Pali language due to the practice of Buddhism and study of Buddhist literature in Myanmar. As a result, Pali words have been widely adopted and used in Myanmar language. This study presents an algorithm for identifying Myanmar-adopted Pali words in Myanmar text. The system employs a combination of rule-based syllable segmentation and a dicti...

An Expanded Taxonomy of Semiotic Classes for Text Normalization

We describe an expanded taxonomy of semiotic classes for text normalization, building upon the work in [1]. We add a large number of categories of non-standard words (NSWs) that we believe a robust real-world text normalization system will have to be able to process. Our new categories are based upon empirical findings encountered while building text normalization systems across many languages,...

HMM based Myanmar text to speech system

This paper presents a complete statistical speech synthesizer for Myanmar which includes a syllable segmenter, text normalizer, grapheme-to-phoneme convertor, and an HMM-based speech synthesis engine. We believe this is the first such system for the Myanmar language. We performed a thorough human evaluation of the synthesizer relative to human and re-synthesized baselines. Our results show that...

Text Normalization and Unit Selection for a Memory Based Non Uniform Unit Selection TTS in Malayalam

A text-to-speech synthesis system, for any language, converts text in that language to the corresponding speech. The major challenge in a TTS system is to generate artificial speech that sounds natural and intelligible. This is essential for visually impaired people to properly understand the generated speech. This paper discusses text normalization and unit...


Publication date: 2017