Discovering phonetic inventories with crosslingual automatic speech recognition
Abstract
The high cost of data acquisition makes Automatic Speech Recognition (ASR) model training problematic for most existing languages, including languages that do not even have a written script, or for which the phone inventories remain unknown. Past works explored multilingual training, transfer learning, as well as zero-shot learning in order to build ASR systems for these low-resource languages. While it has been shown that pooling resources from multiple languages is helpful, we have not yet seen a successful application of an ASR model to a language unseen during training. A crucial step in the adaptation of ASR from seen to unseen languages is the creation of the phone inventory of the unseen language. The ultimate goal of our work is to build the phone inventory of a language unseen during training in an unsupervised way, without any knowledge about the language. In this paper, we (1) investigate the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language; (2) provide an analysis of which phones transfer well across languages and which do not, in order to understand the limitations of, and areas for further improvement in, automatic phone inventory creation; and (3) present different methods to build a phone inventory of an unseen language in an unsupervised way. To that end, we conducted mono-, multi-, and crosslingual experiments on a set of 13 phonetically diverse languages, along with several in-depth analyses. We found a number of universal phone tokens (IPA symbols) that are well-recognized cross-linguistically. Through a detailed analysis of the results, we conclude that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
Similar resources
Multilingual and Crosslingual Speech Recognition
This paper describes the design of a multilingual speech recognizer using an LVCSR dictation database which has been collected under the project GlobalPhone. This project at the University of Karlsruhe investigates LVCSR systems in 15 languages of the world, namely Arabic, Chinese, Croatian, English, French, German, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, Tamil, and Tu...
Analysis of phonetic transcriptions for Danish automatic speech recognition
Automatic speech recognition (ASR) relies on three resources: audio, orthographic transcriptions and a pronunciation dictionary. The dictionary or lexicon maps orthographic words to sequences of phones or phonemes that represent the pronunciation of the corresponding word. The quality of a speech recognition system depends heavily on the dictionary and the transcriptions therein. This paper pre...
Lexical and phonetic modeling for Arabic automatic speech recognition
In this paper, we describe the use of either words or morphemes as lexical modeling units and the use of either graphemes or phonemes as phonetic modeling units for Arabic automatic speech recognition (ASR). We designed four Arabic ASR systems: two word-based systems and two morpheme-based systems. Experimental results using these four systems show that they have comparable state-of-the-art perf...
Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
Speech recognition with automatic punctuation
We present a method of speech recognition with automatic punctuation based on a combination of acoustic and lexical evidence. In the recognizer vocabulary, punctuation marks are treated as word entries. By assigning the acoustic baseforms of silence, breath, and other non-speech sounds to punctuation marks, and using a properly processed N-gram language model, unpronounced punctuation marks of ...
Journal
Journal title: Computer Speech & Language
Year: 2022
ISSN: 1095-8363, 0885-2308
DOI: https://doi.org/10.1016/j.csl.2022.101358