Language Identification for Under-Resourced Languages in the Basque Context
Authors
Abstract
Automatic Speech Recognition (ASR) is a broad research area that attracts considerable effort from the research community. Interest in Multilingual Systems arises in the Basque Country because there are three official languages (Basque, Spanish, and French) and there is much linguistic interaction among them, even though Basque has very different roots from the other two languages. The development of Multilingual Large Vocabulary Continuous Speech Recognition systems involves issues such as Language Identification, Acoustic Phonetic Decoding, Language Modeling, and the development of appropriate Language Resources. This paper describes the development of a Language Identification (LID) system oriented to robust Multilingual Speech Recognition in the Basque context. The work presents hybrid strategies for LID, based on the selection of system elements by several classifiers and Discriminant Analysis improved with robust regularized covariance matrix estimation methods oriented to under-resourced languages, and on stochastic methods for speech recognition tasks (Hidden Markov Models and n-grams).
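The idea of regularized covariance estimation for under-resourced languages can be illustrated with a generic sketch. This is not the authors' system: it is a simplified, pure-Python variant of Friedman-style regularized discriminant analysis, assuming each language is represented by fixed-length feature vectors. The 2-D toy vectors, the language labels `eu`, `es`, `fr`, and the `alpha`/`gamma` values are hypothetical. Each language's covariance is shrunk toward the pooled covariance (`alpha`) and then toward a scaled identity (`gamma`), so a language with very few samples, whose own covariance matrix is singular, can still be scored with a Gaussian discriminant.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean(rows: Sequence[Vector]) -> Vector:
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows: Sequence[Vector], mu: Vector) -> Matrix:
    """Maximum-likelihood covariance: average outer product of (x - mu)."""
    p, n = len(mu), len(rows)
    cov = [[0.0] * p for _ in range(p)]
    for x in rows:
        d = [xi - mi for xi, mi in zip(x, mu)]
        for i in range(p):
            for j in range(p):
                cov[i][j] += d[i] * d[j] / n
    return cov


def solve_logdet(a: Matrix, b: Vector) -> Tuple[Vector, float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting; also return log|det a|."""
    p = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    logdet = 0.0
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        logdet += math.log(abs(m[c][c]))  # product of pivots is +/- det(a)
        for r in range(c + 1, p):
            f = m[r][c] / m[c][c]
            for k in range(c, p + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * p
    for r in range(p - 1, -1, -1):  # back substitution
        x[r] = (m[r][p] - sum(m[r][k] * x[k] for k in range(r + 1, p))) / m[r][r]
    return x, logdet


class RegularizedDiscriminant:
    """Gaussian (quadratic) discriminant with two-parameter covariance shrinkage."""

    def __init__(self, alpha: float = 0.5, gamma: float = 0.1):
        self.alpha = alpha  # weight of the pooled covariance (0 = pure per-language QDA)
        self.gamma = gamma  # weight of the scaled identity (guards against singular matrices)

    def fit(self, data: Dict[str, List[Vector]]) -> "RegularizedDiscriminant":
        total = sum(len(rows) for rows in data.values())
        self.means = {lang: mean(rows) for lang, rows in data.items()}
        covs = {lang: covariance(rows, self.means[lang]) for lang, rows in data.items()}
        p = len(next(iter(self.means.values())))
        # Pooled within-class covariance: class covariances weighted by sample counts.
        pooled = [[sum(len(data[k]) * covs[k][i][j] for k in data) / total
                   for j in range(p)] for i in range(p)]
        self.priors = {lang: len(rows) / total for lang, rows in data.items()}
        self.covs = {}
        for lang, cov in covs.items():
            mixed = [[(1 - self.alpha) * cov[i][j] + self.alpha * pooled[i][j]
                      for j in range(p)] for i in range(p)]
            scale = sum(mixed[i][i] for i in range(p)) / p  # average variance
            self.covs[lang] = [[(1 - self.gamma) * mixed[i][j]
                                + (self.gamma * scale if i == j else 0.0)
                                for j in range(p)] for i in range(p)]
        return self

    def score(self, x: Vector, lang: str) -> float:
        """Log prior plus Gaussian log-likelihood (up to a shared constant)."""
        d = [xi - mi for xi, mi in zip(x, self.means[lang])]
        sol, logdet = solve_logdet(self.covs[lang], d)
        mahal = sum(di * si for di, si in zip(d, sol))
        return math.log(self.priors[lang]) - 0.5 * logdet - 0.5 * mahal

    def predict(self, x: Vector) -> str:
        return max(self.means, key=lambda lang: self.score(x, lang))


# Hypothetical toy data; "fr" has only two samples, so its own covariance is singular.
model = RegularizedDiscriminant(alpha=0.5, gamma=0.1).fit({
    "eu": [[0, 0], [1, 0], [0, 1], [1, 1]],
    "es": [[5, 5], [6, 5], [5, 6], [6, 6]],
    "fr": [[0, 5], [1, 6]],
})
print(model.predict([0.5, 5.5]))  # -> fr
```

With `alpha = 0` and `gamma = 0` the sketch reduces to plain quadratic discriminant analysis, and scoring `fr` would fail on its singular covariance; with `alpha = 1` every language shares the pooled covariance, as in linear discriminant analysis. In practice the two parameters would be chosen by cross-validation.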
Similar resources
Semantic speech recognition in the Basque context Part I: cross-lingual approaches
This work, divided into Part I and Part II, describes the development of GorUP, a Semantic Speech Recognition System in the Basque context. Part I analyses cross-lingual approaches oriented to under-resourced languages, and Part II the development of the Language Identification system. During the development, data optimization methods and Soft Computing methodologies oriented to complex environments are...
Acoustic Phonetic Decoding Oriented to Multilingual Speech Recognition in the Basque Context
The development of Large Vocabulary Continuous Speech Recognition systems involves issues such as Acoustic Phonetic Decoding, Language Modelling, and the development of appropriate Language Resources. In the state of the art, new techniques for reusing Language Resources of better-resourced related languages are of great interest, and there is also a growing interest in Multilingual systems. T...
Cross-Lingual Approaches: The Basque Case
Cross-lingual speech recognition could be relevant for Multilingual Automatic Speech Recognition (ASR) systems that work with both under-resourced and appropriately equipped languages. In the Basque Country, the interest in Multilingual Automatic Speech Recognition systems comes from the fact that there are three official languages in use (Basque, Spanish, and French). Multilingual Basq...
Quizzes on Tap: Exporting a Test Generation System from One Less-Resourced Language to Another
It is difficult to develop and deploy Language Technology and applications for minority languages for many reasons. These include the lack of Natural Language Processing (NLP) resources for the language, a scarcity of NLP researchers who speak the language and the communication gap between teachers in the classroom and researchers working in universities and other centres of research. One appro...
MultiBooked: A Corpus of Basque and Catalan Hotel Reviews Annotated for Aspect-level Sentiment Classification
While sentiment analysis has become an established field in the NLP community, research into languages other than English has been hindered by the lack of resources. Although much research in multi-lingual and cross-lingual sentiment analysis has focused on unsupervised or semi-supervised approaches, these still require a large number of resources and do not reach the performance of supervised ...