Language Identification for Under-Resourced Languages in the Basque Context

Authors

Nora Barroso

Karmele López de Ipiña

Manuel Graña

Aitzol Ezeiza

Abstract

Automatic Speech Recognition (ASR) is a broad research area that attracts considerable effort from the research community. Interest in multilingual systems arises in the Basque Country because there are three official languages (Basque, Spanish, and French), and there is much linguistic interaction among them, even though Basque has very different roots from the other two languages. The development of multilingual Large Vocabulary Continuous Speech Recognition systems involves issues such as Language Identification, Acoustic Phonetic Decoding, Language Modeling, and the development of appropriate Language Resources. This paper describes the development of a Language Identification (LID) system oriented to robust multilingual speech recognition in the Basque context. The work presents hybrid strategies for LID based on the selection of system elements by several classifiers and by Discriminant Analysis, improved with robust regularized covariance matrix estimation methods oriented to under-resourced languages, and on stochastic methods for speech recognition tasks (Hidden Markov Models and n-grams).
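The abstract mentions Discriminant Analysis improved with regularized covariance matrix estimation for under-resourced languages. The sketch below illustrates one common form of that idea under stated assumptions: linear discriminant analysis whose pooled within-class covariance is shrunk toward a scaled identity matrix, cov_lam = (1 - lam) * cov + lam * (trace(cov) / d) * I, so that it stays invertible when a language has few training utterances relative to the feature dimension. It is not the authors' implementation. The shrinkage target, the class name `RegularizedLDA`, the parameter `lam`, and the use of fixed-length utterance-level feature vectors (for example, averaged MFCCs) are hypothetical choices made for illustration.

```python
import numpy as np


def shrink_covariance(cov: np.ndarray, lam: float) -> np.ndarray:
    """Blend a sample covariance with a scaled-identity target.

    lam = 0 keeps the sample estimate; lam = 1 keeps only the target,
    whose diagonal holds the average variance trace(cov) / d.
    (Assumed shrinkage target, chosen for illustration.)
    """
    d = cov.shape[0]
    target = (np.trace(cov) / d) * np.eye(d)
    return (1.0 - lam) * cov + lam * target


class RegularizedLDA:
    """Hypothetical LDA classifier with a shrunk pooled covariance."""

    def __init__(self, lam: float = 0.1):
        if not 0.0 <= lam <= 1.0:
            raise ValueError("lam must lie in [0, 1]")
        self.lam = lam

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        d = X.shape[1]
        # Per-language mean vectors and log prior probabilities.
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
        # Pooled within-class scatter, normalised by n - K degrees of freedom.
        scatter = np.zeros((d, d))
        for c, mu in zip(self.classes_, self.means_):
            diff = X[y == c] - mu
            scatter += diff.T @ diff
        pooled = scatter / max(len(X) - len(self.classes_), 1)
        # Shrinkage keeps the covariance invertible even when d exceeds n.
        self.cov_ = shrink_covariance(pooled, self.lam)
        self.precision_ = np.linalg.inv(self.cov_)
        return self

    def decision_function(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        # Linear score: x^T P mu_k - 0.5 mu_k^T P mu_k + log pi_k
        proj = self.means_ @ self.precision_  # row k is (P mu_k)^T
        const = -0.5 * np.sum(proj * self.means_, axis=1) + self.log_priors_
        return X @ proj.T + const

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]


if __name__ == "__main__":
    # Synthetic stand-in data: 3 languages, 5 utterances each, 20 features,
    # so the unregularised pooled covariance would be singular (rank <= 12).
    rng = np.random.default_rng(0)
    centers = 3.0 * rng.normal(size=(3, 20))
    X = np.vstack([ctr + rng.normal(size=(5, 20)) for ctr in centers])
    y = np.repeat(["eu", "es", "fr"], 5)
    model = RegularizedLDA(lam=0.2).fit(X, y)
    print("rank of regularised covariance:", np.linalg.matrix_rank(model.cov_))
    print("training accuracy:", np.mean(model.predict(X) == y))
```

With equal priors, scoring any class mean returns that class, since its score exceeds that of another class k by 0.5 (mu_j - mu_k)^T P (mu_j - mu_k), which is positive for a positive definite P. A larger `lam` trades variance for bias; in practice it would be tuned on held-out data.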

Similar resources

Semantic speech recognition in the Basque context Part I: cross-lingual approaches

This work, divided into Part I and Part II, describes the development of GorUP, a Semantic Speech Recognition System in the Basque context. Part I analyses cross-lingual approaches oriented to under-resourced languages, and Part II the development of the Language Identification system. During the development, data optimization methods and Soft Computing methodologies oriented to complex environment are...

Acoustic Phonetic Decoding Oriented to Multilingual Speech Recognition in the Basque Context

The development of Large Vocabulary Continuous Speech Recognition systems involves issues such as Acoustic Phonetic Decoding, Language Modelling, or the development of appropriate Language Resources. In the state of the art, new techniques for reusing the Language Resources of better-resourced related languages are attracting great interest, and there is also a growing interest in Multilingual systems. T...

Cross-Lingual Approaches: The Basque Case

Cross-lingual speech recognition could be relevant for Multilingual Automatic Speech Recognition (ASR) systems that work with both under-resourced and adequately resourced languages. In the Basque Country, the interest in Multilingual Automatic Speech Recognition systems comes from the fact that there are three official languages in use (Basque, Spanish, and French). Multilingual Basq...

Quizzes on Tap: Exporting a Test Generation System from One Less-Resourced Language to Another

It is difficult to develop and deploy Language Technology and applications for minority languages for many reasons. These include the lack of Natural Language Processing (NLP) resources for the language, a scarcity of NLP researchers who speak the language and the communication gap between teachers in the classroom and researchers working in universities and other centres of research. One appro...

MultiBooked: A Corpus of Basque and Catalan Hotel Reviews Annotated for Aspect-level Sentiment Classification

While sentiment analysis has become an established field in the NLP community, research into languages other than English has been hindered by the lack of resources. Although much research in multi-lingual and cross-lingual sentiment analysis has focused on unsupervised or semi-supervised approaches, these still require a large number of resources and do not reach the performance of supervised ...

Journal title:

Volume, issue:

Pages: -

Publication date: 2011