Incorporating linguistic knowledge into automatic dialect identification of Spanish

نویسندگان

  • Lisa Yanguas
  • Gerald C. O'Leary
  • Marc A. Zissman
چکیده

Automatic dialect identification, like automatic language identification , has often been approached through the use of phonetic frequencies and phonetic sequence modeling. While such statistical systems perform well on language identification problems, they are less adept at the more difficult problem of automatic dialect identification, particularly on short segments of speech. In this paper we explore issues involved in exploiting linguistic knowledge to aid in the automatic identification of dialects of conversational Spanish.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Dialect Identification: A Study of British English

This contribution deals with the automatic identification of the dialects of the British Isles. Several methods based on the linguistic study of dialect-specific vowel systems are proposed and compared using the Accents of the British Isles (ABI) corpus. The first method examines differences in diphthongization for the face lexical set. Discrimination scores in a two-dialect discrimination task...

متن کامل

From perceptual designs to linguistic typology and automatic language identification : overview and perspectives

This paper deals with the overview of the methods in perceptual language identification and the suggestion of a new approach based on a two-step methodology integrating to perception “genetic” considerations and resulting into the modeling of perceptually identified discriminative cues. The first study reported here concerns experimental designs for perceptual and automatic identification of th...

متن کامل

Gaussian Mixture Selection and Data Selection for Unsupervised Spanish Dialect Classification

Automatic dialect classification has gained interests in the field of speech research because it is important to characterize speaker traits and to estimate knowledge that could improve integrated speech technology (e.g., speech recognition, speaker recognition). This study addresses novel advances in unsupervised spontaneous Latin American Spanish dialect classification. The problem considers ...

متن کامل

Multilingual Code-switching Identification via LSTM Recurrent Neural Networks

This paper describes the HHU-UH-G system submitted to the EMNLP 2016 Second Workshop on Computational Approaches to Code Switching. Our system ranked first place for Arabic (MSA-Egyptian) with an F1-score of 0.83 and second place for Spanish-English with an F1-score of 0.90. The HHU-UHG system introduces a novel unified neural network architecture for language identification in code-switched tw...

متن کامل

Syllable-final /s/ lenition in the LDC's callhome Spanish corpus

This paper describes a data corpus which is being made available through the Linguistic Data Consortium (LDC) that codes lenition of syllable-final /s/ in Latin American Spanish in the LDC’s CallHome Spanish corpus. This lenition is a process whereby the /s/ may be aspirated (pronounced [h]) or deleted altogether. Since syllable-final /s/ is frequent in Spanish, lenition has a great effect on o...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1998