Incorporating linguistic knowledge into automatic dialect identification of Spanish
Abstract
Automatic dialect identification, like automatic language identification, has often been approached through the use of phonetic frequencies and phonetic sequence modeling. While such statistical systems perform well on language identification problems, they are less adept at the more difficult problem of automatic dialect identification, particularly on short segments of speech. In this paper, we explore issues involved in exploiting linguistic knowledge to aid in the automatic identification of dialects of conversational Spanish.
Similar resources
Automatic Dialect Identification: A Study of British English
This contribution deals with the automatic identification of the dialects of the British Isles. Several methods based on the linguistic study of dialect-specific vowel systems are proposed and compared using the Accents of the British Isles (ABI) corpus. The first method examines differences in diphthongization for the face lexical set. Discrimination scores in a two-dialect discrimination task...
From perceptual designs to linguistic typology and automatic language identification: overview and perspectives
This paper deals with the overview of the methods in perceptual language identification and the suggestion of a new approach based on a two-step methodology integrating to perception “genetic” considerations and resulting into the modeling of perceptually identified discriminative cues. The first study reported here concerns experimental designs for perceptual and automatic identification of th...
Gaussian Mixture Selection and Data Selection for Unsupervised Spanish Dialect Classification
Automatic dialect classification has gained interests in the field of speech research because it is important to characterize speaker traits and to estimate knowledge that could improve integrated speech technology (e.g., speech recognition, speaker recognition). This study addresses novel advances in unsupervised spontaneous Latin American Spanish dialect classification. The problem considers ...
Multilingual Code-switching Identification via LSTM Recurrent Neural Networks
This paper describes the HHU-UH-G system submitted to the EMNLP 2016 Second Workshop on Computational Approaches to Code Switching. Our system ranked first place for Arabic (MSA-Egyptian) with an F1-score of 0.83 and second place for Spanish-English with an F1-score of 0.90. The HHU-UH-G system introduces a novel unified neural network architecture for language identification in code-switched tw...
Syllable-final /s/ lenition in the LDC's callhome Spanish corpus
This paper describes a data corpus which is being made available through the Linguistic Data Consortium (LDC) that codes lenition of syllable-final /s/ in Latin American Spanish in the LDC’s CallHome Spanish corpus. This lenition is a process whereby the /s/ may be aspirated (pronounced [h]) or deleted altogether. Since syllable-final /s/ is frequent in Spanish, lenition has a great effect on o...