Dialect-based speaker classification of Chinese using acoustic features invariant with extra-linguistic factors
Abstract
Classifying speakers by Chinese dialect with modern speech technologies is challenging, not only because the dialect landscape of Chinese is highly complex, but also because the acoustic features of an utterance convey dialectal information together with extra-linguistic information such as age, gender, and speaker identity. In this paper, we propose a new speaker classification technique that uses the structural representation of pronunciation, which was originally proposed to remove extra-linguistic information from speech. After dialectal utterances of a selected set of Chinese characters are collected, each speaker is modeled by his or her pronunciation structure. All the dialectal structures are then classified by bottom-up clustering. We also test the proposed method for robustness to speaker variability: utterances of simulated very tall and very short speakers are classified to compare the proposed method with the conventional one. All the experimental results show linguistically reasonable classifications that are highly independent of age and gender.
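The pipeline described in the abstract (model each speaker as a pronunciation structure, then cluster the structures bottom-up) can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything specific here is an assumption: each character's utterance is summarized as a 1-D Gaussian (mean, std), units are compared with the Bhattacharyya distance, structures are compared with a Euclidean distance, clusters are merged by average linkage, and the hypothetical `warp` helper stands in for a change of speaker (for example, a taller or shorter simulated speaker) as a simple affine transform.

```python
import math
from itertools import combinations


def bhattacharyya_1d(g1, g2):
    """Bhattacharyya distance between two 1-D Gaussians given as (mean, std)."""
    (m1, s1), (m2, s2) = g1, g2
    var_sum = s1 * s1 + s2 * s2
    return 0.25 * (m1 - m2) ** 2 / var_sum + 0.5 * math.log(var_sum / (2.0 * s1 * s2))


def structure(units):
    """Flat vector of pairwise distances between all units of one speaker."""
    return [bhattacharyya_1d(a, b) for a, b in combinations(units, 2)]


def structure_distance(s1, s2):
    """Euclidean distance between two structure vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))


def bottom_up_clusters(structures, n_clusters):
    """Average-linkage agglomerative clustering over named structures."""
    clusters = [{name} for name in structures]

    def linkage(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(structure_distance(structures[a], structures[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


def warp(units, a, b):
    """Hypothetical speaker change: affine map x -> a*x + b applied to every unit."""
    return [(a * m + b, abs(a) * s) for m, s in units]


# Toy data: two "dialects", each with three character units (mean, std).
dialect_a = [(1.0, 0.5), (2.0, 0.4), (4.0, 0.6)]
dialect_b = [(1.0, 0.5), (3.5, 0.4), (3.8, 0.6)]
speakers = {
    "A_ref": dialect_a,
    "A_warped": warp(dialect_a, 1.3, -0.7),
    "B_ref": dialect_b,
    "B_warped": warp(dialect_b, 0.8, 0.5),
}
structures = {name: structure(units) for name, units in speakers.items()}
clusters = bottom_up_clusters(structures, 2)
print(sorted(sorted(c) for c in clusters))
```

In this toy setting the Bhattacharyya distance between 1-D Gaussians is unchanged by the affine warp, so a warped speaker has the same structure as the reference speaker of the same dialect and the two are merged first. Real acoustic features are multidimensional and the distortions are more complex, so this only illustrates the idea of a structure that ignores speaker-dependent transformations.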
Related papers
Structural analysis of dialects, sub-dialects and sub-sub-dialects of Chinese
In China, there are hundreds of dialects. In traditional dialectology, they are classified into seven major dialect regions, and most of these also have many sub-dialects and sub-sub-dialects. Because they differ in various linguistic aspects, people from different dialect regions often cannot communicate orally. But for the sub-dialects of one dialect region, although they are sometimes stil...
Pronunciation Proficiency Estimation Based on Multilayer Regression Analysis Using Speaker-independent Structural Features
Teachers can assess the pronunciations of students independently of extra-linguistic features such as age and gender observed in the students’ utterances. This capacity is, however, difficult to realize on machines because linguistic differences and extra-linguistic differences change acoustic features commonly. Therefore, the performance of automatic pronunciation assessment is inevitably affe...
Word-level invariant representations from acoustic waveforms
Extracting discriminant, transformation-invariant features from raw audio signals remains a serious challenge for speech recognition. The issue of speaker variability is central to this problem, as changes in accent, dialect, gender, and age alter the sound waveform of speech units at multiple levels (phonemes, words, or phrases). Approaches for dealing with this variability have typically focu...
LSTM Neural Network-Based Speaker Segmentation Using Acoustic and Language Modelling
This paper presents a new speaker change detection system based on Long Short-Term Memory (LSTM) neural networks using acoustic data and linguistic content. Language modelling is combined with two different Joint Factor Analysis (JFA) acoustic approaches: i-vectors and speaker factors. Both of them are compared with a baseline algorithm that uses cosine distance to detect speaker turn changes. ...
Phonetic and speaker variations in automatic emotion classification
The speech signal contains information that characterises the speaker and the phonetic content, together with the emotion being expressed. This paper looks at the effect of this speaker- and phoneme-specific information on speech-based automatic emotion classification. The performances of a classification system using established acoustic and prosodic features for different phonemes are compared,...