Phonetic, idiolectal and acoustic speaker recognition
Abstract
This paper describes a text-independent speaker recognition system that achieves an equal error rate of less than 1% by combining phonetic, idiolectal, and acoustic features. The phonetic system is a novel language-independent speaker recognition system based on differences among speakers in the dynamic realization of phonetic features (i.e., pronunciation), rather than on spectral differences in voice quality. It exploits phonetic information from six languages to perform text-independent speaker recognition. The idiolectal system models speaker idiosyncrasies with word n-gram frequency counts computed from the output of an automatic speech recognition system. The acoustic system is a Gaussian Mixture Model-Universal Background Model (GMM-UBM) that exploits the spectral differences in voice quality. All experiments were performed on the NIST 2001 Speaker Recognition Evaluation Extended Data Task.
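To make the idiolectal idea concrete, the following is a minimal sketch of scoring a test transcript with word bigram counts in a target/background log-likelihood ratio. It is an illustration only, not the system described in the abstract: the function names `bigram_counts` and `llr_score`, the additive smoothing constant `alpha`, and the per-bigram averaging are all assumptions made for this example.

```python
from collections import Counter
import math


def bigram_counts(words):
    """Count adjacent word pairs in a list of tokens."""
    return Counter(zip(words, words[1:]))


def llr_score(test_words, speaker_counts, background_counts, alpha=1.0):
    """Average log-likelihood ratio of the test bigrams (speaker vs. background).

    Probabilities are smoothed relative frequencies; `alpha` (additive
    smoothing) is an assumption of this sketch, not taken from the paper.
    """
    test = bigram_counts(test_words)
    # Shared bigram vocabulary, so both models are smoothed over the same events.
    vocab_size = len(set(speaker_counts) | set(background_counts) | set(test))
    s_total = sum(speaker_counts.values())
    b_total = sum(background_counts.values())

    total, n = 0.0, 0
    for bigram, count in test.items():
        p_speaker = (speaker_counts[bigram] + alpha) / (s_total + alpha * vocab_size)
        p_background = (background_counts[bigram] + alpha) / (b_total + alpha * vocab_size)
        total += count * math.log(p_speaker / p_background)
        n += count
    return total / n if n else 0.0


# Toy data standing in for ASR transcripts (hypothetical).
speaker = bigram_counts("you know i mean you know i mean".split())
background = bigram_counts("the cat sat on the mat".split())

match_score = llr_score("you know i mean".split(), speaker, background)
mismatch_score = llr_score("the cat sat".split(), speaker, background)
```

A positive score means the test bigrams are more typical of the target speaker than of the background model. In a combined system, a score like this would be fused with the phonetic and acoustic scores before thresholding.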
Similar resources
Support vector machine fusion of idiolectal and acoustic speaker information in Spanish conversational speech
This paper proposes a Support Vector Machine (SVM) based combining scheme that incorporates idiolectal and acoustic characteristics for speaker recognition. Two statistical model paradigms, namely GMMs for acoustic modeling and bigrams for language modeling, provide multilevel speaker information that yields better classification performance when fused with an SVM. This combini...
Generative Acoustic-Phonemic-Speaker Model Based on Three-Way Restricted Boltzmann Machine
In this paper, we discuss a way of modeling speech signals with a three-way restricted Boltzmann machine (3WRBM) that automatically separates phonetic-related information and speaker-related information in an observed signal. The proposed model is an energy-based probabilistic model that includes three-way potentials of three variables: acoustic features, latent phonetic features, and speak...
On the Relationship between Phone Phonetic Speaker Recog
Speaker recognition techniques have traditionally relied on purely acoustic features and models. During the last few years, however, the field of speaker recognition has started to show interest in the use of higher-level features. In particular, phonetic decodings modeled with statistical language models (n-grams) have already shown their effectiveness in several research works. However, the rel...
Speaker recognition based on idiolectal differences between speakers
“Familiar” speaker information is explored using non-acoustic features in NIST’s new “extended data” speaker detection task.[1] Word unigrams and bigrams, used in a traditional target/background likelihood ratio framework, are shown to give surprisingly good performance. Performance continues to improve with additional training and/or test data. Bigram performance is also found to be a function...
Text-based Speaker Identification on Multiparty Dialogues Using Multi-document Convolutional Neural Networks
We propose a convolutional neural network model for text-based speaker identification on multiparty dialogues extracted from the TV show, Friends. While most previous works on this task rely heavily on acoustic features, our approach attempts to identify speakers in dialogues using their speech patterns as captured by transcriptions to the TV show. It has been shown that different individual sp...