Evaluation and Analysis of Hybrid Intelligent Pattern Recognition Techniques for Speaker Identification
Abstract
Rapid technological progress in recent years has led to a tremendous rise in the use of biometric authentication systems. The objective of this research is to investigate the problem of identifying a speaker from their voice regardless of the content (i.e. text-independent), and to design efficient methods of combining face and voice to produce a robust authentication system. A novel approach to speaker identification is developed using wavelet analysis and multiple neural networks, including the Probabilistic Neural Network (PNN), General Regression Neural Network (GRNN) and Radial Basis Function Neural Network (RBF NN), combined with the AND voting scheme. This approach is tested on the GRID and VidTIMIT corpora, and comprehensive test results have been validated against state-of-the-art approaches. The system was found to be competitive: it improved the recognition rate by 15% compared to the classical Mel-frequency Cepstral Coefficients (MFCC), and reduced the recognition time by 40% compared to the Back Propagation Neural Network (BPNN), Gaussian Mixture Models (GMM) and Principal Component Analysis (PCA).
Another novel approach, based on vowel formant analysis, is implemented using Linear Discriminant Analysis (LDA). Vowel formant based speaker identification is well suited to real-time implementation and requires only a few bytes of information to be stored for each speaker, making it both storage and time efficient. Tested on GRID and VidTIMIT, the proposed scheme was found to be 85.05% accurate when Linear Predictive Coding (LPC) is used to extract the vowel formants, which is much higher than the accuracy of BPNN and GMM. Since the proposed scheme requires no training other than creating a small database of vowel formants, it is also faster. Furthermore, an increasing number of speakers makes it difficult for BPNN and GMM to sustain their accuracy, but the proposed score-based methodology stays almost linear.
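The AND voting scheme used to combine the PNN, GRNN and RBF NN outputs can be illustrated with a minimal sketch. The abstract does not define the rule, so this assumes the common reading: an identity is reported only when every classifier returns the same label. The function name `and_vote` and the speaker labels are hypothetical and are not taken from the original work.

```python
from typing import Hashable, Optional, Sequence


def and_vote(predictions: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the shared label if all classifiers agree, otherwise None.

    Assumption: "AND voting" means unanimous agreement is required
    before an identity is reported.
    """
    if not predictions:
        return None
    first = predictions[0]
    return first if all(p == first for p in predictions[1:]) else None


# Hypothetical labels emitted by three classifiers (e.g. PNN, GRNN, RBF NN).
unanimous = and_vote(["spk07", "spk07", "spk07"])  # all three agree
split = and_vote(["spk07", "spk12", "spk07"])      # one classifier disagrees
print(unanimous, split)
```

Under this reading, a disagreement yields no decision rather than a majority label, which trades some coverage for fewer false identifications.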
Finally, a novel audio-visual fusion based identification system is implemented using GMM and MFCC for speaker identification and PCA for face recognition. The results of speaker identification and face recognition are fused at different levels, namely the feature, score and decision levels. Both the score-level and decision-level (with OR voting) fusions were shown to outperform the feature-level fusion in terms of accuracy and error resilience. This result is consistent with the distinct nature of the two modalities, whose individual characteristics are lost when they are combined at the feature level. The GRID and VidTIMIT test results validate that the proposed scheme is one of the best candidates for the fusion of face and voice due to its low computational time and high recognition accuracy.
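The difference between score-level fusion and decision-level fusion with OR voting can be sketched as follows. This is an illustration under stated assumptions, not the method of the original work: the min-max normalisation, the weighted sum, the equal default weight, the function names and all score values are hypothetical choices made for the example.

```python
from typing import Dict


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one modality's scores to [0, 1] so modalities are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / span for k, v in scores.items()}


def score_level_fusion(voice: Dict[str, float], face: Dict[str, float],
                       voice_weight: float = 0.5) -> str:
    """Weighted sum of normalised scores; returns the best-scoring identity.

    Assumes both dictionaries cover the same set of enrolled identities.
    """
    v, f = min_max(voice), min_max(face)
    fused = {k: voice_weight * v[k] + (1.0 - voice_weight) * f[k] for k in v}
    return max(fused, key=fused.get)


def or_decision_fusion(voice_accepts: bool, face_accepts: bool) -> bool:
    """Decision-level OR voting: accept if at least one modality accepts."""
    return voice_accepts or face_accepts


# Made-up scores: log-likelihoods for voice, similarities for face.
voice_scores = {"alice": -41.0, "bob": -38.5, "carol": -45.0}
face_scores = {"alice": 0.91, "bob": 0.40, "carol": 0.35}

print(score_level_fusion(voice_scores, face_scores))
print(or_decision_fusion(voice_accepts=False, face_accepts=True))
```

In this toy data the voice scores alone favour "bob", but the fused score favours "alice" because the face evidence is much stronger. Each modality is scored in its own space and only the outputs are combined, which is the property the abstract credits for outperforming feature-level fusion.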
Similar resources
An Intelligent Control Strategy in a Parallel Hybrid Vehicle
This paper presents a design procedure for an adaptive power management control strategy based on a driving cycle recognition algorithm. The design goal of the control strategy is to minimize fuel consumption and engine-out NOx, HC and CO emissions on a set of diversified driving schedules. Seven facility-specific drive cycles are considered to represent different driving scenarios. For each fa...
Comprehensive Analysis of Signal Processing Techniques Used For Speaker Identification
Speaker recognition is an active biometric task that derives from the more general speech processing area. Like most other speech-related recognition activities (language recognition, speech recognition), speaker recognition is a multidisciplinary problem. Both an understanding of pattern recognition techniques and domain knowledge (acoustics/phonetics) are necessary. The motivation ...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Advances on HMM-based text-dependent speaker verification
This paper presents recent developments in text-dependent speaker verification technology in the EU project PICASSO, which have improved the SV performance significantly. In the project we adopt an HMM approach for pattern matching. In the paper we describe four different techniques: adaptive variance flooring, multiple use of enrolment sample, generalised competitive measurement for score normalisati...