AUTOMATIC SPEAKER IDENTIFICATION USING REUSABLE AND RETRAINABLE BINARY-PAIR PARTITIONED NEURAL NETWORKS
Abstract
Ashutosh Mishra, Old Dominion University, May 2003. Director: Dr. Stephen A. Zahorian.
This thesis extends previous work on speaker identification using Binary-Pair Partitioned (BPP) neural networks. In the previous work, a separate network was used for each pair of speakers in the speaker population. Although the basic BPP approach performed well and had a simple underlying algorithm, it had the obvious disadvantage of requiring an extremely large number of networks for speaker identification with large speaker populations. The number of networks to be trained grows in proportion to the square of the number of speakers under consideration, which leads to correspondingly large training and evaluation times.
In the present work, the concepts of clustered speakers and reusable binary networks are investigated. Systematic methods are explored for using a network originally trained to separate only two specific speakers to also separate the speakers of other speaker pairs. For example, it seems quite likely that a network trained to separate a particular female speaker from a particular male speaker would also reliably separate many other male speakers from many other female speakers. The focal point of the research is to develop a method for reducing the training time and the number of networks required to achieve a desired performance level. A new method of reducing the number of networks required is developed, along with a second method that improves accuracy to compensate for the loss expected from the network reduction (compared to the BPP approach). The two methods investigated are reusable binary-pair partitioned neural networks (RBPP) and retrained and reusable binary-pair partitioned neural networks (RRBPP).
Both methods explored and described in this thesis work very well for clean (studio-quality) speech but do not provide the desired level of performance with bandwidth-limited (telephone-quality) speech. This thesis provides a detailed description of both methods and of the experimental results. All experimental results reported are based on either the Texas Instruments/Massachusetts Institute of Technology (TIMIT) or the Nynex TIMIT (NTIMIT) database, using 8 sentences (approximately 24 seconds) for training and up to 2 sentences (approximately 6 seconds) for testing. The best results obtained with TIMIT using 102 speakers and 2 test sentences (approximately 6 seconds of test data) are 99.02%, 99.02%, and 99.02% of speakers correctly identified for BPP, RBPP, and RRBPP, respectively. The corresponding recognition rates for NTIMIT, again using 102 speakers, are 84.3%, 75.5%, and 77.5%. Using all 630 speakers, the accuracy rates for TIMIT are 99%, 97%, and 96%, and the accuracy rates for NTIMIT are approximately 72%, 48%, and 41%.
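To make the scaling problem of the basic BPP approach concrete, the following minimal Python sketch counts the pairwise networks needed for a given speaker population and shows one simple way the pairwise decisions could be combined. It is an illustration only, not the thesis's implementation: the function names are hypothetical, `pair_winner` stands in for a trained two-speaker network, and majority voting is an assumed combination rule, since the abstract does not say how the pairwise outputs are merged.

```python
from itertools import combinations


def num_pairwise_networks(num_speakers: int) -> int:
    """Number of networks when one network is trained per speaker pair."""
    return num_speakers * (num_speakers - 1) // 2


def identify_by_pairwise_voting(speakers, pair_winner):
    """Pick the speaker who wins the most pairwise comparisons.

    pair_winner(a, b) is a hypothetical stand-in for a trained binary
    network; it must return either a or b. Majority voting is an
    assumption made for this sketch.
    """
    votes = {s: 0 for s in speakers}
    for a, b in combinations(speakers, 2):
        votes[pair_winner(a, b)] += 1
    return max(votes, key=votes.get)


if __name__ == "__main__":
    for n in (102, 630):
        print(n, "speakers ->", num_pairwise_networks(n), "pairwise networks")
```

For the 102-speaker and 630-speaker populations mentioned above, one network per pair gives 5151 and 198135 networks respectively, which is the growth that the reusable-network methods aim to reduce.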