Speaker-Independent Vowel Recognition: Comparison of Backpropagation and Trained Classification Trees

Author

  • M. Rudnick
Abstract

A series of experiments compare performance of trained classification trees to multi-layer feedforward networks on speaker-independent vowel recognition using information in a single spectral slice. The vowel stimuli are exemplars of 12 monophthongal vowels of American English taken from all phonetic contexts in spoken utterances. The training set consists of 342 vowel tokens provided by 320 speakers, and the test set consists of 137 tokens provided by a different 100 speakers. The classification trees and neural classifiers are trained and tested on identical data. In addition, experiments are performed to determine the most effective way to present vowel information for classification. Classification performance is compared using (a) spectral coefficients from the DFT; (b) spectral coefficients from the pitch-synchronous DFT (PS-DFT); (c) features describing the six largest peaks in the spectrum; and (d) features derived from principal component analysis of the spectra, using an unsupervised neural network.

The results show that neural nets trained with backpropagation produce better results than classification trees in all comparable experimental conditions. Spectral [...] performance. Relative to these results, using measurements of spectral peaks produces inferior performance, while using principal component analysis to eliminate redundancy in the spectrum slightly improves performance while greatly reducing the size of the network. Possible reasons for the superior classification performance obtained by the backpropagation are [...] compared to other speaker-independent vowel classification studies.

Recent advances in classification with neural networks have been paralleled by advances in other techniques, and it is important to examine the most promising of these. In this paper, we compare the performance of backpropagation (BP) networks and trained classification trees (CT) on an important real-world problem: speaker-independent classification of vowel sounds in natural continuous speech.

The mathematical details of the CT technique are presented in a book by Breiman, Friedman, Olshen, and Stone [4]. Given training data in the form of a set of multidimensional vectors {x}, the top node of the binary classification tree is built by choosing a threshold for one of the variables of {x}. This threshold is chosen to maximize the class separability of the data which passes down the two descendent branches of the top node. The input space is then divided for the two descendent nodes and each of these nodes continue to split within a subset of the original input space. These splits (and tree growth) continue recursively until all training data are assigned to a leaf node with a unique and accurate class label. [...] the training data, a clever "pruning" criteria is then used, reducing the size of the tree and allowing for the best possible performance outside the training set. The final tree, if necessary, will fit arbitrary non-linear decision regions.

One of the attractive features of the recursive classification tree technique is the ability to accurately utilize very different types of input variables. For example, if the input patterns consist of both ordinal and categorical data, appropriate splits could still be made. Another possible advantage is the flexibility to have detailed fits in part of the input space while maintaining much smoother fits in ...
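The threshold-selection step described above can be illustrated with a small sketch. It is not the authors' implementation: the text only says the threshold maximizes class separability, so the use of Gini impurity as the separability measure, the midpoint candidate thresholds, and the function names `gini` and `best_split` are all assumptions made for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node).

    Assumption: Gini impurity stands in for "class separability";
    the source text does not name the criterion.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(vectors, labels):
    """Find the single-variable threshold with the lowest weighted impurity.

    Returns (variable_index, threshold, weighted_impurity), or None if
    no variable takes more than one distinct value.
    """
    n = len(vectors)
    best = None
    for j in range(len(vectors[0])):
        values = sorted(set(v[j] for v in vectors))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2.0
            left = [y for v, y in zip(vectors, labels) if v[j] <= t]
            right = [y for v, y in zip(vectors, labels) if v[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


# Toy data (hypothetical, not from the paper): variable 0 separates the classes.
toy_vectors = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
toy_labels = ["a", "a", "b", "b"]
split = best_split(toy_vectors, toy_labels)
```

Growing a full tree would apply `best_split` recursively to the two subsets each split produces, stopping when a node holds a single class; the pruning step mentioned in the text is not sketched here.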

Related resources

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

Speaker independent bimodal phonetic recognition experiments

A speaker independent bimodal phonetic classification experiment regarding the Italian plosive consonants is described. The phonetic classification scheme is based on a feed forward recurrent back-propagation neural network working on audio and visual information. The speech signal is processed by an auditory model producing spectral-like parameters, while the visual signal is processed by a sp...

Performance Comparisons Between Backpropagation Networks and Classification Trees on Three Real-World Applications

Etienne Barnard, Carnegie-Mellon University. Multi-layer perceptrons and trained classification trees are two very different techniques which have recently become popular. Given enough data and time, both methods are capable of performing arbitrary non-linear classification. We first consider the important differences between multi-layer perceptrons and classification trees and conclude that ther...

Publication date: 2004