Speaker Dependent Bengali Keyword Spotting in Unconstrained English Speech
A project report submitted during a summer internship under the supervision of Prof.
Abstract
Multi-lingual interfaces can be of great use in a number of applications. An important issue for such systems is first identifying the segments of an utterance that correspond to a specific language. Language boundary information is also vital before any further processing can be done. Language-specific keyword spotting can be used for this purpose, so such a word spotter can serve as an integral part of a typical multi-lingual system. In this project, a speaker-dependent Bengali keyword spotter for unconstrained English speech was developed. Two approaches were used, both with whole-word HMMs for the keywords; all Bengali keywords were trained as isolated words. The first approach used a whole-word filler model. The second approach used trained English phoneme models with an all-phone grammar network to model the filler part. The whole-word approach achieved an optimal performance of 94.22% hits at 1.17 false alarms per keyword per hour (FA/KW/H); its maximum hit rate was 97.92%, but at the cost of 7.03 FA/KW/H. The second approach attained an optimal performance of 95.83% hits at just 0.71 FA/KW/H. Its maximum hit rate was the same as that of the first approach (97.92%), but with a lower false alarm rate of 4.45 FA/KW/H. Ways to improve performance by reducing false alarms are also proposed. Finally, further development of the existing system into a speaker-independent Bengali keyword spotter is discussed.
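The two figures of merit quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual reading of the metrics: %hit is the share of true keyword occurrences that were detected, and FA/KW/H is the number of false alarms divided by the number of keywords in the vocabulary and by the hours of test speech. The function names and the numbers in the usage lines are hypothetical and are not taken from the report.

```python
def hit_rate_percent(hits: int, keyword_occurrences: int) -> float:
    """Percentage of true keyword occurrences that the spotter detected."""
    if keyword_occurrences <= 0:
        raise ValueError("keyword_occurrences must be positive")
    return 100.0 * hits / keyword_occurrences


def false_alarms_per_keyword_hour(
    false_alarms: int, vocabulary_size: int, speech_seconds: float
) -> float:
    """False alarms normalised by vocabulary size and hours of speech (FA/KW/H)."""
    if vocabulary_size <= 0 or speech_seconds <= 0:
        raise ValueError("vocabulary_size and speech_seconds must be positive")
    hours = speech_seconds / 3600.0
    return false_alarms / (vocabulary_size * hours)


# Hypothetical test run: 20 keyword occurrences, 19 detected,
# 10 keywords in the vocabulary, 2 hours of speech, 20 false alarms.
example_hit = hit_rate_percent(19, 20)
example_fa = false_alarms_per_keyword_hour(20, 10, 2 * 3600.0)
```

Normalising by both vocabulary size and duration is what makes false alarm rates comparable across systems with different keyword sets and test lengths, which is why the two approaches above can be compared at matching hit rates.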
Similar resources
An Application of Recurrent Neural Networks to Discriminative Keyword Spotting
Keyword spotting is a detection task consisting in discovering the presence of specific spoken words in unconstrained speech. The majority of keyword spotting systems are based on generative hidden Markov models and lack discriminative capabilities. However, discriminative keyword spotting systems are based on the estimation of a posteriori probabilities at the frame-level, hence they make use ...
Language independent and unsupervised acoustic models for speech recognition and keyword spotting
Developing high-performance speech processing systems for low-resource languages is very challenging. One approach to address the lack of resources is to make use of data from multiple languages. A popular direction in recent years is to train a multi-language bottleneck DNN. Language dependent and/or multi-language (all training languages) Tandem acoustic models are then trained. This work con...
A Vocabulary-independent Keyword Spotter for Spontaneous Chinese Speech
HarkMan keyword-spotter was designed so that it can be used in a real-world environment to automatically spot the given words of a vocabulary-independent (VIND) task in unconstrained Chinese telephone speech. In this spotter, the speaking manner and the number of keywords are not limited. This paper focuses on a novel technique that addresses acoustic modeling, keyword-spotting network, search ...
Spotting Subsequences matching a HMM using the Average Observation Probability Criteria with application to Keyword Spotting
This paper addresses the problem of detecting keywords in unconstrained speech. The proposed algorithms search for the speech segment maximizing the average observation probability along the most likely path in the hypothesized keyword model. As known, this approach (sometimes referred to as sliding model method) requires a relaxation of the begin/endpoints of the Viterbi matching, as well as a...
Speaker-dependent Speech Recognition Based on Phone-like Units Models | Application to Voice Dialing
This paper presents a speaker dependent speech recognition with application to voice dialing. This work has been developed under the constraints imposed by voice dialing applications, i.e., low memory requirements and limited training material. Two methods for producing speaker dependent word baseforms based on Phone Like Units (PLU) are presented and compared: (1) a classical vector quantizer...