Distance Measures for Wavelet Representation of Speech Segments
Abstract
The dyadic scheme of wavelet signal decomposition leads to a specific division of frequency bands. This division is comparable to the mel-frequency division and may be used for effective parameterization of the speech signal in recognition systems, speech coding, or other speech-based applications. This paper discusses the efficiency of different spectral distance measures applied to wavelet-parameterized speech. The presented methods are intended for use in an isolated phoneme recognition task.

INTRODUCTION

The human ear is sensitive to the frequency attributes of audio signals, including speech. The frequency resolution of human perception is not the same for all frequencies. In low frequency bands, slightly different tones are much more distinguishable than they would be in higher frequency bands. As a result, the frequency characteristic of the human ear is not linear but rather logarithmic. Experiments have shown that the human auditory system perceives frequencies not linearly but according to a mel-frequency scale [9]

$$ f_{\mathrm{mel}} = 1127 \, \log_e\!\left(1 + \frac{f_{\mathrm{Hz}}}{700}\right). \qquad (1) $$

Many practical works have confirmed the usefulness of this scale in speech recognition systems, speech coding, and compression. The most frequently used type of speech parameterization that supports the mel-frequency scale is MFCC (mel-frequency cepstral coefficients) [6]. Our proposal provides a logarithmic division of the frequency scale, typical for the dyadic discrete wavelet decomposition scheme, and may be an interesting alternative approach. It is necessary to adjust known computational tools, or develop new ones, for use with this type of parameterization in speech recognition applications. Insofar as the wavelet parameterization scheme has properties similar to linear prediction coefficients or mel-frequency cepstral coefficients, the known computational tools may be applied. In this paper, different spectral distortion measures are applied to the wavelet parameterization scheme in order to determine their efficiency.
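The comparison between the mel scale of equation (1) and the dyadic band division can be checked numerically. The sketch below is only an illustration and is not taken from the paper: the function names, the 16 kHz sampling rate, the six decomposition levels, and the assumption of ideal half-band filters are all hypothetical choices.

```python
import math


def hz_to_mel(f_hz):
    """Equation (1): map a frequency in Hz to the mel scale."""
    return 1127.0 * math.log(1.0 + f_hz / 700.0)


def dyadic_bands(fs, levels):
    """Ideal (low, high) band edges in Hz of a dyadic wavelet decomposition.

    Assumes ideal half-band filters: the detail band at level j covers
    [fs / 2**(j + 1), fs / 2**j], and the final approximation band
    covers [0, fs / 2**(levels + 1)].
    """
    bands = [(fs / 2 ** (j + 1), fs / 2 ** j) for j in range(1, levels + 1)]
    bands.append((0.0, fs / 2 ** (levels + 1)))
    return bands


if __name__ == "__main__":
    # Hypothetical setup: 16 kHz speech, 6 decomposition levels.
    for lo, hi in dyadic_bands(fs=16000, levels=6):
        print(f"{lo:7.1f}-{hi:7.1f} Hz -> "
              f"{hz_to_mel(lo):7.1f}-{hz_to_mel(hi):7.1f} mel")
```

Each dyadic band is half as wide in Hz as the one above it. Printing the band edges in mel shows how the widths compare on the perceptual scale.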
WAVELET SPEECH SIGNAL PARAMETERISATION

The continuous-time wavelet transform is defined as
Similar resources
A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain
Quality of speech signal significantly reduces in the presence of environmental noise signals and leads to the imperfect performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, the single channel speech enhancement of the corrupted signals by the additive noise signals is considered. A dictionary-based algorithm is proposed to train the speech...
Speech Encryption Using Wavelet Packets
The aim of speech scrambling algorithms is to transform clear speech into an unintelligible signal so that it is difficult to decrypt it in the absence of the key. Most of the existing speech scrambling algorithms tend to retain considerable residual intelligibility in the scrambled speech and are easy to break. Typically, a speech scrambling algorithm involves permutation of speech segments in...
Intelligent Voice Recognition System Based on Acoustic and Speaking Fundamental Frequency Characteristics
Speech recognition is a fascinating application of Digital Signal Processing and has many real-world applications. In this paper, a speech recognition system is developed for isolated spoken words using Discrete Wavelet Transforms (DWT) and Artificial Neural Networks (ANN). Speech signals are one-dimensional and are random in nature. This paper investigates Automatic Speech Recognition of gende...
On the two-wavelet localization operators on homogeneous spaces with relatively invariant measures
In the present paper, we introduce the two-wavelet localization operator for the square integrable representation of a homogeneous space with respect to a relatively invariant measure. We show that it is a bounded linear operator. We investigate some properties of the two-wavelet localization operator and show that it is a compact operator and is contained in a...
Adult Voice Recognition System using Text Variable Phoneme Model and Coarse Speaking Fundamental Frequency Characteristics
Speech recognition is a fascinating application of Digital Signal Processing and has many real-world applications. In this paper, a speech recognition system is developed for isolated spoken words using Discrete Wavelet Transforms (DWT) and Artificial Neural Networks (ANN). Spe...