Transmissions and transitions: a study of two common assumptions in multi-band ASR
Abstract
Is multi-band ASR inherently inferior to a full-band approach because phonetic information is lost due to the division of the frequency space into sub-bands? Do the phonetic transitions in sub-bands occur at different times? The first statement is a common objection of the critics of multi-band ASR, and the second, a common assumption by multi-band researchers. This paper is dedicated to finding answers to both these questions. To study the first point, we calculate phonetic feature transmission for sub-bands. Not only do we fail to substantiate the above objection, but we observe the contrary. We confirm the second hypothesis by analyzing the phonetic transition lags in each sub-band. These results reinforce our view that multi-band speech analysis provides useful information for ASR, particularly when band merging takes place at the end state for a phonetic or syllabic model, allowing sub-bands to be independently time-aligned within the model.
Similar resources
Asynchrony with trained transition probabilities improves performance in multi-band speech recognition
One of the central themes in multi-band automatic speech recognition (ASR) is to devise a strategy for recombining sub-band information. This in turn raises two questions: (1) at what phonetic unit should the recombination take place? (2) How asynchronously should the sub-bands be run? Theoretically asynchronous multi-band ASR should perform at least as well as synchronous multi-band ASR. Howev...
Multi-stream adaptive evidence combination for noise robust ASR
In this paper, we develop different mathematical models in the framework of the multi-stream paradigm for noise robust automatic speech recognition (ASR), and discuss their close relationship with human speech perception. Largely inspired by Fletcher's "product-of-errors" rule (PoE rule) in psychoacoustics, multi-band ASR aims for robustness to data mismatch through the exploitation of spectra...
Some Applications of a Priori Knowledge in Multi-stream HMM and HMM/ANN Based ASR
Multi-band ASR was largely inspired by the extremely high level of redundancy in the spectral signal representation which can be inferred from Fletcher’s product-of-errors rule for human speech perception. Indeed, the main aim of the multi-band approach is to exploit this redundancy in order to overcome the problem of data mismatch (while making no assumptions about noise type) by focusing recog...
3D Classification of Urban Features Based on Integration of Structural and Spectral Information from UAV Imagery
Three-dimensional classification of urban features is one of the important tools for urban management and the basis of many analyses in photogrammetry and remote sensing. It is therefore used in applications such as planning, urban management and disaster management. In this study, dense point clouds extracted by dense image matching are used for classification in urban areas. Appl...
Combining connectionist multi-band and full-band probability streams for speech recognition of natural numbers
Multi-band automatic speech recognition is a new and exploratory area of speech recognition which has been getting much attention in the research community. It has been shown that multi-band ASR reduces word error in noisy conditions, particularly in the case of narrow band noise. In this work we show that multi-band ASR could be used to improve the speech recognition accuracy of natural numbers...