Efficient scalable encoding for distributed speech recognition
Authors
Abstract
In this paper, the remote speech recognition problem is addressed. Speech features are extracted at a client and transmitted to a remote recognizer. This enables a low-complexity client, which lacks the computational and memory resources to host a complex speech recognizer, to use distributed resources to provide speech recognition services to the user. The novelties of the proposed work are: (i) the extracted features are compressed using scalable encoding techniques that provide a multi-resolution bitstream; and (ii) a complete scalable distributed speech recognition (DSR) system is presented, wherein the proposed scalable encoding technique is combined with a scalable recognition system. The scalable DSR system provides successive approximation in terms of recognition performance (i.e., as additional bits are transmitted, the recognition can be refined to improve performance) and achieves reductions in both bandwidth and complexity (latency). The proposed encoding schemes are well suited to implementation on lightweight mobile devices, where varying ambient conditions and limited computational capabilities severely constrain recognition performance. The scalable DSR system can adapt to varying network, system, and user constraints by operating at the “right” trade-off point between transmission rate, recognition performance, and complexity, thereby providing good quality of service (QoS) to the user. The system was tested in two case studies. In the first, the scalable encoder combined with a dynamic time warping-hidden Markov model (DTW-HMM) system reduced recognition complexity by 25% compared to a system using only an HMM, with no degradation in word error rate (WER). In the second, a distributed two-stage names recognition task at a bitrate of 4 kb/s incurred a 0.3% relative increase in WER compared to using uncompressed features. Reducing the bitrate to 2.5 kb/s resulted in a 4.2% relative increase in WER.
In contrast, using GSM compressed speech (bitrate 13 kb/s) in the two-stage names recognition task resulted in a 12.6% relative increase in WER.
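The abstract does not spell out the encoder itself. As a rough illustration of what a multi-resolution, successive-approximation bitstream means, the Python sketch below quantizes each feature value uniformly and sends the quantization indices bit plane by bit plane, most significant first. A receiver that decodes only the first k planes gets a coarser reconstruction whose worst-case error halves with each extra plane. The bit-plane layering, the fixed dynamic range `[lo, hi]`, the uniform bit allocation, the 100 frames/s rate, and the helper names `encode_layers`, `decode_layers`, and `bitrate` are all hypothetical assumptions for illustration, not the paper's coding scheme.

```python
from typing import List, Sequence


def encode_layers(x: Sequence[float], lo: float, hi: float, bits: int) -> List[List[int]]:
    """Uniformly quantize each value to `bits` bits, then split the indices into bit planes.

    Layer 0 carries the most significant bit of every index (coarsest resolution);
    each following layer refines the previous ones. Hypothetical illustration only.
    """
    levels = 1 << bits
    step = (hi - lo) / levels
    indices = []
    for v in x:
        v = min(max(v, lo), hi)              # clip to the assumed dynamic range
        q = int((v - lo) / step)             # uniform quantization index
        indices.append(min(q, levels - 1))   # v == hi would otherwise give q == levels
    return [[(q >> (bits - 1 - k)) & 1 for q in indices] for k in range(bits)]


def decode_layers(layers: Sequence[Sequence[int]], lo: float, hi: float) -> List[float]:
    """Reconstruct values from however many leading layers were received."""
    if not layers:
        raise ValueError("need at least one layer")
    cell = (hi - lo) / (1 << len(layers))    # interval width known after these layers
    out = []
    for i in range(len(layers[0])):
        p = 0
        for layer in layers:                 # rebuild the truncated index, MSB first
            p = (p << 1) | layer[i]
        out.append(lo + (p + 0.5) * cell)    # midpoint of the known interval
    return out


def bitrate(coeffs_per_frame: int, layers_sent: int, frames_per_sec: float) -> float:
    """Bits per second when every coefficient sends one bit per layer (no entropy coding)."""
    return coeffs_per_frame * layers_sent * frames_per_sec


if __name__ == "__main__":
    frame = [12.3, -4.1, 0.7, 2.2]           # hypothetical feature values for one frame
    lo, hi, bits = -16.0, 16.0, 6            # hypothetical range and full resolution
    layers = encode_layers(frame, lo, hi, bits)
    for k in range(1, bits + 1):
        approx = decode_layers(layers[:k], lo, hi)
        err = max(abs(a - b) for a, b in zip(frame, approx))
        # 100 frames/s is an assumed frame rate, not a setting taken from the paper
        rate = bitrate(len(frame), k, 100.0)
        print(f"layers={k} max_err={err:.3f} rate={rate:.0f} b/s")
```

Truncating the bitstream after fewer layers lowers the transmission rate at the cost of a coarser approximation, which mirrors the rate/accuracy trade-off the abstract describes. A practical coder could also allocate bits unequally across coefficients and entropy-code the layers; neither refinement is shown here.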
Similar resources
Efficient scalable encoding for distributed speech recognition
The problem of encoding speech features in the context of a distributed speech recognition system is addressed. Specifically, speech features are compressed using scalable encoding techniques to provide a multi-resolution bitstream. The use of this scalable encoding procedure is investigated in conjunction with a multi-pass distributed speech recognition (DSR) system. The multi-pass DSR system ...
Efficient scalable speech compression for scalable speech recognition
We propose a scalable recognition system for reducing recognition complexity. Scalable recognition can be combined with scalable compression in a distributed speech recognition (DSR) application to reduce both the computational load and the bandwidth requirement at the server. A low complexity preprocessor is used to eliminate the unlikely classes so that the complex recognizer can use the redu...
An efficient and scalable 2D DCT-based feature coding scheme for remote speech recognition
A 2D DCT-based approach to compressing acoustic features for remote speech recognition applications is presented. The coding scheme involves computing a 2D DCT on blocks of feature vectors followed by uniform scalar quantization, run-length and Huffman coding. Digit recognition experiments were conducted in which training was done with unquantized cepstral features from clean speech and testing...
Towards Efficient and Scalable Speech Compression Schemes for Robust Speech Recognition Applications
A posteriori SNR weighted energy based variable frame rate analysis for speech recognition
This paper presents a variable frame rate (VFR) analysis method that uses an a posteriori signal-to-noise ratio (SNR) weighted energy distance for frame selection. The novelty of the method consists in the use of energy distance (instead of cepstral distance) to make it computationally efficient and the use of SNR weighting to emphasize the reliable regions in speech signals. The VFR method is ...
Journal title:
- Speech Communication
Volume 48
Publication date: 2006