Joint Optimization of Linear Predictors in Speech Coders
Authors
Abstract
Low bit rate speech coders often employ both formant and pitch predictors to remove near-sample and distant-sample redundancies in the speech signal. The coefficients of these predictors are usually determined for one prediction filter and then for the other (a sequential solution). This paper deals with formant and pitch predictors which are jointly optimized. The first configuration considered is a combination prediction error filter (in either a transversal or a lattice form) that performs the functions of both a formant and a pitch filter. Although a transversal combination filter outperforms the conventional F-P (formant followed by pitch) sequential solution, the combination filter exhibits a high incidence of nonminimum phase filters. For an F-P cascade connection, combined solutions and iterated sequential solutions are found. They yield higher prediction gains than the conventional F-P sequential solution. Furthermore, a practical implementation of the iterated sequential solution is developed such that both the formant and pitch filters are minimum phase. This implementation leads to decoded speech of higher perceptual quality than the conventional sequential solution.

In low bit rate predictive coding of speech, two nonrecursive prediction error filters are often used to process the input signal before coding. The prediction operations are motivated by the fact that the input speech exhibits a high degree of intersample correlation. These correlations occur between adjacent samples (near-sample redundancy) and, for voiced speech, between samples separated by the pitch period (far-sample redundancy). Near-sample redundancies can be attributed to the filtering action of the vocal tract. The resonances of the vocal tract correspond to the formant frequencies in speech. Far-sample redundancies can be attributed to the pitch excitation of voiced speech.
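The conventional F-P sequential solution mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the autocorrelation method with Levinson-Durbin recursion, the one-tap pitch predictor, the lag search range, and the synthetic test signal (a pulse train with a hypothetical period of 40 samples driving a two-pole resonator) are all assumptions chosen to keep the example short.

```python
import math
import random

def autocorrelation(x, max_lag):
    """r[k] = sum_n x(n) x(n-k) for k = 0..max_lag."""
    return [sum(x[i] * x[i - k] for i in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients c_1..c_p with xhat(n) = sum_k c_k x(n-k)."""
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for m in range(1, order + 1):
        k_m = (r[m] - sum(a[k] * r[m - k] for k in range(1, m))) / err
        prev = a[:]
        a[m] = k_m
        for k in range(1, m):
            a[k] = prev[k] - k_m * prev[m - k]
        err *= 1.0 - k_m * k_m
    return a[1:]

def formant_residual(x, c):
    """Step 1 output: x(n) minus its short-term prediction."""
    return [x[n] - sum(ck * x[n - k]
                       for k, ck in enumerate(c, start=1) if n - k >= 0)
            for n in range(len(x))]

def one_tap_pitch(e, min_lag, max_lag):
    """Step 2: pick the lag and coefficient that minimize the residual energy."""
    best_score, best_lag, best_beta = 0.0, min_lag, 0.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(e[n] * e[n - lag] for n in range(lag, len(e)))
        den = sum(e[n - lag] ** 2 for n in range(lag, len(e)))
        if den > 0.0 and num * num / den > best_score:
            best_score, best_lag, best_beta = num * num / den, lag, num / den
    return best_lag, best_beta

def pitch_residual(e, lag, beta):
    return [e[n] - (beta * e[n - lag] if n >= lag else 0.0)
            for n in range(len(e))]

def gain_db(before, after):
    """Prediction gain: input energy over output energy, in dB."""
    return 10.0 * math.log10(sum(v * v for v in before) / sum(v * v for v in after))

# Synthetic "voiced" signal (assumption): pulse train plus weak noise,
# shaped by a two-pole resonator standing in for the vocal tract.
random.seed(0)
x = []
for n in range(400):
    excitation = (1.0 if n % 40 == 0 else 0.0) + 0.01 * random.gauss(0.0, 1.0)
    x.append(excitation
             + 1.3 * (x[n - 1] if n >= 1 else 0.0)
             - 0.81 * (x[n - 2] if n >= 2 else 0.0))

c = levinson_durbin(autocorrelation(x, 2), 2)   # formant predictor from the input
e = formant_residual(x, c)                      # intermediate residual
lag, beta = one_tap_pitch(e, 20, 100)           # pitch predictor from that residual
r = pitch_residual(e, lag, beta)
print("formant coefficients:", c)
print("pitch lag:", lag, "pitch coefficient:", beta)
print("formant gain (dB):", gain_db(x, e), "pitch gain (dB):", gain_db(e, r))
```

The pitch predictor here never sees the original speech, only the formant residual, which is exactly the ordering that a joint or iterated solution is meant to improve on.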
Two filters, the formant and pitch predictors, are used to remove the near-sample and far-sample redundancies, respectively. The resulting prediction residual signal is of smaller amplitude and can be coded more efficiently than the original speech waveform. The predictor coefficients are adapted by updating them at fixed intervals to follow the time-varying correlation of the speech signal. An example of a system which uses the two predictor arrangement is an Adaptive Predictive Coder (APC).

Manuscript received September 4, 1987; revised September 8, 1988. This work was supported by the Natural Sciences and Engineering Research Council of Canada. P. Kabal is with the Department of Electrical Engineering, McGill University, Montreal, P.Q., Canada H3A 2A7, and INRS-Télécommunications, Université du Québec, Verdun, P.Q., Canada H3E 1H6. R. P. Ramachandran is with the Department of Electrical Engineering, McGill University, Montreal, P.Q., Canada H3A 2A7. IEEE Log Number 8926668.

In conventional APC, the predictors are placed in a feedback loop around the quantizer. The quantization occurs sample-by-sample. With this configuration, it can be shown that the quantization noise is not only the difference between the residual and its quantized value but also the difference between the original speech signal and its reconstructed value. The perceptual distortion of the output speech can be reduced by adding a noise shaping filter which redistributes the quantization noise spectrum [1], [2]. The noise shaping filter increases the noise energy in the formant regions but decreases the noise power at frequencies in which the energy level is low. Its system function is often chosen to be a bandwidth expanded version of the transfer function of the formant predictor. An alternate APC configuration places the predictors in an open-loop format and includes a noise shaping filter [3] as depicted in Fig. 1. The quantization is again accomplished sample-by-sample.
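As a rough illustration of what "bandwidth expanded" means for the formant predictor, the sketch below scales the k-th predictor coefficient by gamma to the power k, which replaces z by z/gamma and pulls every zero of the prediction error filter toward the origin by the factor gamma. The coefficient values and gamma = 0.8 are hypothetical, and how the expanded filter is wired into the coder is not shown here.

```python
import cmath

def bandwidth_expand(c, gamma):
    """Coefficients of the predictor evaluated at z/gamma: c_k -> c_k * gamma**k."""
    return [ck * gamma ** k for k, ck in enumerate(c, start=1)]

def second_order_zeros(c):
    """Zeros of 1 - c1 z^-1 - c2 z^-2, i.e. roots of z^2 - c1 z - c2."""
    c1, c2 = c
    disc = cmath.sqrt(c1 * c1 + 4.0 * c2)
    return (c1 + disc) / 2.0, (c1 - disc) / 2.0

c = [1.3, -0.81]                    # hypothetical formant predictor (zero radius 0.9)
c_shaped = bandwidth_expand(c, 0.8) # hypothetical expansion factor

for label, coeffs in (("original", c), ("expanded", c_shaped)):
    radii = [abs(z) for z in second_order_zeros(coeffs)]
    print(label, coeffs, "zero radii:", radii)
```

A smaller zero radius corresponds to broader, flatter resonances, so the shaped noise spectrum follows the formant envelope only loosely.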
Code-Excited Linear Prediction (CELP) [4] combines an open-loop arrangement for the predictors with vector quantization. Vector quantization is implemented by searching a given repertoire of waveforms for a candidate waveform that best represents the residual in a weighted mean-square sense. The weighting is employed to accomplish noise shaping. The synthesis phase is similar in APC and CELP. In both cases, an excitation signal (the coded residual or the selected codeword after scaling) is passed through a pitch synthesis and a formant synthesis filter to produce the decoded speech. The synthesis operation can be viewed in the frequency domain as first inserting the periodic structure due to pitch and then inserting the spectral envelope (formant structure).

A previous paper has considered the cascade connection of a formant and pitch predictor [5]. In that paper, the coefficients were determined using a conventional sequential approach. For a sequential solution, the coefficients of the first predictor are determined from the input speech, and the coefficients of the second predictor are determined from the intermediate residual formed by the filtering action of the first predictor. The objective of this paper is to consider combination configurations and joint solutions for the formant and pitch filter coefficients. We present new algorithms for the joint optimization which give improved performance over standard techniques. The minimum phase property of the filters is also considered. A minimum phase prediction error filter at the analysis phase guarantees a stable synthesis filter. This is a significant issue since, if the synthesis filters are unstable, the quantization noise is accentuated and causes undesirable perceptual distortion in the output speech [6]. The final configuration considered constrains the solutions to be minimum phase. In addition, that system uses a simplified method to choose an appropriate pitch lag. This practical approach retains the gains due to joint optimization and gives real improvements in speech quality in a coding environment.

0096-3518/89/0500-0642$01.00 © 1989 IEEE

Fig. 1. Block diagram of an APC coder with noise feedback. (a) Analysis phase. (b) Synthesis phase.

Fig. 2 shows a general analysis model for a linear predictor with arbitrary delays $M_k$. Windows are applied to both the input and error signals. The aim of the analysis is to minimize the squared error sum $\varepsilon^2 = \sum_{n=-\infty}^{\infty} e_w^2(n)$. This leads to a linear system of equations [5] which can be written in matrix form as $\Phi c = \alpha$, where the correlation entries are given by ...
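To make the least-squares step concrete, here is a minimal sketch of solving a system of this form for a predictor with arbitrary delays. The exact correlation definition and the windows are not reproduced in this excerpt, so the sketch assumes rectangular windows and the covariance-style entries phi(i, j) = sum over n of x(n-i) x(n-j); the delays [1, 2, 40] and the test coefficients are hypothetical.

```python
import random

def correlation(x, i, j, start):
    """phi(i, j) = sum_{n >= start} x(n-i) x(n-j) (rectangular windows assumed)."""
    return sum(x[n - i] * x[n - j] for n in range(start, len(x)))

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    sol = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = M[i][n] - sum(M[i][k] * sol[k] for k in range(i + 1, n))
        sol[i] = acc / M[i][i]
    return sol

def predictor_coefficients(x, delays):
    """Minimize sum_n (x(n) - sum_k c_k x(n - M_k))^2 by solving Phi c = alpha."""
    start = max(delays)              # first n for which every delayed sample exists
    Phi = [[correlation(x, mi, mj, start) for mj in delays] for mi in delays]
    alpha = [correlation(x, 0, mk, start) for mk in delays]
    return solve_linear(Phi, alpha)

# Hypothetical check: build a signal that obeys a known recursion exactly,
# then confirm that the solver recovers the coefficients.
random.seed(1)
delays = [1, 2, 40]                  # two near-sample taps plus one far-sample tap
true_c = [0.5, -0.2, 0.25]
x = [random.uniform(-1.0, 1.0) for _ in range(40)]
for n in range(40, 200):
    x.append(sum(ck * x[n - mk] for ck, mk in zip(true_c, delays)))

est_c = predictor_coefficients(x, delays)
print("estimated coefficients:", est_c)
```

Choosing the delays as 1, 2, ..., p gives a formant predictor, a few delays around the pitch lag give a pitch predictor, and mixing both sets in one system corresponds to a single combination filter.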
Related articles
Aalborg Universitet Joint Estimation of Short-Term and Long-Term Predictors in Speech Coders
In low bit-rate coders, the near-sample and far-sample redundancies of the speech signal are usually removed by a cascade of a short-term and a long-term linear predictor. These two predictors are usually found in a sequential and therefore suboptimal approach. In this paper we propose an analysis model that jointly finds the two predictors by adding a regularization term in the minimization pro...
Joint Estimation of Short-Term and Long-Term Predictors in Speech Coders Giacobello,
In low bit-rate coders, the near-sample and far-sample redundancies of the speech signal are usually removed by a cascade of a short-term and a long-term linear predictor. These two predictors are usually found in a sequential and therefore suboptimal approach. In this paper we propose an analysis model that jointly finds the two predictors by adding a regularization term in the minimization pro...
Joint optimization of short-term and long-term predictors in CELP speech coders
The objective of this work is to investigate whether joint optimization of short-term and long-term predictors manifests significant advantages over the sequential optimization in speech coding. We propose a new joint optimization method based on Wiener filtering. The proposed analysis model resolves the pitch-bias problem of classical LPC analysis by considering the contribution of the long-te...
Backward adaptive RBF-based hybrid predictors for CELP-type coders at medium bit-rates
Nonlinear prediction is a natural way to increase the quality of speech coders. Several approaches have been recently proposed in this direction ([1,2,3,4] are some examples) and most of them use neural networks as predictors. Nevertheless, the computational cost due to the network training is very high, since it usually involves a gradient descent-based nonlinear optimization process. In this p...
On Improving the Performance of an ACELP Speech Coder
In this paper we evaluate the performance of a variety of techniques to improve the parameter analysis in CELP speech coders. These methods include using extended cost horizon in the fixed codebook search process, as well as joint optimization and delayed decision coding of the adaptive and fixed codebook parameters. Based on our simulations for the IS-641 speech coder, substantial improvements...
Effect of MMSE-STSA Algorithm in CELP and MELP Speech Coders
The role of speech coding is to reduce the bit rate while maintaining good speech quality. In order to improve the perceptual quality of degraded speech, different speech enhancement methods can be used. So, it is worthwhile to do research in joint systems (speech enhancement and low bit rate speech coders). The work reported in this paper shows the improvement in the perceptual quality of speech ...