A Neurally Motivated Technique for Voicing Detection and F0 Estimation for Speech
Abstract
Speech consists of alternating voiced and unvoiced sections. Voiced speech consists of multiple harmonics of some fundamental (F0); unvoiced speech consists of silence or filtered noise. Here, speech is wideband bandpass filtered into many bands (modelling the cochlea). Each filter output is rectified (modelling the organ of Corti hair cell action) and bandpass filtered by convolution with the difference between two causal Gaussian averaging functions. This detects and emphasises the amplitude modulation resulting from unresolved harmonics (and models the combined effect of the auditory nerve and certain cochlear nucleus cell types). This output is compressed, summed across the bands, then used to discover glottal pulses. The presence of glottal pulses signals voicing, and the time between glottal pulses is used to find F0. Results show good performance, particularly on male speakers. The system is reasonably resistant to background noise.
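The processing chain described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the brick-wall FFT bandpass stands in for a cochlear filterbank, and the band edges, the Gaussian widths (0.5 ms and 2 ms), the logarithmic compression, the 0.5 relative threshold, and the windowed peak picking are all hypothetical choices made for this example. The function names (`bandpass_fft`, `causal_gaussian`, `find_pulses`, `estimate_f0`) are likewise invented here.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Crude brick-wall bandpass via the FFT (assumed stand-in for one cochlear band)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

def causal_gaussian(fs, sigma, length):
    """One-sided (causal) Gaussian averaging kernel, normalised to unit sum."""
    t = np.arange(int(round(length * fs))) / fs
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def find_pulses(y, fs, max_f0=400.0, rel_thresh=0.5):
    """Indices that exceed a relative threshold and are the maximum of their
    neighbourhood of +/- one shortest allowed pitch period (assumed rule)."""
    w = int(fs / max_f0)
    thresh = rel_thresh * y.max()
    return [i for i in range(w, len(y) - w)
            if y[i] > thresh and y[i] == y[i - w:i + w + 1].max()]

def estimate_f0(x, fs, edges=(1000, 1500, 2000, 2500, 3000, 3500, 4000)):
    """Return an F0 estimate in Hz, or None if fewer than two pulses are found."""
    x = np.asarray(x, dtype=float)
    # Difference of two causal Gaussians: a bandpass filter on the envelope.
    dog = causal_gaussian(fs, 0.0005, 0.010) - causal_gaussian(fs, 0.002, 0.010)
    summed = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = bandpass_fft(x, fs, lo, hi)            # one band of the filterbank
        rect = np.maximum(band, 0.0)                  # half-wave rectification
        mod = np.convolve(rect, dog)[:len(x)]         # causal convolution
        summed += np.sign(mod) * np.log1p(np.abs(mod))  # compress, sum over bands
    summed[:len(dog)] = 0.0                           # discard start-up transient
    pulses = find_pulses(summed, fs)
    if len(pulses) < 2:
        return None                                   # treated as unvoiced
    return fs / float(np.median(np.diff(pulses)))     # median inter-pulse interval
```

As a usage example, a synthetic pulse train with a 10 ms period sampled at 16 kHz (`x = np.zeros(8000); x[::160] = 1.0`) passed to `estimate_f0(x, 16000)` should give an estimate near 100 Hz, while an all-zero input returns `None`. The voicing decision here is deliberately naive; the sketch makes no claim about noise robustness.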
Similar resources
Improvement to a NAM captured whisper-to-speech system
Exploiting a tissue-conductive sensor – a stethoscopic microphone – the system developed at NAIST which converts Non-Audible Murmur (NAM) to audible speech by GMM-based statistical mapping is a very promising technique. The quality of the converted speech is however still insufficient for computer-mediated communication, notably because of the poor estimation of F0 from unvoiced speech and beca...
Multi-band summary correlogram-based pitch detection for noisy speech
A multi-band summary correlogram (MBSC)-based pitch detection algorithm (PDA) is proposed. The PDA performs pitch estimation and voiced/unvoiced (V/UV) detection via novel signal processing schemes that are designed to enhance the MBSC’s peaks at the most likely pitch period. These peak-enhancement schemes include comb-filter channel-weighting to yield each individual subband’s summary correlog...
Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics
This paper focuses on the problem of pitch tracking in noisy conditions. A method using harmonic information in the residual signal is presented. The proposed criterion is used both for pitch estimation, as well as for determining the voicing segments of speech. In the experiments, the method is compared to six state-of-the-art pitch trackers on the Keele and CSTR databases. The proposed techni...
A Novel Voicing Cut-off Det for Low Bit-Rate Harmonic
Generally, phonetic classification for low rate speech coding is restricted to either a simple binary voiced/unvoiced classification of entire speech frames, or alternatively, a more complicated estimation of the voicing for a set of frequency bands. A good compromise between these two techniques is estimation of a single cut-off frequency that separates the spectrum into voiced (below) and unv...
Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors
A F0 and voicing status estimation algorithm for speech analysis/synthesis is proposed. Instead of directly modeling speech signals, the proposed algorithm models the behavior of feature extractors under additive noise using a bank of Gaussian mixture models, trained on artificial data generated from Monte-Carlo simulations. The conditional distributions of F0 predicted by the GMMs are combined...