Log-normal matrix factorization with application to speech-music separation

نویسندگان

  • Takuya Yoshioka
  • Daichi Sakaue
چکیده

This paper proposes a novel spectrogram factorization method, called log-normal matrix factorization (LogNMF). Conventional nonnegative matrix factorization (NMF) methods cannot efficiently capture random properties of actual spectra because these methods assume that speech and noise spectrograms can be precisely represented by combining a small number of temporally invariant spectral patterns, called basis vectors. This limitation results in unsatisfactory performance when NMF is used for speech enhancement. The proposed method overcomes this limitation by allowing each basis vector to change randomly at each time frame with a log-normal distribution. The use of the log-normal distribution is also desirable in that the degree of divergence between an observed spectrogram and a spectrogram model is measured based on squared errors of log power spectra, which are subjectively meaningful. Experimental results show that LogNMF is able to separate speech signals from background music signals more precisely than NMF.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semi-Supervised Single-Channel Speech-Music Separation for Automatic Speech Recognition

In this study, we propose a semi-supervised speech-music separation method which uses the speech, music and speech-music segments in a given segmented audio signal to separate speech and music signals from each other in the mixed speech-music segments. In this strategy, we assume, the background music of the mixed signal is partially composed of the repetition of the music segment in the audio....

متن کامل

Catalog-based single-channel speech-music separation

We propose a new catalog-based speech-music separation method for background music removal. Assuming that we know a catalog of the background music, we develop a generative model for the superposed speech and music spectrograms. We represent the speech spectrogram by a Non-negative Matrix Factorization (NMF) model and the music spectrogram by a conditional Poisson Mixture Model (PMM). By choosi...

متن کامل

Block Nonnegative Matrix Factorization for Single Channel Source Separation

Nonnegative Matrix Factorization (NMF) [1, 2] has been widely used in audio research, e.g. automatic music transcription [3], musical source separation [4], and speech enhancement [5]. The key strategy for applying NMF to audio-related tasks is to find a lower rank representation of the Short Time Fourier Transformed (STFT) input signal and use the basis vectors as dictionaries. For example, in...

متن کامل

Bayesian factorization and selection for speech and music separation

This paper proposes a new Bayesian nonnegative matrix factorization (NMF) for speech and music separation. We introduce the Poisson likelihood for NMF approximation and the exponential prior distributions for the factorized basis matrix and weight matrix. A variational Bayesian (VB) EM algorithm is developed to implement an efficient solution to variational parameters and model parameters for B...

متن کامل

Adaptive Group Sparsity for Non-Negative Matrix Factorization with Application to Unsupervised Source Separation

Non-negative matrix factorization (NMF) is an appealing technique for many audio applications, such as automatic music transcription, source separation and speech enhancement. Sparsity constraints are commonly used on the NMF model to discover a small number of dominant patterns. Recently, group sparsity has been proposed for NMF based methods, in which basis vectors belonging to a same group a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012