Universal Background Models for Real-time Speaker Change Detection
Abstract
This paper addresses the problem of real-time speaker change detection in TV news broadcasts, where no prior knowledge about the speakers is assumed. To remove unreliable frames and background frames from the speech stream, we propose a new approach to feature categorization based on the Gaussian Mixture Model Universal Background Model (GMM-UBM). The feature vectors are categorized into three sets: reliable speech, doubtful speech, and unreliable speech. A novel distance measure built on this categorization is then presented for real-time speaker change detection. Extensive experiments demonstrate its good performance, and the intrinsic difficulties of real-time speaker change detection are also discussed.
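The abstract does not state the rule used to split feature vectors into the three sets. As a minimal sketch of one plausible reading, the Python below scores each frame by its log-likelihood under a diagonal-covariance GMM standing in for the UBM, then applies two thresholds. The function names, the thresholds `low` and `high`, the assumption that a higher UBM likelihood means more reliable speech, and the toy model values are all hypothetical and are not taken from the paper.

```python
import math

def diag_gmm_loglik(frame, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))

def categorize_frames(frames, ubm, low, high):
    """Split frame indices into three sets by UBM log-likelihood.

    Hypothetical rule: score >= high is "reliable", low <= score < high is
    "doubtful", and anything below low is "unreliable".
    """
    sets = {"reliable": [], "doubtful": [], "unreliable": []}
    for i, frame in enumerate(frames):
        score = diag_gmm_loglik(frame, *ubm)
        if score >= high:
            sets["reliable"].append(i)
        elif score >= low:
            sets["doubtful"].append(i)
        else:
            sets["unreliable"].append(i)
    return sets

# Toy one-component, 2-D "UBM": (weights, means, variances). Illustrative only.
toy_ubm = ([1.0], [[0.0, 0.0]], [[1.0, 1.0]])
toy_frames = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0]]
result = categorize_frames(toy_frames, toy_ubm, low=-6.0, high=-3.0)
```

In a real system the UBM would be trained on a large pool of speech and the thresholds tuned on held-out data; the two-threshold rule here only illustrates the idea of a three-way split.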
Similar resources
Real-Time Unsupervised Speaker Change Detection
Speaker change point information is very useful for speaker tracking and other applications. In this paper, we present an effective algorithm for automatic speaker change detection based on LSP correlation analysis. Moreover, a general case is considered, in which both the speakers and the number of speakers are assumed unknown. The algorithm has low complexity and can be processed in real-time...
UBM-based incremental speaker adaptation
This paper presents a novel incremental speaker adaptation (ISA) algorithm based on a universal background model (UBM), designed to save storage and support real-time processing. The algorithm can be seen as an extension of traditional speaker adaptation. It consists of two steps, adaptation and combination. It not only considers the speaker’s characteristics in limited training data, but also prohibits ov...
Fast speaker change detection for broadcast news transcription and indexing
In this paper, we describe a new speaker change detection algorithm designed for fast transcription and audio indexing of spoken broadcast news. We have designed a two-stage algorithm that begins with a gender-independent phone-class recognition pass. We collapse the phoneme inventory to only 4 broad classes and include 4 different models for non-speech, resulting in a small fast decoder that r...
UBM Based Speaker Segmentation and Clustering for 2-Speaker Detection
In this paper, a speaker segmentation method based on the log-likelihood ratio score (LLRS) over a universal background model (UBM) and a speaker clustering method based on the difference of log-likelihood scores between two speaker models are proposed. During the segmentation process, the LLRS between two adjacent speech segments over the UBM is used as a distance measure, while during the clustering process...
Speaker dependent emotion recognition using prosodic supervectors
This work presents a novel approach for detection of emotions embedded in the speech signal. The proposed approach works at the prosodic level, and models the statistical distribution of the prosodic features with Gaussian Mixture Models (GMM) mean-adapted from a Universal Background Model (UBM). This allows the use of GMM-mean supervectors, which are classified by a Support Vector Machine (SVM...