Speaker Identification From Youtube Obtained Data

Author

Nitesh Kumar Chaudhary

Abstract

An efficient and intuitive algorithm is presented for identifying speakers in long recordings (such as long YouTube discussions or cocktail-party audio and video). The goal of automatic speaker identification is to determine the number of different speakers and to build a model for each speaker by extracting and characterizing the speaker-specific information contained in the speech signal. It has many diverse applications, especially in surveillance, airport immigration, cyber security, and the transcription of recordings with several similar-sounding sources, where it is difficult to assign transcriptions arbitrarily. The techniques most commonly used for speech parameterization in speaker verification, K-means and cepstral analysis, are detailed. Gaussian mixture modeling, the speaker modeling technique, is then explained. Gaussian mixture models (GMMs), perhaps the most robust machine learning algorithm, are introduced to evaluate text-independent speaker identification. The use of GMMs for monitoring and analyzing speaker identity is motivated by the observation that the Gaussian components depict the characteristic spectral shapes of a speaker, and by the remarkable ability of GMMs to model arbitrary densities. We then describe expectation maximization (EM), an iterative algorithm that starts from an arbitrary initial estimate and continues iterating until the values converge. We aimed for 85 to 95% accuracy using vector quantization (VQ) and GMM speaker models. Across a number of experiments, we obtained identification rates of 79 to 82% with VQ and 85 to 92.6% with GMMs whose parameters were estimated by EM, depending on the parameter settings.
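
The EM procedure and the GMM-based identification step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: it assumes diagonal covariances, uses synthetic 2-D vectors as stand-ins for cepstral features (no audio is processed), and every name in it (fit_gmm, identify, make_frames, the variance floor, the cluster centers) is hypothetical. Each speaker gets one GMM fitted by EM from an arbitrary initial estimate, and a test segment is assigned to the speaker whose model gives the highest average log-likelihood.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def frame_log_terms(x, model):
    """Per-component log(weight * density) for a single frame."""
    weights, means, variances = model
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm(data, k, iters=50, tol=1e-6, seed=0, var_floor=1e-3):
    """Fit a k-component diagonal GMM with EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    dim = len(data[0])
    means = [list(p) for p in rng.sample(data, k)]  # arbitrary initial estimate
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        resp, total = [], 0.0
        for x in data:
            logs = frame_log_terms(x, (weights, means, variances))
            norm = logsumexp(logs)
            total += norm
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guard against empty components
            weights[j] = nj / len(data)
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                        for d in range(dim)]
            variances[j] = [
                max(var_floor,
                    sum(r[j] * (x[d] - means[j][d]) ** 2
                        for r, x in zip(resp, data)) / nj)
                for d in range(dim)
            ]
        # Stop once the total log-likelihood has stopped changing.
        if abs(total - prev) < tol:
            break
        prev = total
    return weights, means, variances


def avg_log_likelihood(data, model):
    """Average per-frame log-likelihood of data under a GMM."""
    return sum(logsumexp(frame_log_terms(x, model)) for x in data) / len(data)


def identify(models, data):
    """Return the name of the speaker model that best explains the frames."""
    return max(models, key=lambda name: avg_log_likelihood(data, models[name]))


def make_frames(rng, centers, n, spread=0.5):
    """Synthetic 'feature frames' scattered around a speaker's cluster centers."""
    return [[rng.gauss(c, spread) for c in rng.choice(centers)] for _ in range(n)]


# Two made-up speakers, each with two clusters in a 2-D feature space.
rng = random.Random(42)
centers = {"A": [(0.0, 0.0), (3.0, 3.0)], "B": [(-4.0, 4.0), (6.0, -2.0)]}
models = {name: fit_gmm(make_frames(rng, c, 200), k=2)
          for name, c in centers.items()}
test_a = make_frames(rng, centers["A"], 50)
test_b = make_frames(rng, centers["B"], 50)
print(identify(models, test_a), identify(models, test_b))
```

In a real pipeline the synthetic frames would be replaced by cepstral feature vectors extracted from the audio, and the number of mixture components would be one of the parameters varied across experiments.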

Similar resources

VoxCeleb: A Large-Scale Speaker Identification Dataset

Most existing datasets for speaker identification contain samples obtained under quite constrained conditions, and are usually hand-annotated, hence limited in size. The goal of this paper is to generate a large scale text-independent speaker identification dataset collected ‘in the wild’. We make two contributions. First, we propose a fully automated pipeline based on computer vision technique...

Speaker Identification with VoxCeleb DataSet

In this project, we perform a text-independent speaker identification experiment with a newly released data set, VoxCeleb (2017)[1], which consists of celebrity interview audio clips downloaded from YouTube. It’s a challenging data set in the sense that there are often multiple vocal sources in the same clip. An MFCC feature vector based Deep Neural Network (DNN) is used as our baseline. It is c...

Semi-Supervised and Unsupervised Data Extraction Targeting Speakers: From Speaker Roles to Fame?

Speaker identification is based on classification methods and acoustic models. Acoustic models are learned from audio data related to the speakers to be modeled. However, recording and annotating such data is time-consuming and labor-intensive. In this paper we propose to use data available on video-sharing websites like YouTube and Dailymotion to learn speaker-specific acoustic models. This pro...

On the amount of speech data necessary for successful speaker identification

The paper deals with the dependence between the speaker identification performance and the amount of test data. Three speaker identification procedures based on hidden Markov models (HMMs) of phonemes are presented here. One, which is quite commonly used in the speaker recognition systems based on HMMs, uses the likelihood of the whole utterance for speaker identification. The other two that ar...

Incremental Speaker Adaptation with Minimum Error Discriminative Training for Speaker Identification

Minimum Classification Error (MCE) has been shown to be effective in improving the performance of a speaker identification system [1]. However, there are still problems to solve, such as the variability of the voice characteristics of a particular speaker through time. In this work, we analyze the degradation of a GMM-based text-independent speaker identification system when using test data recorded ...

Journal title:

CoRR

Volume: abs/1411.2795

Pages: -

Publication date: 2014
