Speaker Identification From Youtube Obtained Data
Author
Abstract
An efficient and intuitive algorithm is presented for identifying speakers in a long recording (for example, a lengthy YouTube discussion or cocktail-party audio or video). The goal of automatic speaker identification is to determine how many distinct speakers are present and to build a model for each of them by extracting and characterizing the speaker-specific information contained in the speech signal. The task has many applications, especially in surveillance, airport immigration control, cyber security, and transcription of recordings with several similar-sounding sources, where it is difficult to attribute each part of the transcript to the right speaker. The speech parameterization most commonly used in speaker verification, cepstral analysis, is detailed together with K-means clustering. Gaussian mixture modeling, the speaker modeling technique used here, is then explained. Gaussian mixture models (GMMs), perhaps the most robust machine learning approach for this problem, are introduced to examine text-independent speaker identification. The use of GMMs for modeling speaker identity is motivated by the empirical observation that individual Gaussian components can represent a speaker's characteristic spectral shapes, and by the remarkable ability of GMMs to model arbitrary densities. We then describe expectation maximization (EM), an iterative algorithm that starts from an arbitrary initial estimate and repeats its updates until the parameter values converge. We aimed for 85 to 95% accuracy using vector quantization and Gaussian mixture model speaker modeling; across a number of experiments, we obtained identification rates of 79 to 82% with vector quantization and 85 to 92.6% with GMMs trained by EM parameter estimation, depending on the parameter settings.
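Cepstral analysis, named in the abstract as the usual speech parameterization, can be sketched as follows. This is a minimal illustration, not the paper's feature extractor: it computes the plain real cepstrum (inverse FFT of the log magnitude spectrum) of Hamming-windowed frames, whereas many systems use mel-frequency cepstral coefficients (MFCCs) instead. The frame length of 400 samples and hop of 160 samples (25 ms and 10 ms at an assumed 16 kHz sampling rate), the 13 retained coefficients, and the function names `frame_signal` and `real_cepstrum_features` are all hypothetical choices.

```python
import numpy as np


def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds samples [i * hop, i * hop + frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def real_cepstrum_features(x, frame_len=400, hop=160, n_coeffs=13):
    """Return the low-order real-cepstrum coefficients of each frame."""
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    log_mag = np.log(magnitude + 1e-10)  # small floor avoids log(0)
    # Inverse FFT of the log magnitude spectrum gives the real cepstrum.
    cepstrum = np.fft.irfft(log_mag, n=frame_len, axis=1)
    return cepstrum[:, :n_coeffs]
```

Each row of the returned array is one frame's feature vector; a speaker model (a VQ codebook or a GMM) is then trained on the rows belonging to that speaker.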
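The vector quantization baseline can be sketched as follows, assuming each speaker is represented by a K-means codebook and a test segment is assigned to the speaker whose codebook quantizes it with the lowest average distortion. The codebook size, iteration count, squared Euclidean distortion, and the helper names `train_codebook`, `avg_distortion`, and `identify_vq` are hypothetical; the paper's exact configuration is not stated in the abstract. The rows of `features` stand in for per-frame cepstral vectors.

```python
import numpy as np


def train_codebook(features, k=16, n_iter=20, seed=0):
    """Plain K-means: return a (k, d) codebook of centroids for one speaker."""
    rng = np.random.default_rng(seed)
    # Arbitrary initial codebook: k distinct training vectors.
    codebook = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every vector to every centroid.
        d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if its cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook


def avg_distortion(features, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()


def identify_vq(features, codebooks):
    """Return the speaker label whose codebook gives the lowest distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(features, codebooks[spk]))
```

Storing the codebooks in a dict keyed by speaker label keeps identification a single `min` over speakers.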
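A GMM speaker model trained by expectation maximization, with identification by maximum average log-likelihood, can be sketched as below. This assumes diagonal covariance matrices, randomly chosen training frames as the arbitrary initial means, a variance floor to keep components from collapsing, and a stopping rule on the change in average log-likelihood. The component count, tolerance, and the names `train_gmm`, `avg_log_likelihood`, and `identify_gmm` are hypothetical rather than taken from the paper.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """log N(x | mu_m, diag(var_m)) for every frame and component, shape (T, M)."""
    d = X.shape[1]
    diff2 = (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    log_det = np.log(variances).sum(axis=1)[None, :]
    return -0.5 * (d * np.log(2 * np.pi) + log_det + diff2.sum(axis=2))


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def train_gmm(X, n_components=8, n_iter=50, tol=1e-4, var_floor=1e-3, seed=0):
    """Fit a diagonal-covariance GMM to frames X with EM; return (w, mu, var)."""
    rng = np.random.default_rng(seed)
    T, _ = X.shape
    # Arbitrary initial estimate: random frames, global variance, equal weights.
    means = X[rng.choice(T, size=n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0), (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
        log_norm = logsumexp(log_p, axis=1)
        resp = np.exp(log_p - log_norm[:, None])
        # M-step: re-estimate weights, means and diagonal variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / T
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X**2) / nk[:, None] - means**2, var_floor)
        # Stop once the average log-likelihood stops improving.
        ll = log_norm.mean()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, variances


def avg_log_likelihood(X, gmm):
    """Average per-frame log-likelihood of X under one speaker's GMM."""
    weights, means, variances = gmm
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    return logsumexp(log_p, axis=1).mean()


def identify_gmm(X, gmms):
    """Return the speaker label whose GMM best explains the test frames."""
    return max(gmms, key=lambda spk: avg_log_likelihood(X, gmms[spk]))
```

Working in the log domain with a log-sum-exp keeps the E-step numerically stable when per-frame likelihoods are very small, which is typical for multi-dimensional cepstral features.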
Journal title:
- CoRR
Volume: abs/1411.2795
Publication date: 2014