Latent Topic Model Based on Gaussian-LDA for Audio Retrieval
Authors
Abstract
In this paper, we introduce a new topic model, Gaussian-LDA, that is better suited to modeling continuous data. Topic models based on latent Dirichlet allocation (LDA) are widely used for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a multinomial distribution over the vocabulary. To apply the original LDA to continuous data, the data must first be discretized by vector quantization, which usually results in information loss. In the proposed model, the emission probability is continuous: each topic is a Gaussian rather than a multinomial distribution. The new topic model achieves higher performance than standard LDA in audio retrieval experiments.
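The abstract does not give the model's exact parameterization or its inference procedure, so the following is only a minimal sketch of the generative process it describes: per-document topic proportions drawn as in LDA, with each observation emitted from its topic's Gaussian instead of a multinomial over a vocabulary. The function name, the symmetric Dirichlet prior, and the toy means and covariances are all assumptions made for illustration, not values from the paper.

```python
import numpy as np


def sample_gaussian_lda_document(n_frames, alpha, means, covs, rng):
    """Draw one document of continuous feature vectors (hypothetical helper).

    alpha : (K,) Dirichlet prior over topic proportions (assumed symmetric here)
    means : (K, D) one Gaussian mean per topic (illustrative values)
    covs  : (K, D, D) one Gaussian covariance per topic (illustrative values)
    """
    n_topics = means.shape[0]
    # Per-document topic proportions, as in standard LDA.
    theta = rng.dirichlet(alpha)
    # One latent topic assignment per observation (e.g. per audio frame).
    topics = rng.choice(n_topics, size=n_frames, p=theta)
    # Continuous emission: each observation comes from its topic's Gaussian,
    # so no vector quantization into discrete "words" is needed.
    frames = np.stack(
        [rng.multivariate_normal(means[k], covs[k]) for k in topics]
    )
    return theta, topics, frames


rng = np.random.default_rng(0)
alpha = np.full(3, 0.5)                      # assumed prior, 3 topics
means = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
covs = np.stack([np.eye(2)] * 3)
theta, topics, frames = sample_gaussian_lda_document(20, alpha, means, covs, rng)
print(frames.shape)
```

In standard LDA the last sampling step would instead draw a word index from a multinomial over the vocabulary, which is why continuous features have to be quantized first.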
Similar resources
Latent topic model for audio retrieval
Latent topic models such as Latent Dirichlet Allocation (LDA) were designed for text processing and have also proved successful in audio processing tasks. The main idea behind LDA is that the words of each document arise from a mixture of topics, each of which is a multinomial distribution over the vocabulary. When applying the original LDA to process continuous data, th...
Supervised acoustic topic model for unstructured audio information retrieval
We introduce a modified version of the acoustic topic model, which assumes an audio signal consists of latent acoustic topics and each topic can be interpreted as a distribution over acoustic words, for unstructured audio information retrieval applications. The proposed supervised acoustic topic model is based on supervised latent Dirichlet allocation (sLDA) while the conventional acoustic topi...
Study of entity-topic models for OOV proper name retrieval
Retrieving Proper Names (PNs) relevant to an audio document can improve speech recognition and content-based audio-video indexing. The Latent Dirichlet Allocation (LDA) topic model has been used to retrieve Out-Of-Vocabulary (OOV) PNs relevant to an audio document with good recall rates. However, retrieval of OOV PNs using LDA is affected by two issues, which we study in this paper: (1) Word Freque...
Multi Domain Semantic Information Retrieval Based on Topic Model
Over the last decades, there have been remarkable shifts in the area of Information Retrieval (IR) as a huge amount of information has accumulated on the Web. This information explosion increases the need for new tools that retrieve meaningful knowledge from various complex information sources. Thus, techniques primarily used to search and extract important informa...
Tensor Decomposition for Topic Models: An Overview and Implementation
The goal of a topic model is to characterize observed data in terms of a much smaller set of unobserved topics. Topic models have proven especially popular for information retrieval. Latent Dirichlet Allocation (LDA) is the most popular generative model used for topic modeling. Learning the optimal parameters of the LDA model efficiently, however, is an open question. As [2] point out, the trad...
Journal title:
Volume, issue:
Pages: -
Publication date: 2012