Clustering Images Using the Latent Dirichlet Allocation Model

Authors

Pradheep K Elango

Karthik Jayaraman

Abstract

Clustering, in simple words, is grouping similar data items together. In the text domain, clustering is widely used and fairly successful. In this work, we try to apply clustering methods used in the text domain to the image domain. Two major challenges in this approach are image representation and vocabulary definition. We apply the bag-of-words model to images, using image segments as words. We use Latent Dirichlet Allocation (LDA) to model the relationships between the “words” of an image and between images. This provides a highly compressed representation of an image, which can be used for applications such as image clustering, image retrieval, and image relevance ranking. In this work, we used the relationships obtained from LDA to cluster the images with 78% success.
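
The pipeline described in the abstract (images as bags of "words", LDA over those bags, clustering from the resulting topic mixtures) can be illustrated with a minimal sketch. Everything specific here is an assumption, not taken from the paper: the toy `images` are invented, each image segment is assumed to have already been quantized to an integer visual-word id, inference uses a small collapsed Gibbs sampler (the paper's inference method is not stated in the abstract), and each image is assigned to the cluster of its dominant topic, which is only one possible clustering rule.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic mixtures."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # count of topic k in doc d
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # count of word w in topic k
    topic_total = [0] * n_topics                              # total words in topic k
    assign = []                                               # topic of each word token

    # Random initial topic assignment for every word token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic proportions per document (each row sums to 1).
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(n_topics)])
    return theta


# Hypothetical toy data: each image is a list of visual-word ids (one per segment).
images = [
    [0, 1, 2, 0, 1, 2, 0, 1],
    [1, 2, 0, 2, 1, 0, 2, 2],
    [3, 4, 5, 3, 4, 5, 5, 4],
    [4, 5, 3, 3, 5, 4, 3, 3],
]

theta = lda_gibbs(images, n_topics=2, vocab_size=6)

# Assumed clustering rule: an image belongs to the cluster of its dominant topic.
clusters = [max(range(len(row)), key=row.__getitem__) for row in theta]
print(clusters)
```

Each row of `theta` is the compressed representation the abstract refers to: a short vector of topic proportions in place of the full bag of segments. The same vectors could instead be fed to a separate clustering or retrieval step.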

Similar resources

Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation

Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...

Legal Documents Clustering using Latent Dirichlet Allocation

At present, the availability of a large number of legal judgments in digital form creates opportunities and challenges for both the legal community and information technology researchers. This development calls for assistance in organizing, analyzing, retrieving, and presenting this content in a helpful and distributed manner. We propose an approach to cluster legal judgments based on th...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Novel weighting scheme for unsupervised language model adaptation using latent dirichlet allocation

A new approach for computing weights of topic models in language model (LM) adaptation is introduced. We formed topic clusters by a hard-clustering method assigning one topic to one document based on the maximum number of words chosen from a topic for that document in Latent Dirichlet Allocation (LDA) analysis. The new weighting idea is that the unigram count of the topic generated by hard-clus...

Topic Models For Feature Selection in Document Clustering

We investigate the idea of using a topic model such as the popular Latent Dirichlet Allocation model as a feature selection step for unsupervised document clustering, where documents are clustered using the proportion of the various topics that are present in each document. One concern with using “vanilla” LDA as a feature selection method for input to a clustering algorithm is that the Dirichl...

Publication date: 2005
