Kou, Wanqiu, Li Fang and Timothy Baldwin (to appear) Automatic Labelling of Topic Models using Word Vectors and Letter Trigram Vectors, in Proceedings of the Eleventh Asian Information Retrieval Societies Conference (AIRS 2015), Brisbane, Australia
Abstract
The native representation of LDA-style topics is a multinomial distribution over words, which can be time-consuming to interpret directly. As an alternative representation, automatic labelling has been shown to help readers interpret the topics more efficiently. We propose a novel framework for topic labelling using word vectors and letter trigram vectors. We generate labels automatically and propose automatic and human evaluations of our method. First, we use a chunk parser to generate candidate labels, then map topics and candidate labels to word vectors and letter trigram vectors in order to find which candidate label is most semantically related to each topic. A label is selected by calculating the similarity between a topic and each of its candidate label vectors. Experiments on three common datasets show that both the labelling method and our approach to automatic evaluation are effective.
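The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses raw letter trigram counts only (no learned word vectors), pads each word with a `#` boundary marker, and scores candidates by cosine similarity. The function names `letter_trigrams`, `cosine` and `rank_labels`, the padding scheme, and the sample topic words and candidate labels are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def letter_trigrams(text):
    """Count letter trigrams per word; '#' marks word boundaries (an assumption)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"#{word}#"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_labels(topic_words, candidate_labels):
    """Return (label, score) pairs sorted from most to least similar to the topic."""
    topic_vec = letter_trigrams(" ".join(topic_words))
    scored = [(label, cosine(topic_vec, letter_trigrams(label)))
              for label in candidate_labels]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical topic (top words) and hypothetical candidate labels.
topic = ["stock", "market", "investor", "fund", "trading"]
candidates = ["football league", "stock market", "weather forecast"]
ranked = rank_labels(topic, candidates)
best_label, best_score = ranked[0]
```

In this sketch the top-ranked candidate is taken as the topic label. A fuller version would combine this score with similarity over word vectors, as the abstract describes.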
Similar resources
Automatic Labelling of Topic Models Using Word Vectors and Letter Trigram Vectors
The native representation of LDA-style topics is a multinomial distribution over words, but automatic labelling of such topics has been shown to help readers interpret the topics better. We propose a novel framework for topic labelling using word vectors and letter trigram vectors. We generate labels automatically and propose automatic and human evaluations of our method. First, we use a chunk...
Wang, Li, Su Nam Kim and Timothy Baldwin (to appear) The Utility of Discourse Structure in Forum Thread Retrieval, in Proceedings of the Ninth Asian Information Retrieval Societies Conference (AIRS 2013), Singapore
Web user forums are a valuable means for users to resolve specific information needs, both interactively for the participants and statically for users who search/browse over historical thread data. However, the complex structure of forum threads can make it difficult for users to extract relevant information. Information retrieval (IR) over forum threads is one important way to obtain useful in...
A Novel Method for Content Base Image Retrieval Using Combination of Local and Global Features
Content-based image retrieval (CBIR) has been an active research topic in the last decade. In this paper we propose an image retrieval method using global and local features. First, for local feature extraction, the SURF algorithm produces a set of interest points for each image and a set of 64-dimensional descriptors for each interest point, and then, to use the Bag of Visual Words model, a cluster...
Big Data Small Data, In Domain Out-of Domain, Known Word Unknown Word: The Impact of Word Representations on Sequence Labelling Tasks
Lizhen Qu, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, Nathan Schneider and Timothy Baldwin. NICTA, ACT 2601, Australia; The Australian National University; The University of Melbourne, VIC 3010, Australia; University of Edinburgh, EH8 9AB, UK. {lizhen.qu,gabriela.ferraro,liyuan.zho,weiwei.hou}@nicta.com.au...