Graph-Based Generalized Latent Semantic Analysis For Document Representation
Authors
Abstract
Document indexing and representation of term-document relations are very important for document clustering and retrieval. In this paper, we combine a graph-based dimensionality reduction method with a corpus-based association measure within the Generalized Latent Semantic Analysis framework. We evaluate the graph-based GLSA on the document clustering task.
Related resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Term Representation with Generalized Latent Semantic Analysis
Document indexing and representation of term-document relations are very important issues for document clustering and retrieval. In this paper, we present Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors. Our focus on term vectors is motivated by the recent success of co-occurrence based measures of semantic similarity obtained fr...
Terms and Document Representation with Generalized Latent Semantic Analysis
Document indexing and representation of term-document relations are very important issues for document clustering and retrieval. In this paper, we present Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors. Our focus on term vectors is motivated by recent success of co-occurrence based measures of semantic similarity obtained from ...
Query expansion based on relevance feedback and latent semantic analysis
Web search engines are among the most popular tools on the Internet and are widely used by expert and novice users. Constructing an adequate query that best specifies the user's information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...
Statement for Irina Matveeva
My research interest is to improve natural language applications by developing efficient unsupervised and semi-supervised machine learning approaches. My approach is to design machine learning solutions tailored to specific natural language problems based on an in-depth analysis of their components. I believe that machine learning algorithms are most efficient for language applications if they ...