Very low-dimensional latent semantic indexing for local query regions
Abstract
In this paper, we focus on performing LSI with very few SVD dimensions. The results show that there is a nearly linear surface in the local query region. Applying low-dimensional LSI to the local query region captures this linear surface, performs much better than VSM, and performs comparably to global LSI. Because so few SVD dimensions are needed, the computational restrictions of LSI are resolved. Moreover, when several relevant sample documents are available, applying low-dimensional LSI to these documents yields IR performance comparable to local RF, but in a different manner.
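The idea of low-dimensional LSI on a local query region can be illustrated with a minimal sketch. The abstract does not say how the local region is built or how many dimensions are kept, so everything below is an assumption made for illustration: the region is taken to be the top `region_size` documents by VSM cosine similarity, `k` is the number of retained SVD dimensions, and the function names `cosine_scores` and `local_lsi_scores` are hypothetical.

```python
import numpy as np


def cosine_scores(doc_matrix, query_vec):
    """Cosine similarity between each column of doc_matrix and query_vec."""
    doc_norms = np.linalg.norm(doc_matrix, axis=0)
    denom = doc_norms * np.linalg.norm(query_vec)
    denom[denom == 0] = 1.0  # avoid division by zero for empty vectors
    return (doc_matrix.T @ query_vec) / denom


def local_lsi_scores(term_doc, query_vec, region_size=50, k=2):
    """Re-score the documents nearest to the query in a k-dimensional local LSI space.

    term_doc  : (terms x documents) weighted term-document matrix
    query_vec : (terms,) query vector in the same term space
    Returns (region, scores): indices of the local documents and their
    cosine similarity to the query after projection.
    """
    # Assumed definition of the local query region: top documents by VSM cosine.
    vsm = cosine_scores(term_doc, query_vec)
    region = np.argsort(-vsm)[:region_size]
    local = term_doc[:, region]

    # Truncated SVD of the local submatrix only, so the decomposition stays small.
    U, s, _ = np.linalg.svd(local, full_matrices=False)
    k = min(k, len(s))
    U_k = U[:, :k]

    # Project local documents and the query onto the k leading left singular vectors.
    docs_k = U_k.T @ local
    query_k = U_k.T @ query_vec
    return region, cosine_scores(docs_k, query_k)
```

Because the SVD is computed only on the `region_size` columns nearest the query, its cost depends on the size of the region rather than on the whole collection, which is one way to read the abstract's remark about computation restrictions.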
Similar resources
Query expansion based on relevance feedback and latent semantic analysis
Web search engines are among the most popular tools on the Internet and are widely used by expert and novice users. Constructing an adequate query which represents the best specification of users’ information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...
Latent Semantic Indexing (LSI) and TREC-2
Latent Semantic Indexing (LSI) is an extension of the vector retrieval method (e.g., Salton & McGill, 1983) in which the dependencies between terms are explicitly taken into account in the representation and exploited in retrieval. This is done by simultaneously modeling all the interrelationships among terms and documents. We assume that there is some underlying or "latent" structure in the pa...
Supervised Semantic Indexing for Ranking Documents
Ranking text documents given a query is one of the key tasks in information retrieval. Typical solutions include classical vector space models using weighted word counts and the cosine similarity (TFIDF) with no machine learning at all, or Latent Semantic Indexing (LSI) using unsupervised learning to learn a low dimensional space of “latent concepts” via a reconstruction objective. The former a...
Approximate Dimension Reduction at NTCIR
We carried out a comparison of cross-language retrieval methods on the NTCIR-1 data based on dimension reduction (latent semantic indexing). These methods all use a collection of parallel documents (translations or approximate translations) and very little, if any, linguistic knowledge. In NTCIR-1, we compared latent semantic indexing, local LSI, and approximate dimensional equalization (ADE). We ...
Latent Semantic Indexing with a Variable Number of Orthogonal Factors
We seek insight into Latent Semantic Indexing by establishing a method to identify the optimal number of factors in the approximation matrix. We define some reasonable property for the approximation to hold and derive a new, un-parametric query expansion method. Extensive numerical experiments confirm the value of the new method.