Experiments with LSA Scoring: Optimal Rank and Basis

Author

  • John Caron
Abstract

Latent Semantic Analysis (LSA) is one of the main variants of vector space methods for information retrieval, and continues to be an active area of research, both in the theory of how LSA works and in the practical applications of the method. Alternatives to the Singular Value Decomposition (SVD) have been explored that offer improvements in storage, speed, updating, or other advantages, especially for large datasets. Alternatives to constructing the term-document matrix, such as term-weighting schemes, have also been explored. At the core of LSA is the scoring of a query against a canonical set of documents, using the inner product of their vector representations. Virtually all LSA researchers have assumed a particular scoring function, and alternatives have not been well investigated. In this paper I define a family of scoring functions parameterized by a "weighting exponent" p, which weights the score by $\Sigma^p$, where $\Sigma$ is the diagonal matrix of SVD singular values. I present empirical findings on how retrieval accuracy varies as a function of p and the SVD approximation rank k, and find that the standard scoring function, corresponding to p = 0, is often not optimal. In Section 2, I review the Singular Value Decomposition, and define LSA scoring functions as a choice of basis for the vector space defined by the document set. In Section 3, I test these scoring functions on standard test datasets, for different values of the SVD approximation rank k. In Section 4, I present results from a similar experiment on matching questions with answers, within the context of the Frequently Asked Question Organizer (FAQO), a prototype system for technical support that I developed. In Section 5, I summarize related work, and discuss the significance of these results.
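
The abstract does not spell out the scoring function itself, so the following is only a minimal sketch of one plausible reading, not the paper's definition. It assumes that queries and documents are projected onto the first k left singular vectors of the term-document matrix, that each coordinate is weighted by the corresponding singular value raised to the power p, and that the score is the cosine of the weighted vectors. The function name `lsa_scores`, the cosine normalization, and the toy matrix are all illustrative assumptions.

```python
import numpy as np

def lsa_scores(A, q, k, p=0.0):
    """Score query vector q against each document (column) of A.

    Assumed form, not taken from the paper: coordinates in the rank-k
    left singular basis U_k, weighted by Sigma_k**p, compared by cosine.
    k should not exceed the rank of A when p is negative.
    """
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    w = s_k ** p                        # diagonal of Sigma_k^p
    docs = (U_k.T @ A) * w[:, None]     # weighted document coordinates, k x n_docs
    query = (U_k.T @ q) * w             # weighted query coordinates, length k
    denom = np.linalg.norm(docs, axis=0) * np.linalg.norm(query)
    denom = np.where(denom == 0, 1.0, denom)  # avoid division by zero
    return (query @ docs) / denom

# Hypothetical toy term-document matrix: 4 terms (rows) x 3 documents (columns).
A = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
q = np.array([1.0, 1.0, 0.0, 0.0])      # query containing the first two terms

for p in (-1.0, 0.0, 1.0):
    print(p, lsa_scores(A, q, k=2, p=p))
```

Under this reading, p = 0 leaves the projected coordinates unweighted, while other values of p rescale each latent dimension before the comparison, so the ranking of documents can change with both p and k.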

Related resources

Effectiveness of Automated Chinese Sentence Scoring with Latent Semantic Analysis

Automated scoring by means of Latent Semantic Analysis (LSA) has been introduced lately to improve the traditional human scoring system. The purposes of the present study were to develop an LSA-based assessment system to evaluate children’s Chinese sentence construction skills and to examine the effectiveness of the LSA-based automated scoring function by comparing it with traditional human scoring....

Essay Assessment with Latent Semantic Analysis

Latent semantic analysis (LSA) is an automated, statistical technique for comparing the semantic similarity of words or documents. In this paper, I examine the application of LSA to automated essay scoring. I compare LSA methods to earlier statistical methods for assessing essay quality, and critically review contemporary essay-scoring systems built on LSA, including the Intelligent Essay Asses...

Parameters Driving Effectiveness of Automated Essay Scoring with LSA

Automated essay scoring with latent semantic analysis (LSA) has recently been subject to increasing interest. Although previous authors have achieved grade ranges similar to those awarded by humans, it is still not clear which parameters improve or decrease the effectiveness of LSA, or how. This paper presents an analysis of the effects of these parameters, such as text preprocessing, weighting...

Rank estimation of trajectory matrix in motion segmentation

A novel technique for estimating the rank of the trajectory matrix in the local subspace affinity (LSA) motion segmentation framework is presented. This new rank estimation is based on the relationship between the estimated rank of the trajectory matrix and the affinity matrix built with LSA. The result is an enhanced model selection technique for trajectory matrix rank estimation by which it i...

Pasteur's Quadrant: Computational Linguistics, LSA, And Education

This paper argues that computational cognitive psychology and computational linguistics have much to offer the science of language by adopting the research strategy that Donald Stokes called Pasteur’s quadrant--starting and testing success with important real world problems--and that education offers an ideal venue. Some putative examples from applications of Latent Semantic Analysis (LSA) are ...

Publication date: 2000