Visualizing polysemy using LSA and the predication algorithm

Authors

  • Guillermo Jorge-Botana
  • José Antonio León
  • Ricardo Olmos
  • Yusef Hassan Montero
Abstract

Context is a determining factor in language, and plays a decisive role in polysemic words. Several psycholinguistically motivated algorithms have been proposed to emulate human management of context, under the assumption that the value of a word is evanescent and takes on meaning only in interaction with other structures. The predication algorithm (Kintsch, 2001), for example, uses a vector representation of the words produced by LSA (Latent Semantic Analysis) to dynamically simulate the comprehension of predications and even of predicative metaphors. The objective of this study is to predict some unwanted effects that could be present in vector-space models when extracting different meanings of a polysemic word (Predominant meaning inundation, Lack of precision and Low-level definition), and to propose ideas based on the predication algorithm for avoiding them. Our first step was to visualize these unwanted phenomena and also the effect of the solutions. We used different methods to extract the meanings of a polysemic word (without context, Vector Sum and Predication Algorithm). Our second step was to conduct an ANOVA to compare these methods and measure the impact of potential solutions. Results support the idea that a human-based computational algorithm like the Predication algorithm can take into account features that ensure more accurate representations of the structures we seek to extract. Theoretical assumptions and their repercussions are discussed.
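The difference between the Vector Sum and the Predication Algorithm mentioned in the abstract can be illustrated with a small sketch. This is a simplified reading of Kintsch (2001), not the authors' implementation: the three-dimensional vectors, the word list, and the function names `cosine`, `vector_sum` and `predication` are all hypothetical, and the spreading-activation step of the original algorithm is replaced here by a plain cosine ranking. The idea shown is that the polysemic word's neighbours are filtered by the context word before the vectors are added.

```python
import math

# Hypothetical toy "semantic space": real LSA vectors would come from an SVD
# of a term-document matrix and have hundreds of dimensions.
SPACE = {
    "bank":    [0.8, 0.4, 0.1],
    "money":   [0.9, 0.0, 0.1],
    "loan":    [1.0, 0.1, 0.0],
    "account": [0.9, 0.1, 0.2],
    "shore":   [0.1, 0.9, 0.1],
    "river":   [0.0, 1.0, 0.1],
    "tree":    [0.0, 0.7, 0.7],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vector_sum(*vectors):
    """Component-wise sum of the given vectors (the Vector Sum baseline)."""
    return [sum(components) for components in zip(*vectors)]


def predication(space, predicate, argument, m=4, k=1):
    """Simplified predication: sum predicate, argument and k filtered neighbours.

    Step 1: take the m nearest neighbours of the predicate.
    Step 2: keep the k of those neighbours most similar to the argument.
    Step 3: add predicate, argument and the selected neighbours.
    """
    p, a = space[predicate], space[argument]
    others = [w for w in space if w not in (predicate, argument)]
    neighbours = sorted(others, key=lambda w: cosine(space[w], p), reverse=True)[:m]
    selected = sorted(neighbours, key=lambda w: cosine(space[w], a), reverse=True)[:k]
    return vector_sum(p, a, *(space[w] for w in selected)), selected


if __name__ == "__main__":
    baseline = vector_sum(SPACE["bank"], SPACE["river"])
    contextual, selected = predication(SPACE, "bank", "river")
    print("neighbours kept by the context:", selected)
    print("baseline vs shore:  ", round(cosine(baseline, SPACE["shore"]), 3))
    print("predication vs shore:", round(cosine(contextual, SPACE["shore"]), 3))
```

In this toy space the nearest neighbours of "bank" are mostly financial terms, which is the kind of predominant-meaning effect the abstract describes. Filtering them by "river" keeps only the neighbour related to the intended sense. The values of `m` and `k` are illustrative; suitable settings for a real corpus would have to be chosen empirically.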


Similar resources

Cross-sortal Predication and Polysemy

This paper develops a new treatment of the problem of cross-sortal predication, and of copredication in particular. We argue that these predicate-argument sort mismatches can be resolved by a more flexible treatment of polysemy based on the notion of dependent type and the dynamic construction of meaning.


Improving Probabilistic Latent Semantic Analysis with Principal Component Analysis

Probabilistic Latent Semantic Analysis (PLSA) models have been shown to provide a better model for capturing polysemy and synonymy than Latent Semantic Analysis (LSA). However, the parameters of a PLSA model are trained using the Expectation Maximization (EM) algorithm, and as a result, the trained model is dependent on the initialization values so that performance can be highly variable. In th...


'surfing for knowledge' finding semantically similar Web clusters

In this paper we present our technique for finding semantically similar clusters within web documents obtained from a set of queries retrieved from the Google search engine. This technique utilizes a clustering algorithm based on previous Latent Semantic Analysis (LSA) work pioneered by Deerwester. In this paper we demonstrate how by using our clustering algorithm we can resolve ambiguities pre...


Application of Genetic Algorithm in Development of Bankruptcy Predication Theory Case Study: Companies Listed on Tehran Stock Exchange

The bankruptcy prediction models have long been proposed as a key subject in finance. The present study, therefore, makes an effort to examine the corporate bankruptcy prediction through employment of the genetic algorithm model. Furthermore, it attempts to evaluate the strategies to overcome the drawbacks of ordinary methods for bankruptcy prediction through application of genetic algorithms. Thesa...


Using latent semantic analysis and the predication algorithm to improve extraction of meanings from a diagnostic corpus.

There is currently a widespread interest in indexing and extracting taxonomic information from large text collections. An example is the automatic categorization of informally written medical or psychological diagnoses, followed by the extraction of epidemiological information or even terms and structures needed to formulate guiding questions as an heuristic tool for helping doctors. Vector spa...



Journal:
  • JASIST

Volume 61

Publication date: 2010