Augmentation of a Term/Document Matrix with Part-of-Speech Tags to Improve Accuracy of Latent Semantic Analysis

Authors

  • TOM RISHEL
  • A. LOUISE PERKINS
  • SUMANTH YENDURI
  • FARNAZ ZAND
Abstract

We consider the improvement in accuracy of latent semantic analysis (LSA) when a part-of-speech (POS) tagger is used to augment a term/document matrix. We first construct an augmented term/document matrix as input to singular value decomposition (SVD). The singular values then serve as principal components for a cosine projection. The results show that the addition of POS tags can decrease ambiguities significantly.

Key-Words: Latent Semantic Analysis, Documents, Tags, Singular Value Decomposition
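The pipeline described in the abstract can be sketched roughly as follows. The abstract does not say how the POS tags are attached to the matrix, so this sketch assumes one plausible reading: each row is a (word, tag) pair instead of a bare word. The toy corpus is tagged by hand, and the names `DOCS`, `build_matrix`, `lsa_doc_vectors`, and `cosine` are hypothetical, chosen only for illustration. The projection used here is the standard truncated SVD, with document vectors scaled by the singular values; it is not necessarily the authors' exact procedure.

```python
import numpy as np

# Hypothetical, hand-tagged toy corpus: each document is a list of (word, POS) pairs.
DOCS = [
    [("read", "VERB"), ("the", "DET"), ("book", "NOUN")],
    [("book", "VERB"), ("a", "DET"), ("flight", "NOUN")],
    [("the", "DET"), ("book", "NOUN"), ("was", "VERB"), ("long", "ADJ")],
]


def build_matrix(docs, use_pos):
    """Build a term/document count matrix.

    With use_pos=True the rows are (word, tag) pairs (the assumed form of
    "augmentation"); with use_pos=False the rows are bare words.
    """
    def key(word, tag):
        return (word, tag) if use_pos else word

    keys = sorted({key(w, t) for doc in docs for w, t in doc})
    index = {k: i for i, k in enumerate(keys)}
    matrix = np.zeros((len(keys), len(docs)))
    for j, doc in enumerate(docs):
        for w, t in doc:
            matrix[index[key(w, t)], j] += 1
    return keys, matrix


def lsa_doc_vectors(matrix, k):
    """Project documents onto the top-k singular directions (one row per document)."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine(u, v):
    """Cosine similarity between two document vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


_, plain = build_matrix(DOCS, use_pos=False)
_, tagged = build_matrix(DOCS, use_pos=True)
plain_vecs = lsa_doc_vectors(plain, k=2)
tagged_vecs = lsa_doc_vectors(tagged, k=2)
print("untagged similarity, doc 0 vs doc 1:", cosine(plain_vecs[0], plain_vecs[1]))
print("tagged similarity,   doc 0 vs doc 1:", cosine(tagged_vecs[0], tagged_vecs[1]))
```

In the untagged matrix, the noun "book" in the first document and the verb "book" in the second share a single row, so the two documents overlap on that term. In the tagged matrix they occupy separate rows, which is one way the added tags could reduce ambiguity.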

Related resources

Studying impressive parameters on the performance of Persian probabilistic context free grammar parser

In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...

A Hybrid Method of Syntactic Feature and Latent Semantic Analysis for Automatic Arabic Essay Scoring

Background: The process of automated essay assessment is a challenging task due to the need for comprehensive evaluation in order to validate the answers accurately. The challenge increases when dealing with the Arabic language, where morphology, semantics and syntax are complex. Methodology: Few research efforts have been proposed for Automatic Essay Scoring (AES) in Arabic. However,...

Applying Part-of-Speech Enhanced LSA to Automatic Essay Grading

Latent Semantic Analysis (LSA) is a widely used Information Retrieval method based on the "bag-of-words" assumption. However, according to the general conception, syntax plays a role in representing the meaning of sentences. Thus, enhancing LSA with part-of-speech (POS) information to capture the context of word occurrences appears to be a theoretically feasible extension. The approach is tested empiricall...

An Investigation of Recursive Auto-associative Memory in Sentiment Detection

The rise of blogs, forums, social networks and review websites in recent years has provided very accessible and convenient platforms for people to express thoughts, views or attitudes about topics of interest. In order to collect and analyse opinionated content on the Internet, various sentiment detection techniques have been developed based on an integration of part-of-speech tagging, negation...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

Publication date: 2006