Using Filtered Second Order Co-occurrence Matrix to Improve the Traditional Co-occurrence Model

Authors

  • Chun-Hung Lu
  • Chorng-Shyong Ong
  • Wen-Lian Hsu
  • Hsing-Kuo Lee
Abstract

Using co-occurrence statistics to measure word similarity and relatedness has applications in many areas of natural language processing. Our experimental results also indicate that two words with zero co-occurrence statistics can still be related. In this paper, we present two algorithms, both of which were evaluated on 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). The evaluation results show that the first algorithm significantly improves the performance of co-occurrence-based applications, and that the second, an ensemble algorithm that incorporates the first, achieves the best results on the synonym questions of both tests.

Keywords: Co-occurrence, Word similarity, Word relatedness, Synonym test, PMI
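
The abstract's observation that two words with zero co-occurrence can still be related can be illustrated with a small sketch. This is not the paper's filtered algorithm, which the abstract does not specify. It is a generic illustration that assumes sentence-level co-occurrence counts, PMI as the first-order measure, and cosine similarity between co-occurrence vectors as a simple second-order measure. The toy corpus and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how often each word, and each unordered word pair, appears in a sentence."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = set(sentence.split())
        word_counts.update(words)
        for a, b in combinations(sorted(words), 2):
            pair_counts[(a, b)] += 1
    return word_counts, pair_counts


def pmi(a, b, word_counts, pair_counts, n_sentences):
    """First-order measure: PMI in bits, or None when the pair never co-occurs."""
    joint = pair_counts.get(tuple(sorted((a, b))), 0)
    if joint == 0:
        return None  # PMI is undefined (log of zero) for unseen pairs
    p_ab = joint / n_sentences
    p_a = word_counts[a] / n_sentences
    p_b = word_counts[b] / n_sentences
    return math.log2(p_ab / (p_a * p_b))


def second_order_similarity(a, b, pair_counts, vocab):
    """Second-order measure (an assumption here): cosine of the two words'
    co-occurrence vectors over all other vocabulary words."""
    def vector(w):
        return [pair_counts.get(tuple(sorted((w, c))), 0)
                for c in vocab if c not in (a, b)]

    va, vb = vector(a), vector(b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: "car" and "automobile" never share a sentence.
sentences = [
    "the car needs fuel",
    "the automobile needs fuel",
    "the car has an engine",
    "the automobile has an engine",
]
word_counts, pair_counts = cooccurrence_counts(sentences)
vocab = sorted(word_counts)

first_order = pmi("car", "automobile", word_counts, pair_counts, len(sentences))
second_order = second_order_similarity("car", "automobile", pair_counts, vocab)
```

In this toy corpus the first-order score for "car" and "automobile" is undefined because the pair never co-occurs, while the second-order score is high because both words share the same neighbours.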

Similar resources

Context Dependent Class Language Model based on Word Co-occurrence Matrix in LSA Framework for Speech Recognition

We address the data sparseness problem in language models (LMs). Using a class LM is one way to avoid this problem. In a class LM, infrequent words are supported by more frequent words in the same class. This paper investigates a class LM based on LSA. A word-document matrix is usually used to represent a corpus in the LSA framework. However, this matrix ignores word order in the sentence. We pr...

Dedicated Hardware for Real-Time Computation of Second-Order Statistical Features for High Resolution Images

We present a novel dedicated hardware system for the extraction of second-order statistical features from high-resolution images. The selected features are based on gray level co-occurrence matrix analysis and are angular second moment, correlation, inverse difference moment and entropy. The proposed system was evaluated using input images with resolutions that range from 512×512 to 2048×2048 p...

Image Steganalysis Based on Co-occurrence Matrix Features

In this paper, two novel steganalysis methods are presented based on the co-occurrence matrix of an image. It is shown that, by using features extracted from this matrix, we can differentiate between cover and stego images. These features include energy, entropy, contrast, inverse difference moment, maximum probability and correlation. We use SVM classification for separation of cover and stego imag...

Second-Order Statistical Texture Representation of Asphalt Pavement Distress Images Based on Local Binary Pattern in Spatial and Wavelet Domain

Assessment of pavement distresses is one of the important parts of pavement management systems to adopt the most effective road maintenance strategy. In the last decade, extensive studies have been done to develop automated systems for pavement distress processing based on machine vision techniques. One of the most important structural components of computer vision is the feature extraction met...

Adapting Image Texture Co-occurrence Analysis for Audio Texture Similarity

In this letter, we adapt a well-known technique of image texture analysis (the grey-level co-occurrence matrix) to compute similarity between musical audio signals. Grey-level co-occurrence matrices estimate the joint probability of pairs of pixel values separated by a spatial displacement vector. Instead of using pixel grey-level values, we propose to use frame-based audio features, obtained either...

Publication date: 2011