Thematically Reinforced Explicit Semantic Analysis

Authors

  • Yannis Haralambous
  • Vitaly Klyuev
Abstract

We present an extended, thematically reinforced version of Gabrilovich and Markovitch’s Explicit Semantic Analysis (ESA), where we obtain thematic information through the category structure of Wikipedia. For this we first define a notion of categorical tfidf which measures the relevance of terms in categories. Using this measure as a weight we calculate a maximal spanning tree of the Wikipedia corpus considered as a directed graph of pages and categories. This tree provides us with a unique path of “most related categories” between each page and the top of the hierarchy. We reinforce tfidf of words in a page by aggregating it with categorical tfidfs of the nodes of these paths, and define a thematically reinforced ESA semantic relatedness measure which is more robust than standard ESA and less sensitive to noise caused by out-of-context words. We apply our method to the French Wikipedia corpus, evaluate it through a text classification on a 37.5 MB corpus of 20 French newsgroups and obtain a precision increase of 9–10% compared with standard ESA.
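The abstract does not give the formulas, so the following is only a minimal Python sketch of the general idea, under stated assumptions. It computes plain tfidf vectors for pages, builds standard ESA concept vectors, and compares them by cosine similarity. The `reinforce` function, its `decay` parameter, and the depth-discounted sum are hypothetical illustrations of "aggregating" page tfidf with category weights along a path; they are not the authors' definition of categorical tfidf or of their aggregation. The path of categories is taken as given here, so the maximal spanning tree step is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Plain tfidf. docs maps a page name to its list of tokens."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # document frequency: count each term once per page
    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        vectors[name] = {
            t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()
        }
    return vectors


def reinforce(page_vec, path_vecs, decay=0.5):
    """Hypothetical aggregation (an assumption, not the paper's formula):
    add the term weights of each category on the page's path to the top,
    discounted by decay ** depth."""
    out = dict(page_vec)
    for depth, cat_vec in enumerate(path_vecs, start=1):
        for term, weight in cat_vec.items():
            out[term] = out.get(term, 0.0) + (decay ** depth) * weight
    return out


def concept_vector(tokens, page_vectors):
    """Standard ESA step: project a text onto pages (the 'concepts')."""
    tf = Counter(tokens)
    return {
        page: sum(tf[t] * w for t, w in vec.items() if t in tf)
        for page, vec in page_vectors.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Toy corpus; page names and tokens are made up for illustration.
    pages = {
        "Cat": ["cat", "pet", "animal"],
        "Dog": ["dog", "pet", "animal"],
        "Car": ["car", "engine", "road"],
    }
    vecs = tfidf_vectors(pages)
    a = concept_vector(["cat", "pet"], vecs)
    b = concept_vector(["dog", "pet"], vecs)
    print(cosine(a, b))
```

In this sketch, relatedness of two texts is the cosine of their concept vectors. To try the reinforced variant, one would pass each page vector through `reinforce` with the term-weight vectors of its path categories before calling `concept_vector`.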

Similar resources

Finite Element Analysis of Low Velocity Impact on Carbon Fibers/Carbon Nanotubes Reinforced Polymer Composites

An effort is made to gain insight into the effect of carbon nanotubes (CNTs) on the impact response of carbon fiber reinforced composites (CFRs) under low velocity impact. A certain amount of CNTs could lead to improvements in the mechanical properties of composites. In the present investigation, ABAQUS/Explicit finite element code (FEM) is employed to investigate various damage modes of nanocomposites ...

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

An Investigation of Semantic Cluster Helps Listening Comprehension of English Learners: A Case-study in Pass College

This paper introduces Daneman and Carpenter’s test of working memory span and, using Chinese language materials collected from the Chinese Language Education Research Center’s bilingual corpus, conducts a three-week experiment with both English majors and non-English majors at Pass College of CTBU (Chongqing Technology and Business University). The experimental group has 10 minutes vocabulary cl...

Explicit vs. Contrastive-based Instruction of Formulaic Expressions in Developing EFL Learners’ Reading Ability

As an integrative component of textual structure, formulaic expressions (FEs) play a key role in communicating the message and comprehending the text. Furthermore, interlingually contrastive features of FEs add to both their significance and the complexity of their instruction. Given these facts, this study was an attempt to explore a sound mechanism on how to teach FEs; whether an explicit or CA-...

Journal:
  • CoRR

Volume abs/1405.4364

Publication date 2013