Lexical and semantic clustering by Web links

نویسنده

  • Filippo Menczer
چکیده

Recent Web searching and mining tools are combining text and link analysis to improve ranking and crawling algorithms. The central assumption behind such approaches is that there is a correlation between the graph structure of the Web and the text and meaning of pages. Here I formalize and quantitatively validate two conjectures drawing connections from linkage information to lexical and semantic Web content. The link-content conjecture states that a page is similar to the pages that link to it, and the link-cluster conjecture that pages about the same topic are clustered together. These results shed light on the connection between the success of the latest Web mining techniques and the small world topology of the Web. From a normative perspective, the present analysis suggests that smart Web crawlers can take advantage of local text and link clues to locate relevant pages in a more efficient and scalable way than current search engine robots.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Centralized Clustering Method To Increase Accuracy In Ontology Matching Systems

Ontology is the main infrastructure of the Semantic Web which provides facilities for integration, searching and sharing of information on the web. Development of ontologies as the basis of semantic web and their heterogeneities have led to the existence of ontology matching. By emerging large-scale ontologies in real domain, the ontology matching systems faced with some problem like memory con...

متن کامل

Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems

  One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares active user’s items rating with historical rating records of other users to find similar users and recommending items which seems interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...

متن کامل

Finding Community Base on Web Graph Clustering

Search Pointers organize the main part of the application on the Internet. However, because of Information management hardware, high volume of data and word similarities in different fields the most answers to the user s’ questions aren`t correct. So the web graph clustering and cluster placement in corresponding answers helps user to achieve his or her intended results. Community (web communit...

متن کامل

Enhancing Navigability in Websites Built Using Web Content Management Systems

Websites built using Web Content Management Systems (WCMSs) usually provide their users with three alternative access structures to surf their contents: indexes of categories, breadcrumb trails, and sitemaps. In addition, to find contents of his/her interest, a user can perform more or less advanced full-text searches. In this paper we propose an automatic approach to extend the navigation stru...

متن کامل

Semantic Suffix Tree Clustering

This paper proposes a new algorithm, called Semantic Suffix Tree Clustering (SSTC), to cluster web search results containing semantic similarities. The distinctive methodology of the SSTC algorithm is that it simultaneously constructs the semantic suffix tree through an on-depth and on-breadth pass by using semantic similarity and string matching. The semantic similarity is derived from the Wor...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • JASIST

دوره 55  شماره 

صفحات  -

تاریخ انتشار 2004