Lexical and semantic clustering by Web links
نویسنده
چکیده
Recent Web searching and mining tools are combining text and link analysis to improve ranking and crawling algorithms. The central assumption behind such approaches is that there is a correlation between the graph structure of the Web and the text and meaning of pages. Here I formalize and quantitatively validate two conjectures drawing connections from linkage information to lexical and semantic Web content. The link-content conjecture states that a page is similar to the pages that link to it, and the link-cluster conjecture that pages about the same topic are clustered together. These results shed light on the connection between the success of the latest Web mining techniques and the small world topology of the Web. From a normative perspective, the present analysis suggests that smart Web crawlers can take advantage of local text and link clues to locate relevant pages in a more efficient and scalable way than current search engine robots.
منابع مشابه
Centralized Clustering Method To Increase Accuracy In Ontology Matching Systems
Ontology is the main infrastructure of the Semantic Web which provides facilities for integration, searching and sharing of information on the web. Development of ontologies as the basis of semantic web and their heterogeneities have led to the existence of ontology matching. By emerging large-scale ontologies in real domain, the ontology matching systems faced with some problem like memory con...
متن کاملUse of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares active user’s items rating with historical rating records of other users to find similar users and recommending items which seems interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
متن کاملFinding Community Base on Web Graph Clustering
Search Pointers organize the main part of the application on the Internet. However, because of Information management hardware, high volume of data and word similarities in different fields the most answers to the user s’ questions aren`t correct. So the web graph clustering and cluster placement in corresponding answers helps user to achieve his or her intended results. Community (web communit...
متن کاملEnhancing Navigability in Websites Built Using Web Content Management Systems
Websites built using Web Content Management Systems (WCMSs) usually provide their users with three alternative access structures to surf their contents: indexes of categories, breadcrumb trails, and sitemaps. In addition, to find contents of his/her interest, a user can perform more or less advanced full-text searches. In this paper we propose an automatic approach to extend the navigation stru...
متن کاملSemantic Suffix Tree Clustering
This paper proposes a new algorithm, called Semantic Suffix Tree Clustering (SSTC), to cluster web search results containing semantic similarities. The distinctive methodology of the SSTC algorithm is that it simultaneously constructs the semantic suffix tree through an on-depth and on-breadth pass by using semantic similarity and string matching. The semantic similarity is derived from the Wor...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- JASIST
دوره 55 شماره
صفحات -
تاریخ انتشار 2004