HITS algorithm improvement using semantic text portion
Authors
Abstract
Kleinberg’s Hypertext-Induced Topic Selection (HITS) algorithm is a popular and effective algorithm for ranking web pages. One of its problems is topic drift: densely linked pages that are off the query topic can come to dominate the hub and authority rankings. Previous research has tried to solve this problem using anchor-related text. In this paper, we investigate the effectiveness of using Semantic Text Portions (STPs) to improve the HITS algorithm. Specifically, we examine the degree to which STPs can improve the HITS algorithm. We also compare STPs with other kinds of anchor-related text in terms of how much each improves the HITS algorithm. The experimental results show that, of the text types compared, STPs are the most effective for improving the HITS algorithm.
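The abstract does not specify how STPs enter the ranking computation, so the following is only a minimal Python sketch under stated assumptions. It implements standard HITS by power iteration (a page's authority score is the sum of the hub scores of pages linking to it, a page's hub score is the sum of the authority scores of pages it links to, and both vectors are normalized each round) and accepts an optional per-link weight matrix. Weighting links is the general way text-based variants let on-topic links count more than off-topic ones. The helper `text_weight` (one plus the share of query terms found in a link's text), the toy graph, and the link texts are hypothetical illustrations, not the paper's STP extraction or weighting scheme; the code assumes NumPy is installed.

```python
import numpy as np


def hits(adj, weights=None, iters=100, tol=1e-9):
    """Hub and authority scores by power iteration.

    adj[i, j] = 1 if page i links to page j.
    weights: optional matrix of the same shape that scales each link
    (hypothetical here, e.g. relevance of the link's text to the query).
    With weights=None this is plain, unweighted HITS.
    """
    A = np.asarray(adj, dtype=float)
    if weights is not None:
        A = A * np.asarray(weights, dtype=float)
    n = A.shape[0]
    hubs = np.ones(n)
    auths = np.ones(n)
    for _ in range(iters):
        # Authority update: sum of hub scores of pages that link in.
        new_auths = A.T @ hubs
        new_auths /= np.linalg.norm(new_auths) or 1.0
        # Hub update: sum of authority scores of pages linked to.
        new_hubs = A @ new_auths
        new_hubs /= np.linalg.norm(new_hubs) or 1.0
        done = (np.abs(new_auths - auths).sum() < tol
                and np.abs(new_hubs - hubs).sum() < tol)
        hubs, auths = new_hubs, new_auths
        if done:
            break
    return hubs, auths


def text_weight(query, link_text):
    """Hypothetical link weight: 1 + share of query terms found in link_text."""
    q = set(query.lower().split())
    t = set(link_text.lower().split())
    return 1.0 + len(q & t) / len(q) if q else 1.0


# Toy graph (hypothetical): pages 0 and 1 both link to pages 2 and 3.
adj = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 0, 0],
                [0, 0, 0, 0]])
# Hypothetical text attached to each link (a stand-in for anchor text or an STP).
link_text = {(0, 2): "jaguar car review", (0, 3): "cheap hotels",
             (1, 2): "jaguar sports car", (1, 3): "travel deals"}
W = np.ones_like(adj, dtype=float)
for (i, j), text in link_text.items():
    W[i, j] = text_weight("jaguar car", text)

_, plain = hits(adj)         # every link counts equally
_, weighted = hits(adj, W)   # query-relevant links count more
print(plain.round(3), weighted.round(3))
```

With unit weights, pages 2 and 3 receive equal authority; with the hypothetical text weights, page 2, whose link text matches the query, ranks above page 3. Because only the link matrix changes, the HITS iteration itself stays the same, so a different text source for the weights could be substituted without touching the update loop.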
Similar resources
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text motivated researchers to use...
A Semantic Approach to Enhance HITS Algorithm for Extracting Associated Concepts using ConceptNet
Common sense knowledge base creation and usage is an active field of research, and many researchers in this field are trying to invent new methods to make machines more intelligent. ConceptNet is one of the most popular freely available common sense knowledge bases, with millions of common concepts and assertions. In this paper, we propose a novel algorithm called SMHITS to extract associated concept...
An Improvement in Support Vector Machines Algorithm with Imperialism Competitive Algorithm for Text Documents Classification
Due to the exponential growth of electronic texts, organizing and managing them requires tools that deliver the information users search for in the shortest possible time. Thus, classification methods have become very important in recent years. In natural language processing, and especially in text processing, one of the most basic tasks is automatic text classification. Moreover, text ...
Web Image Semantic Clustering
This paper provides a novel Web image clustering methodology based on the images' associated texts. In our approach, the semantics of Web images are first represented as vectors of term-weight pairs. In order to correctly correlate terms to a Web image, the associated text of the Web image is partitioned into semantic blocks according to the semantic structure of the text with respect to the Web ...
An Empirical Approach to Conceptual Case Frame Acquisition
Conceptual natural language processing systems usually rely on case frame instantiation to recognize events and role objects in text. But generating a good set of case frames for a domain is time-consuming, tedious, and prone to errors of omission. We have developed a corpus-based algorithm for acquiring conceptual case frames empirically from unannotated text. Our algorithm builds on previous r...
Journal title:
- Web Intelligence and Agent Systems
Volume 8
Pages: -
Publication date: 2010