Towards a Keyword-Focused Web Crawler

نویسندگان

  • Tomasz Kusmierczyk
  • Marcin Sydow
چکیده

This paper concerns predicting the content of textual web documents based on features extracted from web pages that link to them. It may be applied in an intelligent, keyword-focused web crawler. The experiments made on publicly available real data obtained from Open Directory Project with the use of several classification models are promising and indicate potential usefulness of the studied approach in automatically obtaining keyword-rich web document collections.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Design of an Agent Based Context Driven Focused Crawler

A focused crawler downloads web pages that are relevant to a user specified topic. Most of the existing focused crawlers are keyword driven and do not take into account the context associated with the keywords. This leads to retrieval of a large number of web pages irrespective of the fact whether they are logically related. Thus, the keyword based strategy alone is not sufficient for the desig...

متن کامل

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not an simple task to download the domain specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among them a key technique is focused crawling which is able to crawl particular topical...

متن کامل

Context based Web Indexing for Storage of Relevant Web Pages

A focused crawler downloads web pages that are relevant to a user specified topic. The downloaded documents are indexed with a view to optimize speed and performance in finding relevant documents for a search query at the search engine side. However, the information will be more relevant if the context of the topic is also made available to the retrieval system. This paper proposes a technique ...

متن کامل

A Focused Crawler for Borderlands Situation Information with Geographical Properties of Place Names

Place name is an important ingredient of borderlands situation information and plays a significant role in collecting them from the Internet with focused crawlers. However, current focused crawlers treat place name in the same way as any other common keyword, which has no geographical properties. This may reduce the effectiveness of focused crawlers. To solve the problem, this paper firstly dis...

متن کامل

Focusing Web Crawls On Location-Specific Content

Retrieving relevant data for location-sensitive keyword queries is a challenging task that has so far been addressed as a problem of automatically determining the geographical orientation of web searches. Unfortunately, identifying localizable queries is not sufficient per se for performing successful location-sensitive searches, unless there exists a geo-referenced index of data sources agains...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013