YAFC: Yet Another Focused Crawler

Author

  • George Sakkis
Abstract

As the Web continues to grow rapidly, focused, topic-specific Web crawlers will gain popularity over traditional general-purpose search engines for locating and indexing information on the Web and keeping it up to date. This paper presents YAFC (Yet Another Focused Crawler), a neurodynamic programming approach to focused crawling. YAFC combines TD(λ) reinforcement learning with a neural network to learn the value function of reaching topic-relevant Web pages. A prototype implementation is described, along with some preliminary results on a crawling task: retrieving Web pages on politics from among thousands of relevant and irrelevant pages.
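The abstract names TD(λ) as the learning rule but gives no detail, so the following is only a minimal sketch of a TD(λ) update with accumulating eligibility traces. It is not the paper's implementation: for brevity it swaps the neural network for a linear value function, and the function name `td_lambda_episode`, the feature vectors, the reward scheme, and the default hyperparameters are all hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def td_lambda_episode(
    weights: Sequence[float],
    episode: Sequence[Tuple[Sequence[float], float]],
    alpha: float = 0.1,
    gamma: float = 0.9,
    lam: float = 0.8,
) -> List[float]:
    """Run one episode of TD(lambda) on a linear value function.

    `episode` is a list of (features, reward) pairs. `features` describes
    the page visited at step t (an assumed encoding), and `reward` is
    received on leaving it, e.g. 1.0 when a relevant page is reached.
    The state after the last pair is treated as terminal with value 0.
    """
    w = list(weights)
    traces = [0.0] * len(w)  # one eligibility trace per weight
    for t, (features, reward) in enumerate(episode):
        v = dot(w, features)
        # Value of the successor state; 0 after the final step.
        v_next = dot(w, episode[t + 1][0]) if t + 1 < len(episode) else 0.0
        # Temporal-difference error for this transition.
        delta = reward + gamma * v_next - v
        # Decay old traces, then add the gradient of V (the features,
        # because V is linear in the weights here).
        traces = [gamma * lam * e + x for e, x in zip(traces, features)]
        # Credit every recently visited feature in proportion to its trace.
        w = [wi + alpha * delta * ei for wi, ei in zip(w, traces)]
    return w
```

With λ = 1 the reward found at the end of a crawl path is credited all the way back to the earlier pages on that path, while λ = 0 updates only the most recent page. That is the trade-off a focused crawler would tune when relevant pages sit several links away from the start.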


Similar resources

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not a simple task to download domain-specific web pages. An unfocused approach often gives undesired results. Therefore, several new ideas have been proposed; a key technique among them is focused crawling, which is able to crawl particular topical...


A Novel Architecture of Mercator: A Scalable, Extensible Web Crawler with Focused Web Crawler

This paper describes a novel architecture of Mercator: a scalable, extensible Web crawler with a focused Web crawler. We enumerate the major components of any scalable and focused Web crawler and describe the particular components used in this novel architecture. We also describe this novel architecture's support for extensibility and downloaded user’s support information. We also describe how the ...


Intelligent Event Focused Crawling

There is a need for an integrated event focused crawling system to collect Web data about key events. When an event occurs, many users try to locate the most up-to-date information about that event. Yet there is little systematic collecting and archiving anywhere of information about events. We propose intelligent event focused crawling for automatic event tracking and archiving, as well as effec...


A Website Model-Supported Focused Crawler for Search Agents

This paper advocates the use of ontology-supported website models to provide a semantic-level solution for a search agent so that it can provide fast, precise, and stable search results. Based on this technique, we have developed a focused crawler that can benefit both user requests and domain semantics. Equipped with this technique, our focused crawler manifests the following interesting feat...


Ontology Driven Focused Crawling of Web Documents

Given the dynamism of the World Wide Web in recent years, the issue of discovering relevant web pages has become an important challenge. A focused crawler aims at selectively seeking out pages that are relevant to a pre-defined set of topics. Most current approaches perform syntactic matching; that is, they retrieve documents that contain particular keywords from the user’s query. This often leads t...




Publication date: 2003