Hybrid Focused Crawler - a Fast Retrieval of Topic Related Web Resource for Domain Specific Searching

Authors

  • Achintya Das
  • Sudarshan Nandy
Abstract

The web has grown in both size and the variety of information it holds, and searching it has become a habitual way for people to find information. Search engines support this habit. Crawling is the procedure by which a search engine traverses the web and stores the relevant documents and their corresponding URLs in its back end. This work presents the development of a crawler that is hierarchical in nature while maintaining parallelism during the classification and analysis of web pages. This mechanism makes it faster than the normal focused crawlers in use today. The crawler is also based on the principle of the focused crawler, also known as a topic-driven crawler. With this principle embedded, the hybrid crawler can search for the particular links related to a given query.
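The idea in the abstract of crawling level by level while classifying pages in parallel can be illustrated with a minimal sketch. This is not the authors' implementation: the in-memory `TOY_WEB`, the word-overlap `relevance` function, and the `threshold` and `max_workers` parameters are all assumptions made for illustration, and a real crawler would fetch pages over the network and use a trained classifier.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "a": ("web crawler search engine", ["b", "c"]),
    "b": ("focused crawler topic search", ["d"]),
    "c": ("cooking recipes pasta", ["e"]),
    "d": ("crawler frontier url queue", []),
    "e": ("pasta sauce tomato", []),
}


def relevance(text, topic_terms):
    """Toy classifier (assumption): fraction of page words that are topic terms."""
    words = text.split()
    if not words:
        return 0.0
    return sum(w in topic_terms for w in words) / len(words)


def crawl(seeds, topic_terms, threshold=0.25, max_workers=4, web=TOY_WEB):
    """Crawl one level at a time, scoring all pages of a level in parallel."""
    seen = set(seeds)
    level = [u for u in seeds if u in web]
    relevant = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while level:
            texts = [web[u][0] for u in level]
            # pool.map preserves input order, so scores line up with URLs.
            scores = list(pool.map(lambda t: relevance(t, topic_terms), texts))
            next_level = []
            for url, score in zip(level, scores):
                if score < threshold:
                    continue  # off-topic page: its links are not followed
                relevant.append(url)
                for link in web[url][1]:
                    if link in web and link not in seen:
                        seen.add(link)
                        next_level.append(link)
            level = next_level
    return relevant


if __name__ == "__main__":
    print(crawl(["a"], {"crawler", "search", "topic"}))
```

In this sketch the off-topic page `c` is scored but never expanded, so page `e` behind it is never visited; that pruning is what distinguishes a focused crawl from an exhaustive one.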


Similar resources

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not a simple task to download the domain-specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed; a key technique among them is focused crawling, which is able to crawl particular topical...


A Model of Hybrid Genetic Algorithm-Particle Swarm Optimization (HGAPSO) Based Query Optimization for Web Information Retrieval

With the recent rapid growth in the number of web pages available on the Internet, searching for relevant and up-to-date information has become a crucial issue. Information retrieval is one of the most crucial components of search engines, and its optimization would have a great effect on improving search efficiency. Due to the dynamic nature of the web, it becomes harder to find relevant and recent information. That...


Focused Crawling System based on Improved LSI

In this research work we have developed a semi-deterministic algorithm and a scoring system that takes advantage of the Latent Semantic Indexing (LSI) scoring system for crawling web pages that belong to a particular domain or are specific to a topic. The proposed algorithm calculates a preference factor in addition to the LSI score to determine which web page needs to be preferred for crawling by the mu...


A Novel Method for Crawler in Domain-specific Search

A focused crawler is a Web crawler that aims to search and retrieve Web pages from the World Wide Web that are related to a domain-specific topic. Rather than downloading all accessible Web pages, a focused crawler analyzes the frontier of the crawled region to visit only the portion of the Web that contains relevant Web pages, and at the same time tries to skip irrelevant regions. In this paper,...
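The frontier analysis described above is often realized as a best-first search over a priority queue. The following is a minimal sketch under stated assumptions, not the method of the paper: the `PAGES` and `LINKS` data, the word-overlap `score` function, and the `budget` and `min_score` parameters are hypothetical, and each frontier URL simply inherits the score of the page that linked to it.

```python
import heapq

# Hypothetical page texts and link graph, for illustration only.
PAGES = {
    "seed": "domain specific search crawler",
    "p1": "focused crawler for topic search",
    "p2": "football scores and news",
    "p3": "crawler frontier priority queue",
    "p4": "celebrity news gossip",
}
LINKS = {"seed": ["p1", "p2"], "p1": ["p3"], "p2": ["p4"], "p3": [], "p4": []}


def score(text, topic_terms):
    """Toy relevance (assumption): fraction of page words that are topic terms."""
    words = text.split()
    return sum(w in topic_terms for w in words) / len(words) if words else 0.0


def best_first_crawl(seed, topic_terms, budget=10, min_score=0.2):
    """Visit the most promising frontier URL first; skip irrelevant regions."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(-1.0, seed)]
    seen = {seed}
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        page_score = score(PAGES[url], topic_terms)
        visited.append((url, page_score))
        if page_score < min_score:
            continue  # irrelevant page: do not add its links to the frontier
        for link in LINKS[url]:
            if link not in seen:
                seen.add(link)
                # A link's priority is the relevance of the page pointing to it.
                heapq.heappush(frontier, (-page_score, link))
    return visited


if __name__ == "__main__":
    for url, s in best_first_crawl("seed", {"crawler", "search", "topic"}):
        print(url, round(s, 2))
```

Here `p2` is fetched but scores zero, so `p4`, which is reachable only through it, never enters the frontier. Using the parent's score as the priority is one common heuristic; scoring anchor text or the link context are alternatives.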


A Grid Focused Community Crawling Architecture for Medical Information Retrieval Services

This paper describes a GRID focused community crawling architecture and its possible adoption in a medical information domain. This architecture has been designed to provide an information retrieval service to individuals who are entitled to access the highly distributed computational power of the GRID, eliminating the need for a central authority/repository such as a unique search engine. In ...




Publication date: 2010