A Novel Method for Crawler in Domain-specific Search
نویسندگان
چکیده
A focused crawler is a Web crawler aiming to search and retrieve Web pages from the World Wide Web, which are related to a domain-specific topic. Rather than downloading all accessible Web pages, a focused crawler analyzes the frontier of the crawled region to visit only the portion of the Web that contains relevant Web pages, and at the same time, try to skip irrelevant regions. In this paper, we present a new crawling strategy that employed the C4.5 decision tree to predict the relevance of a link target, and combined the algorithm with the link characteristic of parent pages. Experimental results indicate that the new crawling method has better performance, and it was able to fetch higher topic relevant information.
منابع مشابه
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not an simple task to download the domain specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed, among them a key technique is focused crawling which is able to crawl particular topical...
متن کاملSearch optimization technique for Domain Specific Parallel Crawler
Architectural framework of World Wide Web is used for accessing linked documents spread out over millions of machines all over the Internet. Web is a system that makes exchange of data on the internet easy and efficient. Due to the exponential growth of web, it has become a challenge to traverse all URLs in the web documents and handle these documents, so it is necessary to optimize the paralle...
متن کاملA Novel Hybrid Focused Crawling Algorithm to Build Domain-Specific Collections
The Web, containing a large amount of useful information and resources, is expanding rapidly. Collecting domain-specific documents/information from the Web is one of the most important methods to build digital libraries for the scientific community. Focused Crawlers can selectively retrieve Web documents relevant to a specific domain to build collections for domain-specific search engines or di...
متن کاملAn Improved Technique for Web Page Classification in Respect of Domain Specific Search
A domain specific crawler, as diverse from a general web search engine, focuses on a specific segment of web content. They are also called vertical or topical search engines. Common vertical search engines are meant for shopping, automotive industry, legal information, medical information, scholarly literature, and travel. Examples of vertical search engines are Trulia. com, Mocavo. com and Yel...
متن کاملAn Effective Focused Web Crawler for Web Resource Discovery
In the given volume of the Web and its speed of change, the coverage of modern search engines is relatively small. Web crawling is the process used by search engines to collect pages from the Web. Therefore, collecting domain-specific information from the Web is a special theme of research in many papers. In this paper, we introduce a new effective focused web crawler. It uses smart methods to ...
متن کامل