crawler

Focused Crawling

2007

Krishnan Suresh Sivaramakrishnan Kaveri

Focused crawling is an efficient mechanism for discovering resources of interest on the web. Link structure is an important property of the web that defines its content. This thesis describes FOCUS, a novel focused crawler that primarily uses the link structure of the web in its crawling strategy. It uses currently available search engine APIs, provided by Google, to construct a layered...

Topic-Specific YouTube Crawling to Detect Online Radicalization

2015

Swati Agarwal Ashish Sureka

Online video sharing platforms such as YouTube contain many videos and users promoting hate and extremism. Due to its low barrier to publication and its anonymity, YouTube is misused as a platform by some users and communities to post negative videos disseminating hatred against a particular religion, country or person. We formulate the problem of identification of such malicious videos as a search...

Towards a Keyword-Focused Web Crawler

2013

Tomasz Kusmierczyk Marcin Sydow

This paper concerns predicting the content of textual web documents based on features extracted from the web pages that link to them. The approach may be applied in an intelligent, keyword-focused web crawler. Experiments on publicly available real data obtained from the Open Directory Project, using several classification models, are promising and indicate the potential usefulness of the studied ap...

Reinforcement Learning with Classifier Selection for Focused Crawling

2008

Ioannis Partalas Georgios Paliouras Ioannis P. Vlahavas

Focused crawlers are programs that wander in the Web, using its graph structure, and gather pages that belong to a specific topic. The most critical task in Focused Crawling is the scoring of the URLs as it designates the path that the crawler will follow, and thus its effectiveness. In this paper we propose a novel scheme for assigning scores to the URLs, based on the Reinforcement Learning (R...

Focused Crawling: A New Approach to Topic-Specific Resource Discovery

1998

Soumen Chakrabarti Martin van den Berg Byron Dom

The rapid growth of the world-wide web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext information management system called a Focused Crawler. The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of topics. The topics are specified not using keywords, but using exem...

Efficiency and Cost Analysis of Forestry Machinery Usage in Hyrcanian Forests of Iran

2013

Aidin Parsakhoo Seyed Ataollah Hosseini Majid Lotfalian Hamid Jalilvand

The hourly operating cost of a machine is a suitable factor for analyzing the cost fluctuation of certain machinery in a changing environment and for finding economically feasible work concepts for the studied machine system. This paper, which is based on studies carried out in the northern forests of IR-Iran, analyzes and compares the costs of four skidding and excavation machines used in timber harvesting...

An Approach for Identifying URLs Based on Division Score and Link Score in Focused Crawler

2010

Debashis Hati Amritesh Kumar A. Pal D. S. Tomar

The rapid growth of the World Wide Web (WWW) poses unprecedented scaling challenges for general-purpose crawlers. Crawlers are software programs that traverse the internet and retrieve web pages by following hyperlinks. The focused crawler of a special-purpose search engine aims to selectively seek out pages that are relevant to a pre-defined set of topics, rather than exploring all regions of the Web. Focu...

A New Approach Towards Vertical Search Engines - Intelligent Focused Crawling and Multilingual Semantic Techniques

2010

Sybille Peters Claus-Peter Rückemann Wolfgang Sander-Beuermann

Search engines typically consist of a crawler which traverses the web retrieving documents and a search frontend which provides the user interface to the acquired information. Focused crawlers refine the crawler by intelligently directing it to predefined topic areas. The evolution of search engines today is expedited by supplying more search capabilities such as a search for metadata as well a...

Hybrid focused crawling on the Surface and the Dark Web

Journal: EURASIP J. Information Security 2017

Christos Iliou George Kalpakis Theodora Tsikrika Stefanos Vrochidis Yiannis Kompatsiaris

Focused crawlers enable the automatic discovery of Web resources about a given topic by automatically navigating through the Web link structure and selecting the hyperlinks to follow by estimating their relevance to the topic of interest. This work proposes a generic focused crawling framework for discovering resources on any given topic that reside on the Surface or the Dark Web. The proposed ...

Crawling Microblog by Common-Designed Software

2013

Gang Lu Shumei Liu Kevin Lü

Large amounts of microblog data need to be crawled for research, business analysis, and so on. However, microblog Web pages use many dynamic Web techniques, which makes it hard for traditional Web page crawlers to obtain data by parsing page contents. Fortunately, microblogs provide APIs. Well-structured data can be returned to users simply by accessing those APIs in form ...
