Crawler Indexing using Tree Structure and its Implementation
Authors
Abstract
The plentiful content of the World-Wide Web is useful to millions of users, who typically begin their Web activity with a search engine such as Google or Yahoo. Our aim is to build a search tool that is cost-effective, efficient, fast, and user-friendly. In response to a query, it should retrieve the most relevant information stored in its database, and it should be portable so that it can be deployed on any platform without cost or inconvenience. Our goal is a Web search engine that retrieves the best-matched Web pages in the shortest possible time. This paper proposes a crawler algorithm in which the crawler visits Web pages recursively and stores the relevant data in a database. The algorithm applies the basic principles of a tree structure when maintaining the crawled data for use by the search engine, making Web search more efficient: the tree/node structure in the database filters the searched word more efficiently and returns faster results to the user. The paper also implements crawler indexing with a tree structure using an HTML-based 'Update File' at the Web server, making both crawling and searching more efficient.
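The abstract does not give implementation details of the tree/node index, so the following is only a minimal sketch, assuming the common approach of storing crawled words in a character trie whose terminal nodes record the URLs of the pages containing each word. All class and variable names here are illustrative assumptions, not the paper's actual code.

```python
class TrieNode:
    """One node per character; terminal nodes record matching page URLs."""
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.pages = set()   # URLs of pages containing the word ending here

class TreeIndex:
    """Word -> pages index built as a character trie (an assumed layout)."""
    def __init__(self):
        self.root = TrieNode()

    def add(self, word, url):
        """Record that `url` contains `word` by walking/creating trie nodes."""
        node = self.root
        for ch in word.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.pages.add(url)

    def search(self, word):
        """Return the set of URLs indexed under `word`, or an empty set."""
        node = self.root
        for ch in word.lower():
            node = node.children.get(ch)
            if node is None:
                return set()
        return node.pages

# Hypothetical usage: the crawler calls add() for each word it extracts.
index = TreeIndex()
index.add("crawler", "http://example.com/a")
index.add("crawler", "http://example.com/b")
index.add("crawl", "http://example.com/c")
print(sorted(index.search("crawler")))
```

A trie filters a query word one character at a time, so lookup cost depends on the word's length rather than on the number of indexed pages, which matches the abstract's claim of faster filtering of the searched word.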
Similar Resources
Context based Web Indexing for Storage of Relevant Web Pages
A focused crawler downloads web pages that are relevant to a user specified topic. The downloaded documents are indexed with a view to optimize speed and performance in finding relevant documents for a search query at the search engine side. However, the information will be more relevant if the context of the topic is also made available to the retrieval system. This paper proposes a technique ...
MapReduce Based Information Retrieval Algorithms for Efficient Ranking of Webpages
In this paper, the authors discuss the MapReduce implementation of crawler, indexer and ranking algorithms in search engines. The proposed algorithms are used in search engines to retrieve results from the World Wide Web. A crawler and an indexer in a MapReduce environment are used to improve the speed of crawling and indexing. The proposed ranking algorithm is an iterative method that makes us...
Context Based Indexing in Search Engines Using Ontology: Review
Nowadays, the World Wide Web is the collection of large amount of information which is increasing day by day. For this increasing amount of information, there is a need for efficient and effective index structure. The main aim of search engines is to provide most relevant documents to the users in minimum possible time. This paper proposes the indexing structure in which index is built on the b...
Focused Crawling Using Latent Semantic Indexing - An Application for Vertical Search Engines
Vertical search engines and web portals are gaining ground over the general-purpose engines due to their limited size and their high precision for the domain they cover. The number of vertical portals has rapidly increased over the last years, making the importance of a topic-driven (focused) crawler evident. In this paper, we develop a latent semantic indexing classifier that combines link ana...
Combining Text and Link Analysis for Focused Crawling
The number of vertical search engines and portals has rapidly increased over the last years, making the importance of a topic-driven (focused) crawler evident. In this paper, we develop a latent semantic indexing classifier that combines link analysis with text content in order to retrieve and index domain specific web documents. We compare its efficiency with other well-known web information r...