Crawler Indexing using Tree Structure and its Implementation
Authors
Abstract
The plentiful content of the World-Wide Web is useful to millions of users, who typically begin their Web activity with a search engine such as Google or Yahoo. Our aim is to build a search tool that is cost-effective, efficient, fast, and user-friendly. In response to a query, it should retrieve the most relevant information stored in its database, and it should be portable so that it can be deployed on any platform without cost or inconvenience. Our goal is a Web search engine that retrieves the best-matched Web pages in the shortest possible time. This paper proposes a crawler algorithm in which the crawler visits Web pages recursively and stores the relevant data in a database. The algorithm applies the basic principles of a tree structure when maintaining the crawled data for use by the search engine, making Web search more efficient: the tree/node structure in the database filters the searched word more efficiently and returns faster results to the user. The paper also implements crawler indexing with a tree structure using an HTML-based 'Update File' at the Web server, making both crawling and searching more efficient.
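The abstract does not give implementation details of the tree/node index, so the following is only a minimal sketch, assuming the common approach of storing crawled words in a character trie whose terminal nodes record the URLs of the pages containing each word. All class and variable names here are illustrative assumptions, not the paper's actual code.

```python
class TrieNode:
    """One node per character; terminal nodes record matching page URLs."""
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.pages = set()   # URLs of pages containing the word ending here

class TreeIndex:
    """Word -> pages index built as a character trie (an assumed layout)."""
    def __init__(self):
        self.root = TrieNode()

    def add(self, word, url):
        """Record that `url` contains `word` by walking/creating trie nodes."""
        node = self.root
        for ch in word.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.pages.add(url)

    def search(self, word):
        """Return the set of URLs indexed under `word`, or an empty set."""
        node = self.root
        for ch in word.lower():
            node = node.children.get(ch)
            if node is None:
                return set()
        return node.pages

# Hypothetical usage: the crawler calls add() for each word it extracts.
index = TreeIndex()
index.add("crawler", "http://example.com/a")
index.add("crawler", "http://example.com/b")
index.add("crawl", "http://example.com/c")
print(sorted(index.search("crawler")))
```

A trie filters a query word one character at a time, so lookup cost depends on the word's length rather than on the number of indexed pages, which matches the abstract's claim of faster filtering of the searched word.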
Similar Resources
Context based Web Indexing for Storage of Relevant Web Pages
A focused crawler downloads web pages that are relevant to a user specified topic. The downloaded documents are indexed with a view to optimize speed and performance in finding relevant documents for a search query at the search engine side. However, the information will be more relevant if the context of the topic is also made available to the retrieval system. This paper proposes a technique ...
MapReduce Based Information Retrieval Algorithms for Efficient Ranking of Webpages
In this paper, the authors discuss the MapReduce implementation of crawler, indexer and ranking algorithms in search engines. The proposed algorithms are used in search engines to retrieve results from the World Wide Web. A crawler and an indexer in a MapReduce environment are used to improve the speed of crawling and indexing. The proposed ranking algorithm is an iterative method that makes us...
Context Based Indexing in Search Engines Using Ontology: Review
Nowadays, the World Wide Web is the collection of large amount of information which is increasing day by day. For this increasing amount of information, there is a need for efficient and effective index structure. The main aim of search engines is to provide most relevant documents to the users in minimum possible time. This paper proposes the indexing structure in which index is built on the b...
Focused Crawling Using Latent Semantic Indexing - An Application for Vertical Search Engines
Vertical search engines and web portals are gaining ground over the general-purpose engines due to their limited size and their high precision for the domain they cover. The number of vertical portals has rapidly increased over the last years, making the importance of a topic-driven (focused) crawler evident. In this paper, we develop a latent semantic indexing classifier that combines link ana...
Combining Text and Link Analysis for Focused Crawling
The number of vertical search engines and portals has rapidly increased over the last years, making the importance of a topic-driven (focused) crawler evident. In this paper, we develop a latent semantic indexing classifier that combines link analysis with text content in order to retrieve and index domain specific web documents. We compare its efficiency with other well-known web information r...