Indexing The World Wide Web: The Journey So Far
Authors
Abstract
In this chapter, we describe the key indexing components of today’s web search engines. As the World Wide Web has grown, the systems and methods for indexing have changed significantly. We present the data structures used, the features extracted, the infrastructure needed, and the options available for designing a brand new search engine. We highlight techniques that improve the relevance of results, discuss trade-offs to best utilize machine resources, and cover distributed processing concepts in this context. In particular, we delve into the topics of indexing phrases instead of terms, storage in memory vs. on disk, and data partitioning. We finish with some thoughts on information organization for newly emerging forms of data.
Similar resources
Weaving a Web of linked resources
This editorial introduces the special issue based on the best papers from ESWC 2015. And since ESWC’15 marked 15 years of Semantic Web research, we extended this editorial to a position paper that reflects the path that we, as a community, traveled so far with the goal of transforming the Web of Pages to a Web of Resources. We discuss some of the key challenges, research topics and trends addre...
Suitability of Signature Indexing Over the World Wide Web
Signature indexing has been studied extensively in text databases and other databases for many years. The main advantages of a signature file as an access index are its small size, distributability, the ability to index information of a wide variety of types, ease of maintenance, and the ability to provide fuzzy indexing. These features are precisely what are needed for a good access index for inde...
Context Based Indexing On Synonym System Using Hierarchical Clustering In Web Mining
Nowadays, the World Wide Web is a collection of a large amount of information that is increasing day by day. This growing amount of information calls for an efficient and effective indexing structure. Indexing has become a major issue in improving the performance of Web search engines, so that the most relevant web documents are retrieved in minimum possib...
Search Engine using Apache Lucene
The World-Wide Web is a huge network of billions of workstations, and this network contains billions of web pages with information on a wide variety of topics. People discuss many topics and share opinions and suggestions on various social networking sites that users are interested in. Low precision and low recall still exist in current search engines. So a search...
A Unified Approach to Indexing Multimedia on the Web
Indexing multimedia Web documents can be regarded as an important part of Web engineering, a concept first proposed [19] by one of the authors and his collaborators in 1998 at the World Wide Web WWW7 conference in Brisbane, Australia. Content-based indexing of multimedia has always been a challenging task. The enormity and diversity of the multimedia content on the World Wide Web (WWW) adds anot...