The World Wide Web
Authors
Abstract
The World Wide Web is a very large distributed digital information space. From its origins in 1991 as an organization-wide collaborative environment at CERN for sharing research documents in nuclear physics, the Web has grown to encompass diverse information resources: personal home pages; online digital libraries; virtual museums; product and service catalogs; government information for public dissemination; research publications; and Gopher, FTP, Usenet news, and mail servers. Some estimates suggest that the Web currently includes about 150 million pages and that this number doubles every four months. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck. In fact, some existing search tools sift through gigabyte-size precompiled Web indexes in a fraction of a second. But retrieval effectiveness is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query. Furthermore, the most relevant documents do not necessarily appear at the top of the query output order. Few details concerning system architectures, retrieval models, and query-execution strategies are available for commercial search tools. The desire to preserve proprietary information has fostered the view that developing Web search tools is esoteric rather than rational. In this article, we hope to promote innovative research and development in this area by offering a systematic perspective on the progress and challenges in searching the Web. Effective search and ...
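The abstract distinguishes efficiency (fast lookups over precompiled indexes) from effectiveness (putting the relevant documents at the top of the query output order). Below is a minimal sketch of ranked retrieval over a toy inverted index using TF-IDF weighting; the weighting scheme and the toy collection are assumptions for illustration, not a retrieval model prescribed by the article.

```python
import math
from collections import Counter, defaultdict

# Toy document collection standing in for a precompiled Web index.
docs = {
    "d1": "web search tools retrieve documents from the web",
    "d2": "nuclear physics research documents at cern",
    "d3": "search engines rank web documents by relevance",
}

# Build an inverted index: term -> {doc_id: term frequency}.
index = defaultdict(dict)
for doc_id, text in docs.items():
    for term, tf in Counter(text.split()).items():
        index[term][doc_id] = tf

def rank(query, n_docs=len(docs)):
    """Rank documents for a query with simple TF-IDF weighting."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Higher score first: the aim is to place relevant documents on top.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank("web search relevance"))
```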
Similar resources
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler it is not a simple task to download only domain-specific web pages, and an unfocused approach often yields undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...
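Focused crawling as sketched above hinges on ordering the URL frontier by estimated topical relevance rather than visiting links in discovery order. A hedged sketch of that idea using a priority queue follows; the scoring heuristic and the fetch and extract_links callables are placeholders for illustration, not the prioritization method proposed in the paper.

```python
import heapq

def topic_score(url, anchor_text, topic_terms):
    """Crude relevance estimate: fraction of topic terms seen in the URL or anchor text."""
    text = (url + " " + anchor_text).lower()
    hits = sum(1 for t in topic_terms if t in text)
    return hits / max(len(topic_terms), 1)

def focused_crawl(seed_urls, topic_terms, fetch, extract_links, max_pages=100):
    """Crawl the Web, always expanding the most promising URL in the frontier first."""
    # heapq is a min-heap, so store negative scores to pop the best-scoring URL first.
    frontier = [(-1.0, url) for url in seed_urls]
    heapq.heapify(frontier)
    seen = set(seed_urls)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        page = fetch(url)                            # placeholder: download the page
        visited.append(url)
        for link, anchor in extract_links(page):     # placeholder: parse out-links
            if link not in seen:
                seen.add(link)
                score = topic_score(link, anchor, topic_terms)
                heapq.heappush(frontier, (-score, link))
    return visited
```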
A Study of Tarbiat Modares University Researchers' Use of the World Wide Web
This study examines the purposes and types of information used, the methods and channels of access to information, the manner of familiarization, and the obstacles and problems faced by researchers at Tarbiat Modares University in relation to the World Wide Web. Their most fundamental problems in accessing information on the World Wide Web are network traffic, unsuitable communication lines, and the consequently low speed of access to information; their most important suggestion in this regard is improving ...
A Technique for Improving Web Mining using Enhanced Genetic Algorithm
The World Wide Web is growing at a very fast pace and makes a great deal of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, their search results can still be refined, and their accuracy is not high enough. One method for web mining is evolutionary algorithms, which search according to the user's interests...
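To illustrate the evolutionary-algorithm idea mentioned above, the sketch below evolves a vector of query-term weights toward a user-interest profile using selection, one-point crossover, and mutation. The representation and fitness function are assumptions for illustration and do not reproduce the enhanced genetic algorithm the paper describes.

```python
import random

def fitness(weights, interest):
    """How closely a candidate weight vector matches the user's interest profile."""
    return -sum((w - i) ** 2 for w, i in zip(weights, interest))

def evolve(interest, pop_size=20, generations=50, mutation=0.1):
    n = len(interest)
    population = [[random.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda ind: fitness(ind, interest), reverse=True)
        survivors = population[: pop_size // 2]          # selection: keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, n)                 # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mutation:               # occasional random mutation
                child[random.randrange(n)] = random.random()
            children.append(child)
        population = survivors + children
    return max(population, key=lambda ind: fitness(ind, interest))

# Example: evolve term weights toward a hypothetical user-interest profile.
print(evolve([0.9, 0.1, 0.5, 0.0]))
```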
Hypertextuality of the Slovenian World Wide Web
The central concern of this article is the question of the extent to which the contemporary World Wide Web, as an information retrieval system, reflects the key attributes of ideal hypertextual systems. The topic is relevant, since in the literature the notions of hypertext and hypertextual systems carry strong implications not only for the ease and efficacy of access to information, but als...
World Wide Web Crawler
We describe our ongoing work on World Wide Web crawling: a scalable web crawler architecture that can use resources distributed world-wide. The architecture allows us to use loosely managed compute nodes (PCs connected to the Internet) and may save network bandwidth significantly. In this poster, we discuss why such an architecture is necessary, point out difficulties in designing such an architectu...
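One common way to spread crawling work across loosely managed, world-wide nodes is to hash each URL's host to a node, so every node owns a disjoint partition of the Web and per-site politeness limits stay in one place. The partitioning rule below is an assumption for illustration, not necessarily the design described in the poster.

```python
import hashlib
from urllib.parse import urlparse

def assign_node(url, n_nodes):
    """Map a URL to one of n_nodes crawler nodes by hashing its host.

    Hashing the host (rather than the full URL) keeps all pages of a site on
    the same node, which preserves per-site rate limits and DNS caching.
    """
    host = urlparse(url).netloc.lower()
    digest = hashlib.sha1(host.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_nodes

urls = [
    "http://info.cern.ch/hypertext/WWW/TheProject.html",
    "http://example.org/a",
    "http://example.org/b",
]
for u in urls:
    print(u, "-> node", assign_node(u, n_nodes=4))
```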