The World Wide Web

Authors

  • VENKAT N. GUDIVADA
  • WILLIAM I. GROSKY
  • RAJESH KASANAGOTTU
Abstract

The World Wide Web is a very large distributed digital information space. From its origins in 1991 as an organization-wide collaborative environment at CERN for sharing research documents in nuclear physics, the Web has grown to encompass diverse information resources: personal home pages; online digital libraries; virtual museums; product and service catalogs; government information for public dissemination; research publications; and Gopher, FTP, Usenet news, and mail servers. Some estimates suggest that the Web currently includes about 150 million pages and that this number doubles every four months. The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck. In fact, some existing search tools sift through gigabyte-size precompiled Web indexes in a fraction of a second. But retrieval effectiveness is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query. Furthermore, the most relevant documents do not necessarily appear at the top of the query output order. Few details concerning system architectures, retrieval models, and query-execution strategies are available for commercial search tools. The cause of preserving proprietary information has promulgated the view that developing Web search tools is esoteric rather than rational. In this article, we hope to promote innovative research and development in this area by offering a systematic perspective on the progress and challenges in searching the Web. Effective search and

Similar resources

Prioritize the ordering of URL queue in Focused crawler

The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download domain-specific web pages. An unfocused approach often produces undesired results. Therefore, several new ideas have been proposed; a key technique among them is focused crawling, which is able to crawl particular topical...

A Study of the Use of the World Wide Web by Researchers at Tarbiat Modares University

This study examines the purposes for which researchers at Tarbiat Modares University use the World Wide Web, the types of information they use, their methods and channels for accessing information, how they became familiar with the Web, and the obstacles and problems they face. Their most fundamental problems in accessing information on the World Wide Web are network traffic and inadequate communication lines, which result in slow access to information; their most important suggestion in this regard is improving ...

A Technique for Improving Web Mining using Enhanced Genetic Algorithm

The World Wide Web is growing at a very fast pace and makes a lot of information available to the public. Search engines use conventional methods to retrieve information on the Web; however, their search results can still be refined, and their accuracy is not high enough. One family of methods for web mining is evolutionary algorithms, which search according to the user interests...

Hypertextuality of the Slovenian World Wide Web

The central concern of this article is the question of to what extent the contemporary World Wide Web, as an information retrieval system, reflects key attributes of ideal hypertextual systems. The topic is relevant, since in the literature notions of hypertext and hypertextual systems are accompanied by strong implications not only for the ease and efficacy of access to information, but als...

World Wide Web Crawler

We describe our ongoing work on world wide web crawling, a scalable web crawler architecture that can use resources distributed world-wide. The architecture allows us to use loosely managed compute nodes (PCs connected to the Internet), and may save network bandwidth significantly. In this poster, we discuss why such architecture is necessary, point out difficulties in designing such architectu...

Publication date: 1997