Development of Web Crawler for Archiving Web Resources

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Building and archiving event web collections: A focused crawler approach

In this paper, we present a new approach for building and archiving web collections about events. Our approach combines the traditional focused crawling technique with event modeling and representation.

متن کامل

GeoWeb Crawler: An Extensible and Scalable Web Crawling Framework for Discovering Geospatial Web Resources

With the advance of the World-Wide Web (WWW) technology, people can easily share content on the Web, including geospatial data and web services. Thus, the “big geospatial data management” issues start attracting attention. Among the big geospatial data issues, this research focuses on discovering distributed geospatial resources. As resources are scattered on the WWW, users cannot find resource...

متن کامل

Web Crawler: Extracting the Web Data

Internet usage has increased a lot in recent times. Users can find their resources by using different hypertext links. This usage of Internet has led to the invention of web crawlers. Web crawlers are full text search engines which assist users in navigating the web. These web crawlers can also be used in further research activities. For e.g. the crawled data can be used to find missing links, ...

متن کامل

World Wide Web Crawler

We describe our ongoing work on world wide web crawling, a scalable web crawler architecture that can use resources distributed world-wide. The architecture allows us to use loosely managed compute nodes (PCs connected to the Internet), and may save network bandwidth significantly. In this poster, we discuss why such architecture is necessary, point out difficulties in designing such architectu...

متن کامل

Reinforcement-Based Web Crawler

This paper presents a focused web crawler system which automatically creates a minority language corpora. The system uses a database of relevant and irrelevant documents testing the relevance of retrieved web documents. The system requires a starting web document to indicate where the search would begin.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: The Journal of the Korea Contents Association

سال: 2011

ISSN: 1598-4877

DOI: 10.5392/jkca.2011.11.9.009