Putting the World Wide Web into a Data Warehouse: A DWH-Based Approach to Web Analysis
Abstract
The World Wide Web, due to its sheer size and dynamics, has turned into one of the most fascinating and important data sources for large-scale analysis and investigation, ranging from content-based information location and the dynamics of change to community analysis. Yet most projects so far rely on special-purpose tools optimized for a given task, which provide only limited flexibility. In this paper we propose a Data Warehouse-based approach to analyzing the World Wide Web. Information contained in the web pages, metadata on the documents, and information acquired from additional sources such as the WHOIS database are integrated into a multidimensional view of the Web. The resulting system allows flexible analysis of the various characteristics of the Web. Results from a prototypical study of the Austrian national Web space as part of the AOLA project demonstrate the potential of the
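The "multidimensional view" mentioned above can be pictured as a star schema: one fact table of crawled pages surrounded by dimension tables. The sketch below is a hypothetical illustration, not the AOLA system's actual design. The table and column names (`fact_page`, `dim_host`, `dim_mime`, `registrant_country`) and all sample rows are invented, and the WHOIS registrant country is assumed to have been looked up before loading. It uses Python's built-in `sqlite3` module to roll page facts up along a WHOIS-derived dimension and a document-metadata dimension.

```python
import sqlite3

# Hypothetical star schema: a fact table of crawled pages plus dimension
# tables for hosts (enriched with WHOIS data) and MIME types (document metadata).
SCHEMA = """
CREATE TABLE dim_host (
    host_id INTEGER PRIMARY KEY,
    host TEXT NOT NULL,
    tld TEXT NOT NULL,
    registrant_country TEXT        -- assumed to come from a prior WHOIS lookup
);
CREATE TABLE dim_mime (
    mime_id INTEGER PRIMARY KEY,
    mime_type TEXT NOT NULL
);
CREATE TABLE fact_page (
    url TEXT PRIMARY KEY,
    host_id INTEGER REFERENCES dim_host(host_id),
    mime_id INTEGER REFERENCES dim_mime(mime_id),
    crawl_date TEXT NOT NULL,      -- ISO date of the crawl snapshot
    size_bytes INTEGER NOT NULL
);
"""


def build_warehouse(conn: sqlite3.Connection) -> None:
    """Create the schema and load a few invented sample rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_host VALUES (?, ?, ?, ?)", [
        (1, "www.example.at", "at", "AT"),
        (2, "shop.example.com", "com", "AT"),   # non-.at host with an Austrian registrant
        (3, "news.example.de", "de", "DE"),
    ])
    conn.executemany("INSERT INTO dim_mime VALUES (?, ?)", [
        (1, "text/html"),
        (2, "application/pdf"),
    ])
    conn.executemany("INSERT INTO fact_page VALUES (?, ?, ?, ?, ?)", [
        ("http://www.example.at/", 1, 1, "2024-01-01", 12000),
        ("http://www.example.at/a.pdf", 1, 2, "2024-01-01", 250000),
        ("http://shop.example.com/", 2, 1, "2024-01-01", 8000),
        ("http://news.example.de/", 3, 1, "2024-01-01", 15000),
    ])
    conn.commit()


def rollup_by_country_and_mime(conn: sqlite3.Connection) -> list:
    """Aggregate page counts and total bytes per registrant country and MIME type."""
    query = """
        SELECT h.registrant_country, m.mime_type,
               COUNT(*) AS pages, SUM(f.size_bytes) AS total_bytes
        FROM fact_page AS f
        JOIN dim_host AS h ON f.host_id = h.host_id
        JOIN dim_mime AS m ON f.mime_id = m.mime_id
        GROUP BY h.registrant_country, m.mime_type
        ORDER BY h.registrant_country, m.mime_type
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_warehouse(conn)
    for row in rollup_by_country_and_mime(conn):
        print(row)
```

Because the WHOIS data lives in its own dimension, a question such as "how much content sits on hosts registered in Austria, regardless of top-level domain" becomes a plain GROUP BY rather than a special-purpose tool. Further dimensions (for example, a separate date dimension for comparing crawl snapshots) would follow the same pattern: a new dimension table plus a foreign key in the fact table.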
Similar resources
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, it is not a simple task to download the domain-specific web pages. This unfocused approach often shows undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...
Change Detection and Maintenance of an XML Web Warehouse
The World Wide Web contains a huge and increasing volume of information. The web warehouse is an efficient and effective means to facilitate utilization of information on the Web, not only to individual users but also to business organizations, especially for decision-making purposes. On the other hand, XML has recently become the new standard for representation and exchange of data on the Web....
Expert Discovery: A web mining approach
Expert discovery is the search for an answer to the question: “Who is the best expert on a specific subject in a particular domain, within a particular set of parameters?” Experts with domain knowledge in any field are crucial for consulting in industry, academia and the scientific community. The aim of this study is to address the issues of the expert-finding task in a real-world community. Collabor...
Enhanced Architecture of a Web Warehouse based on Quality Evaluation Framework to Incorporate Quality Aspects in Web Warehouse Creation
In recent years, it has been observed that the World Wide Web (WWW) has become a vast and rapidly growing source of information about all areas of interest. Retrieving relevant information from the web space is difficult, as there is no universal configuration and organization of web data. Taking advantage of data warehouse functionality and integrating it with the web to retrieve relevant data is the ...
Functions of a Web Warehouse
This paper proposes a web warehouse based approach to facilitating efficiency improvement, information sharing and service personalization for the World Wide Web. We will overview various functions of a web warehouse by considering the following applications: (1) a web warehouse as shared information repository, (2) a web warehouse as large-scale intelligent cache. We conclude that web warehous...