Representation of Web Data in a Web Warehouse

Authors

  • Sourav S. Bhowmick
  • Sanjay Kumar Madria
  • Wee Keong Ng
Abstract

We believe that managing Web data effectively requires building a data warehouse of Web data, i.e., a Web warehouse. In this paper, we focus on how to represent and store relevant hyperlinked Web documents effectively in a Web warehouse called WHOWEDA (WareHouse Of WEb DAta) for further querying and manipulation. We present a simple and general model for representing the metadata, structure, and content of Web documents and hyperlinks in WHOWEDA. We discuss node and link objects, which represent Web documents and hyperlinks, respectively. These are first-class objects in our data model, WHOM (WareHouse Object Model), which is designed to represent and manipulate Web data in the warehouse. An important feature of our model is that it represents metadata, content, and structure as trees: node and link metadata trees, and node and link data trees.
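To make the idea of node and link objects carrying separate metadata and data trees more concrete, here is a minimal Python sketch. It is not the WHOM definition from the paper: the class names (`TreeNode`, `Node`, `Link`), the tree labels (`url`, `size`, `source_url`, `target_url`), and the example URLs are all hypothetical, chosen only to illustrate one way such a representation might look.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TreeNode:
    """One vertex of a metadata or data tree (hypothetical structure)."""
    label: str
    text: Optional[str] = None
    children: List["TreeNode"] = field(default_factory=list)

    def add(self, label: str, text: Optional[str] = None) -> "TreeNode":
        """Attach a new child vertex and return it."""
        child = TreeNode(label, text)
        self.children.append(child)
        return child

    def find(self, label: str) -> List["TreeNode"]:
        """Depth-first search for this vertex or any descendant with `label`."""
        hits = [self] if self.label == label else []
        for child in self.children:
            hits.extend(child.find(label))
        return hits


@dataclass
class Node:
    """A Web document: one metadata tree plus one data (content) tree."""
    metadata: TreeNode
    data: TreeNode


@dataclass
class Link:
    """A hyperlink: one metadata tree plus one data (anchor content) tree."""
    metadata: TreeNode
    data: TreeNode


def build_example():
    """Build one illustrative node and one illustrative link (assumed values)."""
    node_meta = TreeNode("node_metadata")
    node_meta.add("url", "http://example.org/index.html")
    node_meta.add("size", "1024")

    node_data = TreeNode("html")
    body = node_data.add("body")
    body.add("p", "Welcome")

    link_meta = TreeNode("link_metadata")
    link_meta.add("source_url", "http://example.org/index.html")
    link_meta.add("target_url", "http://example.org/products.html")

    link_data = TreeNode("a", "Products")
    return Node(node_meta, node_data), Link(link_meta, link_data)
```

With this layout, a query over metadata (for example, `node.metadata.find("url")`) and a query over content (for example, `node.data.find("p")`) walk different trees of the same object, which is the separation the abstract describes.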

Journal:
  • Comput. J.

Volume 46

Publication date: 2003