Summarizing Relational Databases

Authors

  • Xiaoyan Yang
  • Cecilia M. Procopiuc
  • Divesh Srivastava
Abstract

Complex databases are challenging for users unfamiliar with their schemas to explore and query. Enterprise databases often have hundreds of inter-linked tables, so even when extensive documentation is available, new users must spend considerable time understanding the schema before they can retrieve any information from the database. The problem is aggravated if the documentation is missing or outdated, which may happen with legacy databases. In this paper we identify limitations of previous approaches to this vexing problem, and propose a principled approach to summarizing the contents of a relational database, so that a user can determine at a glance the type of information it contains and the main tables in which that information resides. Our approach has three components. First, we define the importance of each table in the database as its stable state value in a random walk over the schema graph, where the transition probabilities depend on the entropies of table attributes. This ensures that the importance of a table depends both on its information content and on how that content relates to the content of other tables in the database. Second, we define a metric space over the tables in a database, such that the distance function is consistent with an intuitive notion of table similarity. Finally, we use a Weighted k-Center algorithm under this distance function to cluster all tables in the database around the most relevant tables, and return the result as our summary. We conduct an extensive experimental study on a benchmark database, comparing our approach with previous methods, as well as with several hybrid models. We show that our approach not only achieves significantly higher accuracy than the previous state of the art, but is also faster and scales linearly with the size of the schema graph.
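The three components named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the transition probabilities, the distance function, or the exact Weighted k-Center procedure, so the edge weights, the distance matrix, and the greedy selection rule below are all assumptions, and the function names (`entropy`, `transition_matrix`, `stationary_distribution`, `weighted_k_center`) are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a column's empirical value distribution."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def transition_matrix(edge_weights):
    """Normalize each row of a non-negative weight matrix to sum to 1.
    Assumption: weights would come from attribute entropies, e.g. the
    entropy of a join column for an edge and of a table's own columns
    for its self-loop."""
    return [[w / sum(row) for w in row] for row in edge_weights]

def stationary_distribution(P, iters=1000, tol=1e-12):
    """Power iteration for the stationary vector of row-stochastic P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi

def weighted_k_center(weights, dist, k):
    """Greedy heuristic (an assumption, not necessarily the paper's rule):
    start from the most important table, then repeatedly add the table
    with the largest importance-weighted distance to its nearest center."""
    n = len(weights)
    centers = [max(range(n), key=lambda i: weights[i])]
    while len(centers) < k:
        nxt = max((i for i in range(n) if i not in centers),
                  key=lambda i: weights[i] * min(dist[i][c] for c in centers))
        centers.append(nxt)
    assignment = [min(centers, key=lambda c: dist[i][c]) for i in range(n)]
    return centers, assignment

# Toy schema with three tables; all numbers are made up for illustration.
W = [[2.0, 1.0, 1.0],
     [1.0, 1.0, 0.0],
     [1.0, 0.0, 1.0]]
D = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]
importance = stationary_distribution(transition_matrix(W))
centers, assignment = weighted_k_center(importance, D, k=2)
```

In this toy run, table 0 has the highest importance and becomes the first center; the remaining tables are then assigned to whichever chosen center is closest under `D`. The self-loops in `W` keep the walk aperiodic, which is what lets plain power iteration converge here.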

Similar resources

Construction of FP Tree using Huffman Coding

Generally, data mining is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationship...


Relational Ethics and Nurses-Client Relationship in Nursing Practice: Literature Review

Background: This review focuses on exploring the concept of the nurse-client relationship as it may be informed by relational ethics. Relational ethics is a new approach to ethical practice in health care and can be a framework for nurses and other health professionals in considering how to help patients and families. Purpose: To review the basic elements of relational ethics prior to summarizi...


Relational Databases Query Optimization using Hybrid Evolutionary Algorithm

Optimizing database queries is a hard research problem. Exhaustive search techniques such as dynamic programming are suitable for queries with few relations, but as the number of relations in a query increases they require too much memory and processing to remain practical, so randomized and evolutionary methods must be used instead. The use of evolutionary methods, beca...


KDD – Knowledge Discovery in Databases

Table of contents (excerpt): 2 Database Management Systems; 2.1 Three-Schema Architecture; 2.2 Organisation of an Integrated Database System; 2.3 Hierarchical and Network Databases; 2.4 Relational Databases ...


Parallel Knowledge Discovery Using Domain Generalization Graphs

Multi-Attribute Generalization is an algorithm for attribute-oriented induction in relational databases using domain generalization graphs. Each node in a domain generalization graph represents a different way of summarizing the domain values associated with an attribute. When generalizing a set of attributes, we show how a serial implementation of the algorithm generates all possible combinati...


Journal title:
  • PVLDB

Volume 2

Publication date: 2009