TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies

Authors

  • Ykä Huhtala
  • Juha Kärkkäinen
  • Pasi Porkka
  • Hannu Toivonen

Abstract

The discovery of functional dependencies from relations is an important database analysis technique. We present TANE, an efficient algorithm for finding functional dependencies in large databases. TANE is based on partitioning the set of rows with respect to their attribute values, which makes testing the validity of functional dependencies fast even for a large number of tuples. The use of partitions also makes the discovery of approximate functional dependencies easy and efficient, and the erroneous or exceptional rows can be identified easily. Experiments show that TANE is fast in practice. For benchmark databases, the running times are improved by several orders of magnitude over previously published results. The algorithm is also applicable to much larger datasets than the previous methods.
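To make the partition idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. It groups row indices into equivalence classes by their values on an attribute set X, giving a partition written π_X. It then tests a single candidate dependency X → A by checking that adding A to X splits no class, which holds exactly when π_X and π_{X∪{A}} have the same number of classes. For approximate dependencies, it uses a g3-style error, which we assume here: the fraction of rows that must be removed for X → A to hold. Those removed rows are returned as the exceptional rows. The function names, the list-of-dicts representation of a relation, and the toy data are all hypothetical. The sketch omits TANE's levelwise search over attribute sets, its stripped partitions and its partition products.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Hashable]  # hypothetical representation: one dict per tuple


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Equivalence classes of row indices: two rows share a class exactly
    when they agree on every attribute in `attrs`."""
    classes: Dict[Tuple[Hashable, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def fd_holds(rows: Sequence[Row], lhs: Sequence[str], rhs: str) -> bool:
    """X -> A holds iff adding A to X splits no class of pi_X, i.e. both
    partitions have the same number of equivalence classes."""
    return len(partition(rows, lhs)) == len(partition(rows, list(lhs) + [rhs]))


def fd_error(rows: Sequence[Row], lhs: Sequence[str], rhs: str) -> Tuple[float, List[int]]:
    """g3-style error (an assumption of this sketch): within each class of
    pi_X keep the rows carrying the most common A-value; every other row
    must be removed and is reported as an exception."""
    exceptions: List[int] = []
    for cls in partition(rows, lhs):
        by_value: Dict[Hashable, List[int]] = defaultdict(list)
        for i in cls:
            by_value[rows[i][rhs]].append(i)
        keep = set(max(by_value.values(), key=len))  # largest sub-class survives
        exceptions.extend(i for i in cls if i not in keep)
    error = len(exceptions) / len(rows) if rows else 0.0
    return error, sorted(exceptions)


# Hypothetical toy relation.
rows = [
    {"dept": "A", "manager": "Ann", "city": "Oslo"},
    {"dept": "A", "manager": "Ann", "city": "Rome"},
    {"dept": "B", "manager": "Bob", "city": "Oslo"},
    {"dept": "B", "manager": "Bea", "city": "Oslo"},
    {"dept": "B", "manager": "Bob", "city": "Oslo"},
]

print(fd_holds(rows, ["manager"], "dept"))  # True
print(fd_holds(rows, ["dept"], "manager"))  # False
print(fd_error(rows, ["dept"], "manager"))  # (0.2, [3])
```

In the toy relation, dept → manager fails only because of row 3. Removing that single row (1 of 5, error 0.2) makes the dependency hold, so row 3 is the exceptional row. Comparing class counts, rather than checking every pair of rows, is what lets a partition-based test scale to many tuples.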

Similar resources

Efficient Discovery of Functional Dependencies and Armstrong Relations

In this paper, we propose a new efficient algorithm called Dep-Miner for discovering minimal non-trivial functional dependencies from large databases. Based on theoretical foundations, our approach combines the discovery of functional dependencies along with the construction of real-world Armstrong relations (without additional execution time). These relations are small Armstrong relations taki...

Mining Approximate Functional Dependencies from Databases Based on Minimal Cover and Equivalent Classes

Data Mining (DM) represents the process of extracting interesting and previously unknown knowledge from data. Approximate Functional Dependencies (AFD) mined from database relations represent potentially interesting patterns and have proven to be useful for various tasks like feature selection for classification, query optimization and query rewriting. The discovery of AFDs still remains under ...

Efficient Discovery of Functional and Approximate Dependencies Using Partitions

Discovery of functional dependencies from relations has been identified as an important database analysis technique. In this paper, we present a new approach for finding functional dependencies from large databases, based on partitioning the set of rows with respect to their attribute values. The use of partitions makes the discovery of approximate functional dependencies easy and efficient, an...

Efficient discovery of similarity constraints for matching dependencies

Article history: Received 15 December 2011; received in revised form 12 June 2013; accepted 12 June 2013; available online 29 June 2013.

The concept of matching dependencies (MDs) has recently been proposed for specifying matching rules for object identification. Similar to the functional dependencies (with conditions), MDs can also be applied to various data quality applications such as detecting ...

Discovering Denial Constraints

Integrity constraints (ICs) provide a valuable tool for enforcing correct application semantics. However, designing ICs requires experts and time. Proposals for automatic discovery have been made for some formalisms, such as functional dependencies and their extension conditional functional dependencies. Unfortunately, these dependencies cannot express many common business rules. For example, a...

Journal:
  • Comput. J.

Volume 42

Publication date: 1999