Selectivity Estimation Without the Attribute Value Independence Assumption

نویسندگان

  • Viswanath Poosala
  • Yannis E. Ioannidis
چکیده

The result size of a query that involves multiple attributes from the same relation depends on these attributes’joinr data distribution, i.e., the frequencies of all combinations of attribute values. To simplify the estimation of that size, most commercial systems make the artribute value independenceassumption and maintain statistics (typically histograms) on individual attributes only. In reality, this assumption is almost always wrong and the resulting estimations tend to be highly inaccurate. In this paper, we propose two main alternatives to effectively approximate (multi-dimensional) joint data distributions. (a) Using a multi-dimensional histogram, (b) Using the Singular Value Decomposition (SVD) technique from linear algebra. An extensive set of experiments demonstrates the advantages and disadvantages of the two approaches and the benefits of both compared to the independence assumption.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Lightweight Graphical Models for Selectivity Estimation Without Independence Assumptions

As a result of decades of research and industrial development, modern query optimizers are complex software artifacts. However, the quality of the query plan chosen by an optimizer is largely determined by the quality of the underlying statistical summaries. Small selectivity estimation errors, propagated exponentially, can lead to severely sub-optimal plans. Modern optimizers typically maintai...

متن کامل

The effects of the violation of local independence assumption on the person measures under the Rasch model

Local independence of test items is an assumption in all Item Response Theory (IRT) models. That is, the items in a test should not be related to each other. Sharing a common passage, which is prevalent in reading comprehension tests, cloze tests and C-Tests, can be a potential source of local item dependence (LID). It is argued in the literature that LID results in biased parameter estimation ...

متن کامل

Attribute Value Weighted Average of One-Dependence Estimators

Of numerous proposals to improve the accuracy of naive Bayes by weakening its attribute independence assumption, semi-naive Bayesian classifiers which utilize one-dependence estimators (ODEs) have been shown to be able to approximate the ground-truth attribute dependencies; meanwhile, the probability estimation in ODEs is effective, thus leading to excellent performance. In previous studies, OD...

متن کامل

Selectivity Estimation for Exclusive Query Translation in Deep Web Data Integration

In Deep Web data integration, some Web database interfaces express exclusive predicates of the form Qe = Pi(Pi ∈ P1, P2, . . . , Pm), which permits only one predicate to be selected at a time. Accurately and efficiently estimating the selectivity of each Qe is of critical importance to optimal query translation. In this paper, we mainly focus on the selectivity estimation on infinite-value attr...

متن کامل

Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classi er

The simple Bayesian classi er (SBC) is commonly thought to assume that attributes are independent given the class, but this is apparently contradicted by the surprisingly good performance it exhibits in many domains that contain clear attribute dependences. No explanation for this has been proposed so far. In this paper we show that the SBC does not in fact assume attribute independence, and ca...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1997