Unbiased estimates for linear regression via volume sampling

Authors

  • Michal Derezinski
  • Manfred K. Warmuth
Abstract

Given a full-rank matrix X with more columns than rows, consider the task of estimating the pseudoinverse $X^{+}$ based on the pseudoinverse of a sampled subset of columns (of size at least the number of rows). We show that this is possible if the subset of columns is chosen with probability proportional to the squared volume spanned by the rows of the chosen submatrix (i.e., volume sampling). The resulting estimator is unbiased and, surprisingly, the covariance of the estimator also has a closed form: it equals a specific factor times $X^{+\top} X^{+}$. The pseudoinverse plays an important part in solving the linear least squares problem, where we try to predict a label for each column of X. We assume labels are expensive and we are only given the labels for the small subset of columns we sample from X. Using our methods, we show that the weight vector of the solution for the subproblem is an unbiased estimator of the optimal solution for the whole problem based on all column labels. We believe that these new formulas establish a fundamental connection between linear least squares and volume sampling. We use our methods to obtain an algorithm for volume sampling that is faster than the state of the art and to obtain bounds for the total loss of the estimated least-squares solution on all labeled columns.
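
The unbiasedness claim in the abstract can be checked numerically on a tiny example. The sketch below is not the authors' algorithm (in particular, it is not their fast sampling procedure): it simply enumerates every column subset of a fixed size, weights each by the squared volume det(X_S X_S^T), and averages the zero-padded pseudoinverses. The function name `expected_subset_pinv`, the matrix sizes, and the random seed are assumptions made for illustration, and the brute-force enumeration is only feasible for very small matrices.

```python
import itertools
import numpy as np

def expected_subset_pinv(X, s):
    """Exact expectation of the zero-padded subset pseudoinverse
    under volume sampling, by brute-force enumeration (illustrative only)."""
    d, n = X.shape
    subsets = list(itertools.combinations(range(n), s))
    # Volume sampling: P(S) is proportional to det(X_S X_S^T).
    weights = np.array(
        [np.linalg.det(X[:, list(S)] @ X[:, list(S)].T) for S in subsets]
    )
    probs = weights / weights.sum()
    expected = np.zeros((n, d))
    for S, p in zip(subsets, probs):
        # Pseudoinverse of X with all columns outside S zeroed out:
        # rows in S hold pinv(X_S), every other row is zero.
        estimate = np.zeros((n, d))
        estimate[list(S), :] = np.linalg.pinv(X[:, list(S)])
        expected += p * estimate
    return expected

rng = np.random.default_rng(0)       # hypothetical seed
X = rng.standard_normal((3, 6))      # 3 rows, 6 columns (more columns than rows)
expected = expected_subset_pinv(X, s=4)
print(np.allclose(expected, np.linalg.pinv(X)))
```

Because the least-squares weight vector is a linear function of the pseudoinverse and the labels, the same averaging argument carries over to the subproblem solution described in the abstract.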

Similar resources

An incidence density sampling program for nested case-control analyses.

BACKGROUND: The nested case-control design can be a very efficient approach to an epidemiological investigation. In order to obtain unbiased estimates of relative risk, controls should be selected by incidence density sampling, which involves matching each case to a sample of those who are at risk at the time of case occurrence. METHODS: This paper presents a simple computer program for inciden...

Using Inverse Probability Bootstrap Sampling to Eliminate Sample Induced Bias in Model Based Analysis of Unequal Probability Samples

In ecology, as in other research fields, efficient sampling for population estimation often drives sample designs toward unequal probability sampling, such as in stratified sampling. Design based statistical analysis tools are appropriate for seamless integration of sample design into the statistical analysis. However, it is also common and necessary, after a sampling design has been implemente...

A New Unbiased and Efficient Class of LSH-Based Samplers and Estimators for Partition Function Computation in Log-Linear Models

Log-linear models are arguably the most successful class of graphical models for large-scale applications because of their simplicity and tractability. Learning and inference with these models require calculating the partition function, which is a major bottleneck and intractable for large state spaces. Importance Sampling (IS) and MCMC-based approaches are lucrative. However, the condition of ...

Estimating Hunting Success Rates via Bayesian Generalized Linear Models

Post-season harvest surveys provide data used in the management of Missouri wildlife. These surveys provide information on the number of animals harvested, hunting pressure and hunter success rate. These estimates provide unbiased results at the statewide level due to the large sample size. However, if this survey information is used to make county estimates, poor results often occur due to sma...

Liu Estimates and Influence Analysis in Regression Models with Stochastic Linear Restrictions and AR (1) Errors

In the linear regression models with AR (1) error structure when collinearity exists, stochastic linear restrictions or modifications of biased estimators (including Liu estimators) can be used to reduce the estimated variance of the regression coefficients estimates. In this paper, the combination of the biased Liu estimator and stochastic linear restrictions estimator is considered to overcom...

Publication date: 2017