Detection of single influential points in OLS regression model building
Abstract
Identifying outliers and high-leverage points is a fundamental step in building a least-squares regression model. Various influence measures, based on different motivational arguments and designed to measure the influence of observations on different aspects of the regression results, are elucidated and critiqued here. Diagnostic plots for indicating influential points are constructed from a statistical analysis of the residuals (classical, normalized, standardized, jackknife, predicted and recursive) and of the diagonal elements of the projection matrix. Regression diagnostics do not require knowledge of an alternative hypothesis for testing, or the fulfillment of the other assumptions of classical statistical tests. In the interactive, PC-assisted diagnosis of data, models and estimation methods, the examination of data quality involves the detection of influential points (outliers and high-leverage points), which cause many problems in regression analysis. This paper provides a basic survey of the influence statistics of single cases, combined with exploratory analysis of all variables. The graphical aids for identifying outliers and high-leverage points are combined with graphs for identifying the type of influence based on the likelihood distance. All these graphically oriented techniques are suitable for the rapid estimation of influential points, but are generally incapable of solving problems with masking and swamping. A procedure for computing the characteristics of influential points has been written in Matlab 5.3 and is available from the authors. © 2001 Elsevier Science B.V. All rights reserved.
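The quantities named in the abstract can be illustrated with a short sketch. This is not the authors' Matlab procedure: it is a minimal Python/NumPy example that assumes commonly used textbook definitions of the projection-matrix diagonal and of the normalized, standardized, jackknife and predicted residuals. The function name `ols_diagnostics`, the toy data and the 2m/n leverage cutoff (a common rule of thumb) are all assumptions made for illustration.

```python
import numpy as np

def ols_diagnostics(X, y):
    """Leverages and several residual types for an OLS fit.

    X is the (n, m) design matrix; include a column of ones for an intercept.
    The definitions below are common textbook forms (an assumption here).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = X.shape

    # Diagonal of the projection (hat) matrix H = X (X'X)^-1 X',
    # obtained from a thin QR factorization without forming H itself.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                      # classical residuals
    s2 = e @ e / (n - m)                  # residual variance estimate

    normalized = e / np.sqrt(s2)
    standardized = e / np.sqrt(s2 * (1.0 - h))

    # Residual variance with case i deleted, computed without refitting.
    s2_deleted = (s2 * (n - m) - e**2 / (1.0 - h)) / (n - m - 1)
    jackknife = e / np.sqrt(s2_deleted * (1.0 - h))

    # Predicted residual: error in predicting y_i from a fit that omits case i.
    predicted = e / (1.0 - h)

    return {
        "leverage": h,
        "classical": e,
        "normalized": normalized,
        "standardized": standardized,
        "jackknife": jackknife,
        "predicted": predicted,
    }

# Toy data (hypothetical): eight points near a line plus one distant x value.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 20], dtype=float)
y = np.array([2.6, 2.9, 3.6, 4.1, 4.4, 5.1, 5.4, 6.1, 9.0])
X = np.column_stack([np.ones_like(x), x])

d = ols_diagnostics(X, y)
n, m = X.shape
high_leverage = np.flatnonzero(d["leverage"] > 2.0 * m / n)  # rule-of-thumb cutoff
print("high-leverage cases:", high_leverage)
print("jackknife residuals:", np.round(d["jackknife"], 2))
```

Plotting the jackknife residuals against the leverages gives one simple diagnostic plot of the kind the abstract describes. Because each statistic is computed for one case at a time, this sketch shares the limitation noted above: it does not address masking or swamping by groups of points.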
Similar resources
Multiple Outliers Detection: Application to Research & Development Spending and Productivity Growth
Multiple outliers are frequently encountered in applied studies in business and economics. Most practitioners rely on the ordinary least squares (OLS) method for parameter estimation in regression analysis without properly identifying outliers. It is evident that OLS fails completely even in the presence of a single outlying observation. Single-observation outlier detection methods fail to id...
The effect of influential data, model and method on the precision of univariate calibration.
Building a calibration model with detection and quantification capabilities is identical to the task of building a regression model. Although calibration models are commonly used by analysts, applying one first requires careful attention to the three components of the regression triplet (data, model, method), examining (a) the data quality of the proposed model; (b) the model quality; (c...
Comparison of the Performance of Geographically Weighted Regression and Ordinary Least Squares for modeling of Sea surface temperature in Oman Sea
In marine studies, sea surface temperature (SST) and its spatial relationships with other ocean parameters are of particular importance, because accurately characterizing these relationships allows the study of many ocean and atmospheric processes. Therefore, in this study, spatial relations modeling of SST with Surface Wind Speed (...
LAD Regression and Nonparametric Methods for Detecting Outliers and Leverage Points
The detection of influential observations for the standard least squares regression model is a question that has been extensively studied. LAD regression diagnostics offer alternative approaches whose main feature is robustness. In this paper a new approach for nonparametric detection of influential observations in LAD regression models is presented and compared with other classical method...
Cook’s distance for ridge estimator in semiparametric regression
The detection of influential observations has attracted a great deal of attention in the last few decades. Most of the ideas for determining influential observations are based on single-case diagnostics with the ith case deleted. Cook’s distance is the most commonly used of the single-case diagnostics and has been successfully applied to various statistical models. In this article, we propose Cook’s di...