On Masking and Swamping Robustness of Leading Outlier Identifiers for Univariate Data
Abstract
In the wide-ranging scope of modern statistical data analysis, a key task is identification of outliers. In using an outlier identification procedure, one needs to know its robustness against masking (an “outlier” is undetected) and swamping (a “nonoutlier” is classified as an “outlier”), possibilities which can come about due to the presence of outliers. Study of these issues together is necessary but complex. Recently, Serfling and Wang (2012) developed a general framework providing foundations, tools, and criteria applicable in any data space. Application of this framework to particular outlier identifiers in particular types of data space requires, however, additional development of a nature specialized to the chosen setting. The present paper applies the general framework to the case of univariate data and evaluates masking and swamping robustness for two leading outlier identifiers, scaled deviation outlyingness and centered rank outlyingness. Our results shed new light on the choice between (Median, MAD) and (trimmed mean, trimmed standard deviation) in defining scaled deviation outlyingness. Also, our findings explain how the boxplot, a leading descriptive tool, acquires its excellent robustness by incorporating a scaled deviation outlier identification component alongside its quantile-based description of the central part of a data set.
AMS 2000 Subject Classification: Primary 62G35; Secondary 62-07
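As a rough illustration of the two identifiers named in the abstract, the sketch below assumes the commonly used forms |x − Med| / MAD for scaled deviation outlyingness and |2F_n(x) − 1| for centered rank outlyingness, with F_n the empirical distribution function. The function names and the sample are hypothetical, and the paper's exact definitions and normalizations may differ.

```python
from bisect import bisect_right
from statistics import median


def mad(sample):
    """Median absolute deviation about the median (unscaled)."""
    center = median(sample)
    return median(abs(x - center) for x in sample)


def scaled_deviation_outlyingness(x, sample):
    """Assumed form: |x - Med| / MAD, using (Median, MAD) as location and spread."""
    spread = mad(sample)
    if spread == 0:
        raise ValueError("MAD is zero; scaled deviation is undefined")
    return abs(x - median(sample)) / spread


def centered_rank_outlyingness(x, sample):
    """Assumed form: |2 F_n(x) - 1|, with F_n the empirical CDF of the sample."""
    ordered = sorted(sample)
    f_n = bisect_right(ordered, x) / len(ordered)
    return abs(2 * f_n - 1)


# Hypothetical sample with one gross outlier at 100.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]

for point in (5, 100):
    print(point,
          scaled_deviation_outlyingness(point, sample),
          centered_rank_outlyingness(point, sample))
```

In this toy sample the centered rank score is bounded by 1 whatever the magnitude of the extreme value, while the scaled deviation score grows with its distance from the median. That difference in behaviour is the kind of contrast the masking and swamping criteria are meant to quantify.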
Similar resources
General Foundations for Studying Masking and Swamping Robustness of Outlier Identifiers
With greatly advanced computational resources, the scope of statistical data analysis and modeling has widened to accommodate pressing new arenas of application. In all such data settings, an important and challenging task is the identification of outliers. Especially, an outlier identification procedure must be robust against the possibilities of masking (an outlier is undetected as such) and ...
Nonparametric Depth-Based Multivariate Outlier Identifiers, and Robustness Properties
In extending univariate outlier detection methods to higher dimension, various special issues arise, such as limitations of visualization methods, inadequacy of marginal methods, lack of a natural order, limited scope of parametric modeling, and restriction to ellipsoidal contours when using Mahalanobis distance methods. Here we pass beyond these limitations via an approach based on depth funct...
Nonparametric Depth-Based Multivariate Outlier Identifiers, and Masking Robustness Properties
In extending univariate outlier detection methods to higher dimension, various issues arise: limited visualization methods, inadequacy of marginal methods, lack of a natural order, limited parametric modeling, and, when using Mahalanobis distance, restriction to ellipsoidal contours. To address and overcome such limitations, we introduce nonparametric multivariate outlier identifiers based on m...
Identification of outliers types in multivariate time series using genetic algorithm
Multivariate time series data are often modeled using a vector autoregressive moving average (VARMA) model. But the presence of outliers can violate the stationarity assumption and may lead to wrong modeling, biased estimation of parameters, and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...
The masking breakdown point of multivariate outlier identification rules
In this paper we consider one-step outlier identification rules for multivariate data, generalizing the concept of so-called outlier identifiers as presented in Davies and Gather for the case of univariate samples. We investigate how the finite-sample breakdown points of estimators used in these identification rules influence the masking behaviour of the rules.