Outlier Detection by Boosting Regression Trees
Abstract:
A procedure for detecting outliers in regression problems is proposed, based on information provided by boosting regression trees. The key idea is to select the most frequently resampled observation over the boosting iterations and to reiterate after removing it. The selection criterion applies Tchebychev's inequality to the maximum, over the boosting iterations, of the average number of appearances in bootstrap samples, so the procedure is free of assumptions on the noise distribution. It identifies outliers as observations that are particularly hard to predict. Many well-known benchmark data sets are considered, and a comparative study against two well-known competitors demonstrates the value of the method.
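The procedure lends itself to a compact sketch. The code below is an illustrative reconstruction, not the authors' implementation: it uses a regression stump as the weak learner (the paper boosts CART trees), an AdaBoost.R2-style multiplicative weight update, and a mean-plus-k-standard-deviations cut as one Chebyshev-motivated reading of the selection criterion. All function names and parameter values are assumptions.

```python
import numpy as np

def fit_stump(x, y):
    """Least-squares one-split regression stump on a 1-D feature."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (xs[0] - 1.0, ys.mean(), ys.mean())  # (threshold, left value, right value)
    best_sse = ((y - y.mean()) ** 2).sum()
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical feature values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best = ((xs[i - 1] + xs[i]) / 2.0, left.mean(), right.mean())
    return best

def predict_stump(stump, x):
    thr, lo, hi = stump
    return np.where(x <= thr, lo, hi)

def boosting_appearance_counts(x, y, n_iter=100, seed=0):
    """Run reweighted bootstrap boosting and return, for each observation,
    its average number of appearances in the bootstrap samples.
    Hard-to-predict points get heavier weights and so appear more often."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)
    counts = np.zeros(n)
    for _ in range(n_iter):
        idx = rng.choice(n, size=n, replace=True, p=w)
        counts += np.bincount(idx, minlength=n)
        stump = fit_stump(x[idx], y[idx])
        resid = np.abs(y - predict_stump(stump, x))
        if resid.max() == 0.0:
            break
        loss = resid / resid.max()          # normalized loss in [0, 1]
        w = w * np.exp(loss)                # harder points get heavier
        w = w / w.sum()
    return counts / n_iter

def flag_outlier(avg_counts, k=3.0):
    """Chebyshev-style rule: flag the maximal average appearance count if it
    exceeds mean + k * std (P(|X - mu| >= k*sigma) <= 1/k**2)."""
    mu, sigma = avg_counts.mean(), avg_counts.std()
    top = int(np.argmax(avg_counts))
    return top if avg_counts[top] > mu + k * sigma else None
```

To mimic the full procedure, one would remove a flagged observation and rerun the loop until no point exceeds the threshold.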
Similar resources
Boosting and instability for regression trees
An AdaBoost-like algorithm for boosting CART regression trees is considered. The boosting predictor sequence is analyzed on various data sets and the behaviour of the algorithm is investigated. An instability index of a given estimation method with respect to a training sample is defined. Based on the bagging algorithm, this instability index is then extended to quantify the additional ins...
Granular Box Regression Methods for Outlier Detection
Granular computing (GrC) is an emerging computing paradigm of information processing. It concerns the processing of complex information entities called information granules, which arise in the process of data abstraction and the derivation of knowledge from information. Granular computing is more of a theoretical perspective: it encourages an approach to data that recognizes and exploits the knowledge...
Multiple Linear Regression Models in Outlier Detection
Identifying anomalous values in real-world databases is important both for improving the quality of the original data and for reducing the impact of anomalous values in the process of knowledge discovery in databases. Such anomalous values give useful information to the data analyst in discovering useful patterns. Through isolation, these data may be separated and analyzed. The analysis of outlie...
Outlier Detection Using Nonconvex Penalized Regression
This paper studies the outlier detection problem from the point of view of penalized regressions. Our regression model adds one mean shift parameter for each of the n data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual L1 penalty yields a convex criterion, but we find that it fails to deliver a robust estimator. The L1 penalty corresponds to ...
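The mean-shift formulation described above can be sketched with a simple alternating iteration using hard thresholding, which corresponds to an L0-type (nonconvex) penalty. This is in the spirit of, but not identical to, the paper's estimator; the function name, the MAD-based scale estimate, and the threshold are illustrative assumptions.

```python
import numpy as np

def mean_shift_outliers(X, y, thr=6.0, n_iter=30):
    """Sketch: add a mean-shift parameter gamma_i per observation and
    alternate (1) an OLS fit on the corrected responses y - gamma with
    (2) hard thresholding of the residuals, so only a sparse set of
    gamma_i (the flagged outliers) stays nonzero."""
    X1 = np.column_stack([np.ones(len(y)), X])   # add intercept column
    gamma = np.zeros(len(y))
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X1, y - gamma, rcond=None)
        resid = y - X1 @ beta
        # robust scale via the median absolute deviation
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))
        gamma = np.where(np.abs(resid) > thr * scale, resid, 0.0)
    return np.nonzero(gamma)[0], gamma
```

Soft thresholding at the same step would recover the L1 penalty that the paper finds non-robust; the hard threshold is what makes the criterion nonconvex.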
Outlier Detection Methods in Multivariate Regression Models
Outlier detection statistics based on two models, the case-deletion model and the mean-shift model, are developed in the context of a multivariate linear regression model. These are generalizations of the univariate Cook’s distance and other diagnostic statistics. Approximate distributions of the proposed statistics are also obtained to get suitable cutoff points for significance tests. In addi...
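For reference, the univariate Cook's distance that these statistics generalize can be computed directly from the hat matrix; the snippet below is the standard textbook formula, with illustrative function and variable names.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance D_i = e_i^2 * h_ii / (p * s^2 * (1 - h_ii)^2) for a
    linear model with intercept: large D_i marks points whose deletion
    would move the fitted coefficients the most."""
    X1 = np.column_stack([np.ones(len(y)), X])      # design matrix with intercept
    H = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T        # hat (projection) matrix
    e = y - H @ y                                   # OLS residuals
    n, p = X1.shape
    s2 = (e @ e) / (n - p)                          # residual variance estimate
    h = np.diag(H)                                  # leverages
    return (e ** 2 / (p * s2)) * (h / (1 - h) ** 2)
```

A suitable cutoff (e.g. comparing D_i against quantiles of an F distribution) then turns the distances into a significance test, which is the step the paper's approximate distributions address in the multivariate case.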
Journal title
volume 3, issue 1
pages 1–22
publication date 2006-09