Outlier Detection by Boosting Regression Trees

نویسندگان

  • Nathalie Chèze
چکیده مقاله:

A procedure for detecting outliers in regression problems is proposed. It is based on information provided by boosting regression trees. The key idea is to select the most frequently resampled observation along the boosting iterations and reiterate after removing it. The selection criterion is based on Tchebychev’s inequality applied to the maximum over the boosting iterations of the average number of appearances in bootstrap samples. So the procedure is noise distribution free. It allows to select outliers as particularly hard to predict observations. A lot of well-known bench data sets are considered and a comparative study against two well-known competitors allows to show the value of the method.

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Boosting and instability for regression trees

The AdaBoost like algorithm for boosting CART regression trees is considered. The boosting predictors sequence is analyzed on various data sets and the behaviour of the algorithm is investigated. An instability index of a given estimation method with respect to some training sample is defined. Based on the bagging algorithm, this instability index is then extended to quantify the additional ins...

متن کامل

Granular Box Regression Methods for Outlier Detection

Granular computing (GrC) is an emerging computing paradigm of information processing. It concerns the processing of complex information entities called information granules, which arise in the process of data abstraction and derivation of knowledge from information. Granular computing is more a theoretical perspective, it encourages an approach to data that recognizes and exploits the knowledge...

متن کامل

Multiple Linear Regression Models in Outlier Detection

Identifying anomalous values in the realworld database is important both for improving the quality of original data and for reducing the impact of anomalous values in the process of knowledge discovery in databases. Such anomalous values give useful information to the data analyst in discovering useful patterns. Through isolation, these data may be separated and analyzed. The analysis of outlie...

متن کامل

Outlier Detection Using Nonconvex Penalized Regression

This paper studies the outlier detection problem from the point of view of penalized regressions. Our regression model adds one mean shift parameter for each of the n data points. We then apply a regularization favoring a sparse vector of mean shift parameters. The usual L1 penalty yields a convex criterion, but we find that it fails to deliver a robust estimator. The L1 penalty corresponds to ...

متن کامل

Outlier Detection Methods in Multivariate Regression Models

Outlier detection statistics based on two models, the case-deletion model and the mean-shift model, are developed in the context of a multivariate linear regression model. These are generalizations of the univariate Cook’s distance and other diagnostic statistics. Approximate distributions of the proposed statistics are also obtained to get suitable cutoff points for significance tests. In addi...

متن کامل

Outlier Detection using Granular Box Regression Methods

Granular computing (GrC) is an emerging computing paradigm of information processing. It concerns the processing of complex information entities called information granules, which arise in the process of data abstraction and derivation of knowledge from information. Granular computing is more a theoretical perspective, it encourages an approach to data that recognizes and exploits the knowledge...

متن کامل

منابع من

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}


عنوان ژورنال

دوره 3  شماره 1

صفحات  1- 22

تاریخ انتشار 2006-09

با دنبال کردن یک ژورنال هنگامی که شماره جدید این ژورنال منتشر می شود به شما از طریق ایمیل اطلاع داده می شود.

کلمات کلیدی

میزبانی شده توسط پلتفرم ابری doprax.com

copyright © 2015-2023