Unbiased Recursive Partitioning: A Conditional Inference Framework
Abstract
Recursive binary partitioning is a popular tool for regression analysis. Two fundamental problems of the exhaustive search procedures usually applied to fit such models have been known for a long time: overfitting and a selection bias towards covariates with many possible splits or missing values. While pruning procedures are able to solve the overfitting problem, the variable selection bias still seriously affects the interpretability of tree-structured regression models. For some special cases unbiased procedures have been suggested, but they lack a common theoretical foundation. We propose a unified framework for recursive partitioning which embeds tree-structured regression models into a well-defined theory of conditional inference procedures. Stopping criteria based on multiple test procedures are implemented, and it is shown that the predictive performance of the resulting trees is as good as the performance of established exhaustive search procedures. It turns out that the partitions, and therefore the models, induced by both approaches are structurally different, indicating the need for an unbiased variable selection. The methodology presented here is applicable to all kinds of regression problems, including nominal, ordinal, numeric, censored as well as multivariate response variables and arbitrary measurement scales of the covariates. Data from studies on animal abundance, glaucoma classification, node positive breast cancer and mammography experience are re-analyzed.
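The idea of separating variable selection from split search, and of stopping via a multiple test procedure, can be illustrated with a minimal sketch. It is not the implementation from the paper: it assumes numeric covariates and a numeric response, approximates the conditional (permutation) distribution by Monte Carlo resampling, and uses a Bonferroni adjustment as one possible multiple test procedure. The function names `linear_statistic`, `permutation_p_value` and `select_covariate` are hypothetical and chosen for illustration only.

```python
import random


def linear_statistic(x, y):
    """Absolute value of sum_i (x_i - mean(x)) * y_i, a simple association measure."""
    mean_x = sum(x) / len(x)
    return abs(sum((xi - mean_x) * yi for xi, yi in zip(x, y)))


def permutation_p_value(x, y, n_perm, rng):
    """Monte Carlo p-value for independence of x and y (assumption: a
    resampling approximation stands in for the exact conditional distribution)."""
    observed = linear_statistic(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # permuting the response breaks any association
        if linear_statistic(x, y_perm) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement as one permutation.
    return (hits + 1) / (n_perm + 1)


def select_covariate(covariates, y, alpha=0.05, n_perm=999, seed=0):
    """Return the index of the covariate with the smallest adjusted p-value,
    or None if no adjusted p-value is at most alpha (stop splitting)."""
    rng = random.Random(seed)
    p_values = [permutation_p_value(x, y, n_perm, rng) for x in covariates]
    # Bonferroni adjustment across all covariates tested in this node.
    adjusted = [min(1.0, p * len(p_values)) for p in p_values]
    best = min(range(len(adjusted)), key=adjusted.__getitem__)
    return best if adjusted[best] <= alpha else None
```

Because each covariate is judged by a p-value rather than by the best achievable split, a covariate does not gain an advantage merely by offering more candidate split points. The split point within the selected covariate would be chosen in a separate second step, which this sketch omits.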
Similar resources
Recursive partitioning and Bayesian inference on conditional distributions
In this work we introduce a Bayesian framework for nonparametric inference on conditional distributions in the form of a prior called the conditional optional Pólya tree. The prior is constructed based on a two-stage nested procedure, which in the first stage recursively partitions the predictor space, and then in the second generates the conditional distribution on those predictor blocks using...
Targeted maximum likelihood estimation for prediction calibration.
Estimators of the conditional expectation, i.e., prediction, function involve a global bias-variance trade off. In some cases, an estimator that yields unbiased estimates of the conditional expectation for a particular partitioning of the data may be desirable. Such estimators are calibrated with respect to the partitioning. We identify the conditional expectation given a particular partitionin...
Party on ! A New
Random forests are one of the most popular statistical learning algorithms, and a variety of methods for fitting random forests and related recursive partitioning approaches is available in R. This paper points out two important features of the random forest implementation cforest available in the party package: The resulting forests are unbiased and thus preferable to the randomForest implemen...
Recursive partitioning and multi-scale modeling on conditional densities
Abstract: We introduce a nonparametric prior on the conditional distribution of a (univariate or multivariate) response given a set of predictors. The prior is constructed in the form of a two-stage generative procedure, which in the first stage recursively partitions the predictor space, and then in the second stage generates the conditional distribution by a multi-scale nonparametric density ...
Predicting Implantation Outcome of In Vitro Fertilization and Intracytoplasmic Sperm Injection Using Data Mining Techniques
Objective: The main purpose of this article is to choose the best predictive model for IVF/ICSI classification and to calculate the probability of IVF/ICSI success for each couple using artificial intelligence. We also aimed to find the most effective factors for predicting ART success in infertile couples. Materials and Methods: In this cross-sectional study, the data of 486 patients are colle...