Local case-control sampling: Efficient subsampling in imbalanced data sets
Related resources
Subsampling and reconstruction of bandlimited images with universal sampling sets
We investigate the subsampling and reconstruction of bandlimited images at universal sampling sets. Theoretically such sampling sets should guarantee the reconstruction of an image that is k-sparse in the DFT domain with only k samples. We find that, due to matrix conditioning issues, more than k samples are generally required, and we compare the reconstruction results to those from the sparse ...
A Case Study for Learning from Imbalanced Data Sets
We present our experience in applying a rule induction technique to an extremely imbalanced pharmaceutical data set. We focus on using a variety of performance measures to evaluate a number of rule quality measures. We also investigate whether simply changing the distribution skew in the training data can improve predictive performance. Finally, we propose a method for adjusting the learning al...
Neighbourhood sampling in bagging for imbalanced data
Various approaches to extending bagging ensembles for class-imbalanced data are considered. First, we review known extensions and compare them in a comprehensive experimental study. The results show that integrating bagging with under-sampling is more powerful than integrating it with over-sampling. They also identify Roughly Balanced Bagging as the most accurate extension. Then, we point out that complex...
A Priori Synthetic Sampling for Increasing Classification Sensitivity in Imbalanced Data Sets
Class-imbalanced data usually suffer from intrinsic data properties beyond imbalance alone. The problem intensifies at the larger levels of imbalance most commonly found in observational studies. Extreme cases of class imbalance arise in many domains, including fraud detection, cancer mammography, and post-term births. These rare events are usually the most costly or have...
Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning
In recent years, mining imbalanced data sets has received increasing attention in both theory and practice. This paper introduces the importance of imbalanced data sets and their broad application domains in data mining, and then summarizes the evaluation metrics and the existing methods used to evaluate and solve the imbalance problem. Synthetic minority oversampling technique (S...
Journal
Journal title: The Annals of Statistics
Year: 2014
ISSN: 0090-5364
DOI: 10.1214/14-aos1220