An efficiency curve for evaluating imbalanced classifiers considering intrinsic data characteristics: Experimental analysis
Abstract
Balancing the accuracy rates of the majority and minority classes is challenging in imbalanced classification. Furthermore, data characteristics have a significant impact on the performance of imbalanced classifiers, yet they are generally neglected by existing evaluation methods. The objective of this study is to introduce a new criterion to comprehensively evaluate imbalanced classifiers. Specifically, we introduce an efficiency curve that is established using data envelopment analysis without explicit inputs (DEA-WEI), to determine the trade-off between the benefits of improved minority class accuracy and the cost of reduced majority class accuracy. In sequence, we analyze the impact of the imbalance ratio and typical imbalanced data characteristics on the efficiency of the classifiers. Empirical analyses on 68 imbalanced datasets reveal that traditional classifiers such as C4.5 and k-nearest neighbor are more effective on disjunct data, whereas ensemble and undersampling techniques are more effective for overlapping and noisy data. The efficiency of cost-sensitive classifiers decreases dramatically when the imbalance ratio increases. Finally, we investigate the reasons for the different efficiencies of classifiers on imbalanced data and recommend steps to select appropriate classifiers based on data characteristics.
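The abstract does not give the exact DEA-WEI formulation used in the paper, so the following is only a minimal sketch of a generic output-only DEA score, under stated assumptions. Each classifier is treated as a decision-making unit with two outputs, minority-class accuracy and majority-class accuracy, both assumed strictly positive. The function name `dea_wei_efficiency` and the accuracy values are hypothetical. The sketch scores individual classifiers and does not construct the efficiency curve itself.

```python
from itertools import combinations


def dea_wei_efficiency(outputs, target):
    """Output-only DEA score of outputs[target] for two outputs.

    Solves: maximize u . y_target
            subject to u . y_j <= 1 for every unit j, and u >= 0.
    With only two weights, the optimum lies at a vertex of the feasible
    polygon, so we enumerate intersections of constraint pairs.
    Assumes every unit has strictly positive outputs (bounded region).
    """
    # Each constraint is (a1, a2, b), meaning a1*u1 + a2*u2 <= b.
    cons = [(y1, y2, 1.0) for y1, y2 in outputs]
    cons += [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]  # u1 >= 0, u2 >= 0
    y1o, y2o = outputs[target]
    best = 0.0
    for (a1, a2, b), (c1, c2, d) in combinations(cons, 2):
        det = a1 * c2 - a2 * c1
        if abs(det) < 1e-12:
            continue  # parallel constraints have no unique intersection
        # Cramer's rule for the intersection point (u1, u2).
        u1 = (b * c2 - a2 * d) / det
        u2 = (a1 * d - b * c1) / det
        # Keep the point only if it satisfies every constraint.
        if all(p * u1 + q * u2 <= r + 1e-9 for p, q, r in cons):
            best = max(best, u1 * y1o + u2 * y2o)
    return best


# Hypothetical (minority accuracy, majority accuracy) pairs for three classifiers.
accuracies = [(0.9, 0.6), (0.6, 0.9), (0.5, 0.5)]
scores = [dea_wei_efficiency(accuracies, i) for i in range(len(accuracies))]
```

With these hypothetical values, the first two classifiers lie on the frontier and score 1, while the third scores about 0.67, because a mix of the first two matches both of its accuracies at two thirds of the weight.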
Similar resources
Class-imbalanced classifiers for high-dimensional data
A class-imbalanced classifier is a decision rule to predict the class membership of new samples from an available data set where the class sizes differ considerably. When the class sizes are very different, most standard classification algorithms may favor the larger (majority) class resulting in poor accuracy in the minority class prediction. A class-imbalanced classifier typically modifies a ...
Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics
Class imbalance is among the most persistent complications which may confront the traditional supervised learning task in real-world applications. The problem occurs, in the binary case, when the number of instances in one class significantly outnumbers the number of instances in the other class. This situation is a handicap when trying to identify the minority class, as the learning algorithms...
Evaluating Misclassifications in Imbalanced Data
Evaluating classifier performance with ROC curves is popular in the machine learning community. To date, the only method to assess confidence of ROC curves is to construct ROC bands. In the case of severe class imbalance with few instances of the minority class, ROC bands become unreliable. We propose a generic framework for classifier evaluation to identify a segment of an ROC curve in which m...
Introducing a secondary goal for evaluating DMUs by cross efficiency in data envelopment analysis
One way to rank DMUs in DEA is the cross efficiency method. In this method, the efficiency of each DMU is calculated using the optimum weights of the other DMUs, which makes the ranking more acceptable for managers. The existence of alternative optimum weights in the cross efficiency method leads to several ranks for DMUs. Several secondary goals have been introduced so far to avoid this problem. In this paper, a new model is ...
Fuzzy rough classifiers for class imbalanced multi-instance data
In multi-instance learning, each learning object consists of many descriptive instances. In the corresponding classification problems, each training object is labeled, but its constituent instances are not. The classification objective is to predict the class label of unseen objects. As in traditional single-instance classification, when the class sizes of multi-instance data are imbalanced, cl...
Journal
Journal title: Information Sciences
Year: 2022
ISSN: 0020-0255, 1872-6291
DOI: https://doi.org/10.1016/j.ins.2022.06.045