An efficiency curve for evaluating imbalanced classifiers considering intrinsic data characteristics: Experimental analysis

Abstract

Balancing the accuracy rates of the majority and minority classes is challenging in imbalanced classification. Furthermore, data characteristics have a significant impact on the performance of imbalanced classifiers, yet they are generally neglected by existing evaluation methods. The objective of this study is to introduce a new criterion to comprehensively evaluate imbalanced classifiers. Specifically, we introduce an efficiency curve that is established using data envelopment analysis without explicit inputs (DEA-WEI), to determine the trade-off between the benefits of improved minority class accuracy and the cost of reduced majority class accuracy. In sequence, we analyze the impact of the imbalance ratio and typical imbalanced data characteristics on the efficiency of the classifiers. Empirical analyses on 68 imbalanced datasets reveal that traditional classifiers such as C4.5 and k-nearest neighbor are more effective on disjunct data, whereas ensemble and undersampling techniques are more effective for overlapping and noisy data. The efficiency of cost-sensitive classifiers decreases dramatically when the imbalance ratio increases. Finally, we investigate the reasons for the different efficiencies of the classifiers and recommend steps to select appropriate classifiers based on data characteristics.
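To make the DEA-WEI idea concrete, the sketch below scores classifiers by two outputs, minority class accuracy and majority class accuracy, using the standard constant-returns multiplier model without explicit inputs: maximize u1*y1 + u2*y2 for the evaluated classifier, subject to u1*y1 + u2*y2 <= 1 for every classifier and u >= 0. This is an illustrative assumption, not necessarily the exact formulation or the efficiency curve used in the paper. The function name `dea_wei_efficiency` and the accuracy values are hypothetical, and the solver assumes exactly two strictly positive outputs so that the linear program can be solved by enumerating vertices of the feasible region.

```python
from itertools import combinations


def dea_wei_efficiency(outputs, target, tol=1e-9):
    """Efficiency of outputs[target] under a two-output DEA-WEI multiplier model.

    outputs: list of (minority_accuracy, majority_accuracy) pairs, all > 0.
    Solves  max u . y_target  s.t.  u . y_j <= 1 for all j,  u >= 0
    by checking every vertex of the two-dimensional feasible polygon.
    """
    # Each constraint is stored as (a1, a2, b), meaning a1*u1 + a2*u2 <= b.
    constraints = [(y1, y2, 1.0) for y1, y2 in outputs]
    constraints += [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]  # u1 >= 0, u2 >= 0

    c1, c2 = outputs[target]
    best = 0.0
    for (a1, a2, b), (d1, d2, e) in combinations(constraints, 2):
        det = a1 * d2 - a2 * d1
        if abs(det) < tol:
            continue  # parallel boundary lines, no intersection point
        # Cramer's rule for the intersection of the two boundary lines.
        u1 = (b * d2 - a2 * e) / det
        u2 = (a1 * e - d1 * b) / det
        # Keep the point only if it satisfies every constraint.
        if all(p * u1 + q * u2 <= r + tol for p, q, r in constraints):
            best = max(best, c1 * u1 + c2 * u2)
    return best


if __name__ == "__main__":
    # Hypothetical (minority accuracy, majority accuracy) pairs.
    classifiers = {
        "A": (0.9, 0.6),
        "B": (0.6, 0.9),
        "C": (0.7, 0.7),
        "D": (0.5, 0.5),
    }
    names = list(classifiers)
    data = [classifiers[n] for n in names]
    for i, name in enumerate(names):
        print(name, round(dea_wei_efficiency(data, i), 4))
```

With these made-up numbers, A and B lie on the frontier and receive an efficiency of 1, while C and D are scored relative to the segment joining A and B. For more than two outputs, a general linear programming solver would replace the vertex enumeration.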

Similar resources

Class-imbalanced classifiers for high-dimensional data

A class-imbalanced classifier is a decision rule to predict the class membership of new samples from an available data set where the class sizes differ considerably. When the class sizes are very different, most standard classification algorithms may favor the larger (majority) class resulting in poor accuracy in the minority class prediction. A class-imbalanced classifier typically modifies a ...

Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics

Class imbalance is among the most persistent complications which may confront the traditional supervised learning task in real-world applications. The problem occurs, in the binary case, when the number of instances in one class significantly outnumbers the number of instances in the other class. This situation is a handicap when trying to identify the minority class, as the learning algorithms...

Evaluating Misclassifications in Imbalanced Data

Evaluating classifier performance with ROC curves is popular in the machine learning community. To date, the only method to assess confidence of ROC curves is to construct ROC bands. In the case of severe class imbalance with few instances of the minority class, ROC bands become unreliable. We propose a generic framework for classifier evaluation to identify a segment of an ROC curve in which m...

Introducing a secondary goal for evaluating DMUs by cross efficiency in data envelopment analysis

One way to rank DMUs in DEA is the cross efficiency method. In this method, the efficiency of each DMU is calculated using the optimum weights of the other DMUs, which makes the ranking more acceptable for managers. The existence of alternative optimum weights in the cross efficiency method leads to several ranks for DMUs. Several secondary goals have been introduced so far to avoid this problem. In this paper, a new model is ...

Fuzzy rough classifiers for class imbalanced multi-instance data

In multi-instance learning, each learning object consists of many descriptive instances. In the corresponding classification problems, each training object is labeled, but its constituent instances are not. The classification objective is to predict the class label of unseen objects. As in traditional single-instance classification, when the class sizes of multi-instance data are imbalanced, cl...

Journal

Journal title: Information Sciences

Year: 2022

ISSN: 0020-0255, 1872-6291

DOI: https://doi.org/10.1016/j.ins.2022.06.045