Statistical Tests for Network Classifier Evaluation
Authors
Abstract
Recently, a number of modeling techniques have been developed for data mining and machine learning in relational and network domains where the instances are not independent and identically distributed (i.i.d.). These methods specifically exploit the statistical dependencies among instances in order to improve classification accuracy. However, there has been little focus on how these same dependencies affect our ability to draw accurate conclusions about the performance of the models. More specifically, the complex link structure and attribute dependencies in network data violate the assumptions of many conventional statistical tests and make it difficult to use these tests to assess the models in an unbiased manner. In this work, we examine the task of within-network classification and the question of whether two algorithms will learn models that result in significantly different levels of performance. We show that the commonly used form of evaluation (paired t-test on overlapping network samples) can result in an unacceptable level of Type I error. Furthermore, we show that Type I error increases as (1) the correlation among instances increases and (2) the size of the evaluation set increases (i.e., the proportion of labeled nodes in the network decreases). We propose a method for network cross-validation that, combined with paired t-tests, produces more acceptable levels of Type I error while still providing reasonable levels of statistical power (i.e., acceptable levels of Type II error).
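As a rough illustration of the contrast drawn in the abstract, the sketch below splits the nodes of a network into non-overlapping test sets and computes a paired t statistic over per-fold error rates of two classifiers. It is only a sketch under stated assumptions: the abstract does not specify the authors' cross-validation procedure, and the function names `disjoint_folds` and `paired_t_statistic` are hypothetical, introduced here for illustration.

```python
import math
import random
import statistics


def disjoint_folds(nodes, k, seed=0):
    """Shuffle node ids and split them into k non-overlapping test sets.

    Assumption: disjoint test sets stand in for the abstract's
    "network cross-validation"; the authors' exact sampling scheme
    is not given in the abstract.
    """
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    # Striding by k assigns every node to exactly one fold.
    return [shuffled[i::k] for i in range(k)]


def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees of freedom) for paired per-fold error rates."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1
```

The resulting statistic would be compared against a t distribution with the returned degrees of freedom. With overlapping test samples the per-fold differences are not independent, which is the situation the abstract associates with inflated Type I error.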
Related Resources
Generalization Capability of Homogeneous Voting Classifier Based on Partially Replicated Data
The generalization error is one of the most important features taken into account in performance evaluation and verification of any classifier. We propose a voting system based on homogeneous base classifiers (HVC) which ensures a better generalization capability than any of its components. The principle of this idea consists in the differentiation of a learning data set for each base classifier...
The use of measure functions for evaluating classifier
Evaluation of classifier performance is often based on statistical methods, e.g., cross-validation tests. In these tests, performance is often strongly related to or solely based on the accuracy of the classifier on a limited set of instances. The use of measure functions has been suggested as a promising approach to deal with this limitation. However, no usable implementation of a measure functi...
Correcting Bias in Statistical Tests for Network Classifier Evaluation
Abstract. It is difficult to directly apply conventional significance tests to compare the performance of network classification models because network data instances are not independent and identically distributed. Recent work [6] has shown that paired t-tests applied to overlapping network samples will result in unacceptably high levels (e.g., up to 50%) of Type I error (i.e., the tests lead to...
A DWT and SVM based method for rolling element bearing fault diagnosis and its comparison with Artificial Neural Networks
A classification technique using a Support Vector Machine (SVM) classifier for detection of rolling element bearing faults is presented here. The SVM was fed features extracted from vibration signals obtained from an experimental setup consisting of a rotating driveline mounted on rolling element bearings, which were run under normal and artificially induced fault conditio...
SUBCLASS FUZZY-SVM CLASSIFIER AS AN EFFICIENT METHOD TO ENHANCE THE MASS DETECTION IN MAMMOGRAMS
This paper is concerned with the development of a novel classifier for automatic mass detection in mammograms, based on contourlet feature extraction in conjunction with statistical and fuzzy classifiers. In this method, mammograms are segmented into regions of interest (ROI) in order to extract features including geometrical and contourlet coefficients. The extracted features benefit from...