Hit Miss Networks with Applications to Instance Selection
Abstract
In supervised learning, a learning algorithm uses a training set of labeled instances to generate a model (classifier), which is then used to decide the class label of new instances (generalization). Characteristics of the training set, such as its size and the presence of noisy instances, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called the hit miss network (HMN), which provides a compact description of the nearest neighbor relation over pairs of instances from each pair of classes. We show that structural properties of HMNs correspond to properties of training points related to the one nearest neighbor (1-NN) decision rule, such as being a border or a central point. This motivates us to use HMNs for improving the performance of a 1-NN classifier by removing instances from the training set (instance selection). We introduce three new HMN-based algorithms for instance selection: HMN-C, which removes instances without affecting the accuracy of 1-NN on the original training set; HMN-E, which performs a more aggressive storage reduction; and HMN-EI, which applies HMN-E iteratively. Their performance is assessed on 22 data sets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Experimental results on these data sets show that the accuracy of the 1-NN classifier increases significantly when HMN-EI is applied. Comparison with state-of-the-art editing algorithms for instance selection on the same data sets indicates that HMN-EI achieves the best generalization performance, with no significant difference in storage requirements. In general, these results indicate that HMNs provide a powerful graph-based representation of a training set, which can be successfully applied to noise and redundancy reduction in instance-based learning.
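To make the representation described above more concrete, the following is a minimal sketch of one way such a network could be built. It rests on an assumption that the abstract does not spell out: each instance gets one directed edge to its nearest neighbor in every class, labeled "hit" when the target shares its class and "miss" otherwise. The names `build_hmn` and `in_degrees`, the Euclidean distance, and the toy data are all hypothetical and are not taken from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+


def build_hmn(points, labels):
    """Return directed edges (source, target, kind), kind in {'hit', 'miss'}.

    Assumption: every instance points to its nearest neighbor in each class
    (excluding itself). Same-class edges are 'hit', others are 'miss'.
    """
    classes = sorted(set(labels))
    edges = []
    for i, (x, y) in enumerate(zip(points, labels)):
        for c in classes:
            candidates = [j for j, yj in enumerate(labels) if yj == c and j != i]
            if not candidates:
                continue  # no other instance of class c to point at
            nearest = min(candidates, key=lambda j: dist(x, points[j]))
            edges.append((i, nearest, "hit" if c == y else "miss"))
    return edges


def in_degrees(edges, n):
    """Count incoming hit and miss edges for each of the n nodes."""
    hit, miss = [0] * n, [0] * n
    for _, target, kind in edges:
        if kind == "hit":
            hit[target] += 1
        else:
            miss[target] += 1
    return hit, miss


# Toy one-dimensional layout: two 'a' points on the left, two 'b' points on the right.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.1, 0.0)]
labels = ["a", "a", "b", "b"]

edges = build_hmn(points, labels)
hit, miss = in_degrees(edges, len(points))
print(hit, miss)
```

In this toy layout the two points closest to the class boundary (indices 1 and 2) receive all incoming miss edges, which illustrates the kind of link between node degrees and border points that the abstract refers to. The actual removal rules of HMN-C, HMN-E, and HMN-EI are not given in the abstract and are not reproduced here.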
Journal: Journal of Machine Learning Research
Volume: 9
Publication date: 2008