Overlap pattern synthesis with an efficient nearest neighbor classifier
نویسندگان
چکیده
Nearest neighbor (NN) classifier is the most popular non-parametric classifier. It is a simple classifier with no design phase and shows good performance. Important factors affecting the efficiency and performance of NN classifier are (i) memory required to store the training set, (ii) classification time required to search the nearest neighbor of a given test pattern, and (iii) due to the curse of dimensionality the number of training patterns needed by it to achieve a given classification accuracy becomes prohibitively large when the dimensionality of the data is high. In this paper, we propose novel techniques to improve the performance of NN classifier and at the same time to reduce its computational burden. These techniques are broadly based on: (i) overlap based pattern synthesis which can generate a larger number of artificial patterns than the number of input patterns and thus can reduce the curse of dimensionality effect, (ii) a compact representation of the given set of training patterns called overlap pattern graph (OLP-graph) which can be incrementally built by scanning the training set only once and (iii) an efficient NN classifier called OLP-NNC which directly works with OLP-graph and does implicit overlap based pattern synthesis. A comparison based on experimental results is given between some of the relevant classifiers. The proposed schemes are suitable for applications dealing with large and high dimensional datasets like those in data mining.
منابع مشابه
Fusion of multiple approximate nearest neighbor classifiers for fast and efficient classification
The nearest neighbor classifier (NNC) is a popular non-parametric classifier. It is a simple classifier with no design phase and shows good performance. Important factors affecting the efficiency and performance of NNC are (i) memory required to store the training set, (ii) classification time required to search the nearest neighbor of a given test pattern, and (iii) due to the curse of dimensi...
متن کاملPartition based pattern synthesis technique with efficient algorithms for nearest neighbor classification
Nearest neighbor (NN) classifier is the most popular non-parametric classifier. It is a simple classifier with no design phase and shows good performance. Due to the curse of dimensionality effect, the size of training set needed by it to achieve a given classification accuracy becomes prohibitively large when the dimensionality of the data is high. Generating artificial patterns can reduce thi...
متن کاملK-D Decision Tree: An Accelerated and Memory Efficient Nearest Neighbor Classifier
This paper presents a novel Nearest Neighbor (NN) classifier. NN classification is a well studied method for pattern classification having the following properties; * it performs maximum-margin classification and achieves less than the twice of ideal Bayesian error, * it does not require the knowledge on pattern distributions, kernel functions or base classifiers, and * it can naturally be appl...
متن کاملCenter-based nearest neighbor classifier
In this paper, a novel center-based nearest neighbor (CNN) classifier is proposed to deal with the pattern classification problems. Unlike nearest feature line (NFL) method, CNN considers the line passing through a sample point with known label and the center of the sample class. This line is called the center-based line (CL). These lines seem to have more capacity of representation for sample ...
متن کاملNon-zero probability of nearest neighbor searching
Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition and computational geometry. The goal of NN searching is efficiently reporting the nearest data to a given object as a query. In most of the studies both the data and query are assumed to be precise, however, due to the real applications of NN searching, suc...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Pattern Recognition
دوره 38 شماره
صفحات -
تاریخ انتشار 2005