Efficient Sampling: Application to Image Data
Authors
Abstract
Sampling is an important preprocessing step for mining large data sets efficiently. Although a simple random sample often works well at a reasonable sample size, accuracy falls sharply as the sample size shrinks. In KDD'03 we proposed EASE, which outputs a sample based on its 'closeness' to the original sample. Reported results show that EASE outperforms simple random sampling (SRS). In this paper we propose EASIER, which extends EASE in two ways. 1) EASE is a halving algorithm: to achieve the required sample ratio, it starts from a suitably large initial sample and halves it iteratively. EASIER, by contrast, does away with the repeated halving and obtains the required sample ratio directly in one iteration. 2) EASE was shown to work on the IBM QUEST dataset, which is a categorical count dataset. EASIER is additionally shown to work on continuous data such as the Color Structure Descriptor of images. Two mining tasks, classification and association rule mining, are used to validate the efficacy of EASIER samples vis-a-vis EASE and SRS samples.
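The contrast between reaching a sample ratio by repeated halving and drawing it in one pass can be illustrated with a minimal Python sketch. This is not the EASE or EASIER algorithm: both strategies below pick transactions uniformly at random, whereas the abstract describes samples chosen for their closeness to the original. The function names (`item_frequencies`, `frequency_distance`, `srs`, `halving_sample`), the synthetic data, and the use of an L2 distance between item frequencies as a stand-in for 'closeness' are all assumptions made for illustration.

```python
import random
from collections import Counter


def item_frequencies(transactions):
    """Fraction of transactions that contain each item."""
    counts = Counter(item for t in transactions for item in set(t))
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


def frequency_distance(data, sample):
    """Assumed 'closeness' proxy: L2 distance between item frequencies."""
    f_data = item_frequencies(data)
    f_sample = item_frequencies(sample)
    items = set(f_data) | set(f_sample)
    return sum(
        (f_data.get(i, 0.0) - f_sample.get(i, 0.0)) ** 2 for i in items
    ) ** 0.5


def srs(data, ratio, rng):
    """Simple random sample without replacement, drawn in one pass."""
    k = max(1, round(len(data) * ratio))
    return rng.sample(data, k)


def halving_sample(data, ratio, rng):
    """Approach the target ratio by repeated halving.

    Each round keeps a random half (an illustrative rule, not EASE's).
    The loop stops before the sample would drop below the target size,
    so the target is hit exactly only for ratios of the form 1/2^k.
    """
    target = max(1, round(len(data) * ratio))
    sample = list(data)
    while len(sample) // 2 >= target:
        sample = rng.sample(sample, len(sample) // 2)
    return sample


# Hypothetical synthetic data: 1024 transactions of 5 items out of 20.
rng = random.Random(0)
items = list(range(20))
data = [tuple(rng.sample(items, 5)) for _ in range(1024)]

for ratio in (0.5, 0.125, 0.03125):
    one_pass = srs(data, ratio, rng)
    halved = halving_sample(data, ratio, rng)
    print(ratio, len(one_pass), frequency_distance(data, one_pass),
          len(halved), frequency_distance(data, halved))
```

Printing the distance at several ratios gives a rough feel for how a random sample drifts from the full data as the ratio shrinks, which is the gap the abstract says a closeness-driven sample is meant to narrow.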
Related resources
Irregular sampling for multidimensional polar processing of integral transforms
We survey a family of theories that enable the processing of polar data via integral transforms. We show the relation between irregular sampling and discrete integral transforms, demonstrate the application of irregular (polar) sampling to image processing problems, and derive approximation algorithms that are based on unequally spaced samples. It is based on sampling the Fourier domain. We describe 2...
Implementation of VLSI Based Image Compression Approach on Reconfigurable Computing System - A Survey
Image data require huge amounts of disk space and large bandwidths for transmission. Hence, image compression is necessary to reduce the amount of data required to represent a digital image. An efficient technique for image compression is therefore in high demand. Although lots of compression techniques are available, the technique which is faster, memory efficient and simple, surely...
MEDICAL IMAGE COMPRESSION: A REVIEW
In recent years the use of medical images for diagnostic purposes has become a necessity. Limits on transmission and storage space, together with the growing size of medical images, have created the need for efficient methods; image compression is required as an efficient way to reduce irrelevance and redundancy in the image data so that the data can be stored or transmitted. It also reduces...
Application of remote sensing and geographical information system in mapping land cover of the national park
The study was conducted with the objective of mapping the landscape cover of Nechsar National Park in Ethiopia to produce spatially accurate and timely information on land use and changing patterns. Monitoring provides planners and decision-makers with the required information about the current state of its development and the nature of the changes that have occurred. Remote sensing and Geographical Inf...
Irregular Sampling for Multidimensional Polar Processing of Integral Transforms and Prolate Spheroidal Wave Functions
Analyzing physical phenomena using a computer inevitably requires bridging the continuous nature of the physical phenomena and the discrete nature of computers. This bridging is known as sampling. The continuous signal is sampled such that it is represented by a discrete set of samples. The sampling scheme used to discretize the signal is often irregular. This may be due to either the ...
Extracting information from a training data set for predictive inference is a fundamental task in data mining
Extracting information from a training data set for predictive inference is a fundamental task in data mining and machine learning. With the exponential growth in the amount of data being generated in the past few years, there is an urgent need to develop or adapt existing learning algorithms to efficiently learn from large data sets. This paper describes three scaling techniques enabling machi...