Efficient Sampling: Application to Image Data

Authors

  • Surong Wang
  • Manoranjan Dash
  • Liang-Tien Chia
Abstract

Sampling is an important preprocessing technique used to mine large data sets efficiently. Although a simple random sample often works well at a reasonable sample size, accuracy falls sharply as the sample size is reduced. In KDD'03 we proposed EASE, which outputs a sample based on its 'closeness' to the original sample. Reported results show that EASE outperforms simple random sampling (SRS). In this paper we propose EASIER, which extends EASE in two ways. 1) EASE is a halving algorithm: to achieve the required sample ratio, it starts from a suitably large initial sample and halves it iteratively. EASIER, on the other hand, does away with the repeated halving by obtaining the required sample ratio directly in one iteration. 2) EASE was shown to work on the IBM QUEST dataset, which is a categorical count dataset. EASIER is additionally shown to work on continuous data such as the Color Structure Descriptor of images. Two mining tasks, classification and association rule mining, are used to validate the efficacy of EASIER samples vis-a-vis EASE and SRS samples.
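
As a rough illustration of the baseline only, the sketch below draws simple random samples (SRS) from a synthetic categorical dataset at several sample ratios and reports how far each sample's item frequencies drift from those of the full data. It is not EASE or EASIER: the data generator `make_transactions`, the distance `frequency_distance`, and the chosen ratios are all assumptions made for illustration, not taken from the paper.

```python
import random
from collections import Counter


def make_transactions(n, n_items, rng):
    """Hypothetical synthetic data: item i appears with probability 1 / (i + 2)."""
    return [[i for i in range(n_items) if rng.random() < 1.0 / (i + 2)]
            for _ in range(n)]


def item_frequencies(transactions):
    """Fraction of transactions that contain each item."""
    counts = Counter(item for t in transactions for item in set(t))
    n = len(transactions)
    return {item: c / n for item, c in counts.items()}


def frequency_distance(original, sample):
    """Mean absolute difference in item frequency (an assumed closeness measure)."""
    f_orig = item_frequencies(original)
    f_samp = item_frequencies(sample)
    total = sum(abs(f - f_samp.get(item, 0.0)) for item, f in f_orig.items())
    return total / len(f_orig)


def simple_random_sample(transactions, ratio, rng):
    """Draw a simple random sample without replacement at the given ratio."""
    k = max(1, round(len(transactions) * ratio))
    return rng.sample(transactions, k)


if __name__ == "__main__":
    rng = random.Random(42)
    data = make_transactions(5000, 10, rng)
    for ratio in (0.5, 0.1, 0.01):
        sample = simple_random_sample(data, ratio, rng)
        dist = frequency_distance(data, sample)
        print(f"ratio={ratio:<5} size={len(sample):<5} distance={dist:.4f}")
```

Running the script with different seeds gives a feel for how variable small simple random samples can be, which is the behaviour the abstract attributes to SRS at reduced sample sizes.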

Similar resources

Irregular sampling for multidimensional polar processing of integral transforms

We survey a family of theories that enable the processing of polar data via integral transforms. We show the relation between irregular sampling and discrete integral transforms, demonstrate the application of irregular (polar) sampling to image processing problems, and derive approximation algorithms that are based on unequally spaced samples. It is based on sampling the Fourier domain. We describe 2...

Implementation of VLSI Based Image Compression Approach on Reconfigurable Computing System - A Survey

Image data require huge amounts of disk space and large bandwidths for transmission. Hence, image compression is necessary to reduce the amount of data required to represent a digital image. Therefore, an efficient technique for image compression is in high demand. Although many compression techniques are available, the technique which is faster, memory efficient and simple, surely...

MEDICAL IMAGE COMPRESSION: A REVIEW

In recent years the use of medical images for diagnostic purposes has become a necessity. Limited transmission and storage capacity, together with the growing size of medical images, has created the need for efficient methods, so image compression is required as an efficient way to reduce irrelevance and redundancy in the image data in order to store or transmit it. It also reduces...

Application of remote sensing and geographical information system in mapping land cover of the national park

The study was conducted with the objective of mapping the landscape cover of Nechsar National Park in Ethiopia to produce spatially accurate and timely information on land use and changing patterns. Monitoring provides planners and decision-makers with the required information about the current state of its development and the nature of the changes that have occurred. Remote sensing and Geographical Inf...

Irregular Sampling for Multidimensional Polar Processing of Integral Transforms and Prolate Spheroidal Wave Functions

Analyzing physical phenomena using a computer inevitably requires bridging the continuous nature of the physical phenomena and the discrete nature of computers. This bridging is known as sampling. The continuous signal is sampled such that it is represented by a discrete set of samples. The sampling scheme used to discretize the signal is often irregular. This may be due to either the ...

Abstract—Extracting information from a training data set for predictive inference is a fundamental task in data mining

Extracting information from a training data set for predictive inference is a fundamental task in data mining and machine learning. With the exponential growth in the amount of data being generated in the past few years, there is an urgent need to develop or adapt existing learning algorithms to efficiently learn from large data sets. This paper describes three scaling techniques enabling machi...

Publication date: 2005