Mahalanobis Encodings for Visual Categorization
Abstract
The design of the image representation is one of the most crucial factors in the performance of visual categorization. A common pipeline in most recent work for obtaining an image representation consists of two steps: encoding and pooling. In this paper, we introduce the Mahalanobis metric into two popular image patch encoding modules, Histogram Encoding and Fisher Encoding, which are used in the Bag-of-Visual-Words method and the Fisher Vector method, respectively. Moreover, for the proposed Fisher Vector method, a closed-form approximation of the Fisher Vector can be derived under the same assumption used in the original Fisher Vector, and the codebook is built without resorting to time-consuming EM (Expectation-Maximization) steps. Experimental evaluation on multi-class classification demonstrates the effectiveness of the proposed encoding methods.
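As a rough illustration of what a Mahalanobis metric in histogram encoding could look like, the sketch below hard-assigns each patch descriptor to the codeword with the smallest squared Mahalanobis distance and then counts assignments. It is not the paper's formulation: the function name `mahalanobis_histogram`, the per-codeword covariance matrices `covs`, and the hard-assignment rule are all assumptions made for this example.

```python
import numpy as np

def mahalanobis_histogram(patches, centers, covs):
    """Hypothetical helper: histogram encoding with a Mahalanobis metric.

    patches: (N, D) patch descriptors
    centers: (K, D) codeword centers
    covs:    (K, D, D) one covariance matrix per codeword (assumed given)
    Returns a length-K histogram normalized to sum to 1.
    """
    inv_covs = np.linalg.inv(covs)                    # (K, D, D)
    diff = patches[:, None, :] - centers[None, :, :]  # (N, K, D)
    # Squared Mahalanobis distance of every patch to every codeword:
    # d2[n, k] = diff[n, k]^T @ inv_covs[k] @ diff[n, k]
    d2 = np.einsum("nkd,kde,nke->nk", diff, inv_covs, diff)
    assign = d2.argmin(axis=1)                        # nearest codeword per patch
    hist = np.bincount(assign, minlength=centers.shape[0]).astype(float)
    return hist / hist.sum()

# Toy data: the second codeword has a large variance along the first axis.
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
covs = np.array([np.eye(2), np.diag([100.0, 1.0])])
patches = np.array([[4.0, 0.0], [0.0, 0.5]])
hist = mahalanobis_histogram(patches, centers, covs)
```

In this toy setup the patch at (4, 0) is closer to the first codeword in Euclidean terms but is assigned to the second under the Mahalanobis metric, because the second codeword's covariance is wide along that axis.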
Similar resources
Image Retrieval and Classification Using Local Distance Functions
(x_i − x_j)^T A (x_i − x_j) Mahalanobis distance: Previous work on learning metrics has focused on learning a single distance metric for all instances. One of our primary contributions is to learn a distance function for every training image. Most visual categorization approaches make use of machine learning after computing distances between images (e.g. SVM with pyramid kernel). We want to learn how to co...
Coarse blobs or fine edges? Evidence that information diagnosticity changes the perception of complex visual stimuli.
Efficient categorizations of complex visual stimuli require effective encodings of their distinctive properties. However, the question remains of how processes of object and scene categorization use the information associated with different perceptual spatial scales. The psychophysics of scale perception suggests that recognition uses coarse blobs before fine scale edges, because the former is ...
Visualization of Movement in Multiscale
This paper will explore a number of visual encodings that can be layered over (x, y, scale, rotation) movement through multiscale. The visual encodings are designed to enhance viewers’ understanding of movement in between key frames. Since none of the visual encodings in this paper conveys all aspects of movement, to give the viewer a complete sense of movement, multiple visual encodings must b...
Face retrieval by an adaptive Mahalanobis distance using a confidence factor
This paper proposes an adaptive Mahalanobis distance for face retrieval. The distance is derived from a posterior distribution of observation errors in features categorized by confidence of face images. Since the distance is calculated considering error variances of each dimension according to the confidence, it can reflect error distribution of each matching more precisely than a standard Mahalan...
Visual Analytics Using Density Equalizing Geographic Distortion
Visualizing large geo-demographical data sets using pixel-based techniques involves mapping the geo-spatial dimensions of a data point to screen coordinates and appropriately encoding its statistical value by color. Analysis of such data is a great challenge. General tasks involve clustering, categorization and searching for patterns of interest for sociological or economic research. Available ...
Journal title:
- IPSJ Trans. Computer Vision and Applications
Volume 7
Publication date: 2015