Learning Multi-modal Dictionaries: Application to Audiovisual Data
Authors
Abstract
This paper presents a methodology for extracting meaningful synchronous structures from multi-modal signals. Processing multi-modal data jointly can reveal information that is unavailable when the sources are handled separately. However, in natural high-dimensional data, the statistical dependencies between modalities are usually not obvious. Learning fundamental multi-modal patterns is an alternative to classical statistical methods. Recurrent patterns are typically shift-invariant, so the learning procedure should seek the best-matching filters. We present a new algorithm for iteratively learning multi-modal generating functions that can be shifted to any position in the signal. Applied to audiovisual sequences, the proposed algorithm is shown to discover underlying structures in the data.
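The abstract does not specify the update rule, so the following is only a minimal sketch of the general idea it describes: match a joint audio-video atom against the signal at every shift, keep the best-matching position, and re-estimate the atom from the aligned segments. It assumes, hypothetically, that both modalities are 1-D feature sequences on a common time grid. The function names `best_shift`, `joint_normalize`, and `learn_atom`, and the simple averaging update, are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np


def best_shift(audio, video, atom_a, atom_v):
    """Return the shift s maximizing |<audio[s:s+L], atom_a> + <video[s:s+L], atom_v>|.

    Assumption: both modalities share one time grid, so a single shift
    places the audio part and the video part of the atom together.
    """
    # mode="valid" gives the inner product with the atom at every full-overlap shift
    score = (np.correlate(audio, atom_a, mode="valid")
             + np.correlate(video, atom_v, mode="valid"))
    return int(np.argmax(np.abs(score)))


def joint_normalize(atom_a, atom_v):
    """Scale both parts together so the joint audiovisual atom has unit norm."""
    norm = np.sqrt(atom_a @ atom_a + atom_v @ atom_v)
    if norm == 0:
        raise ValueError("cannot normalize an all-zero atom")
    return atom_a / norm, atom_v / norm


def learn_atom(pairs, length, n_iter=10, init=None, seed=0):
    """Learn one shift-invariant audiovisual atom by alternating
    (1) a best-shift search in every (audio, video) pair and
    (2) averaging of the best-aligned segments.

    The averaging step is a hypothetical simplification, not the paper's update.
    """
    if init is None:
        rng = np.random.default_rng(seed)
        init = (rng.standard_normal(length), rng.standard_normal(length))
    atom_a, atom_v = joint_normalize(np.asarray(init[0], float),
                                     np.asarray(init[1], float))
    for _ in range(n_iter):
        acc_a = np.zeros(length)
        acc_v = np.zeros(length)
        for audio, video in pairs:
            s = best_shift(audio, video, atom_a, atom_v)
            seg_a, seg_v = audio[s:s + length], video[s:s + length]
            # Flip anti-correlated segments so they reinforce rather than cancel
            sign = 1.0 if seg_a @ atom_a + seg_v @ atom_v >= 0 else -1.0
            acc_a += sign * seg_a
            acc_v += sign * seg_v
        atom_a, atom_v = joint_normalize(acc_a, acc_v)
    return atom_a, atom_v
```

Because the search runs over every shift, an occurrence of the pattern at any position in a sequence contributes to the same atom, which is the shift invariance the abstract refers to. Averaging is only the simplest re-estimation choice; from a random initialization it can converge to a shifted or truncated copy of a pattern, so a practical method would need a more careful update.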
Similar resources
Detection of Synchronous Audiovisual Events
This paper presents an algorithm to correlate audio and visual data generated by the same physical phenomenon. According to psychophysical experiments, temporal synchrony contributes strongly to the integration of cross-modal information in humans. Thus, we define meaningful audiovisual structures as temporally proximal audio-video events. Audio and video signals are represented as sparse decompositions...
Uncorrelated Multi-View Discrimination Dictionary Learning for Recognition
Dictionary learning (DL) has become an important feature-learning technique with state-of-the-art recognition performance. Because real-world data are often sparse, DL represents each data point as a linear decomposition over a set of learned dictionary bases. Fisher discrimination DL (FDDL) is a representative supervised DL method, which constructs a structured...
Audiovisual quality integration for interactive communications
This paper investigates multi-modal aspects of audiovisual quality assessment for interactive communication services. It shows how perceived auditory and visual qualities integrate to an overall audiovisual quality perception in different experimental contexts. Two audiovisual experiments are presented and provide experimental data for the present analysis. First, two experimental contexts are ...
Online Multi-Modal Robust Non-Negative Dictionary Learning for Visual Tracking
Dictionary learning is a method of acquiring a collection of atoms for subsequent signal representation. Due to its excellent representation ability, dictionary learning has been widely applied in multimedia and computer vision. However, conventional dictionary learning algorithms fail to deal with multi-modal datasets. In this paper, we propose an online multi-modal robust non-negative diction...
Assessment of learning style based on VARK model among the students of Qom University of Medical Sciences
Introduction: Learning is a dominant phenomenon in human life. Learners differ from one another in attitudes and cognitive styles, which affect how people learn. In this connection, the VARK learning-style model assesses students based on their individual abilities and methods for obtaining information from the environment along the visual, aural, read/write, and kinesthetic dimensions. ...