Search results for: audio and video products

Number of results: 16882315

2003
Mika Rautiainen Tapio Seppänen Jani Penttilä Johannes Peltola

In this paper we describe new methods to detect semantic concepts from digital video based on audible and visual content. Temporal Gradient Correlogram captures temporal correlations of gradient edge directions from sampled shot frames. Power-related physical features are extracted from short audio samples in video shots. Video shots containing people, cityscape, landscape, speech or instrument...

1994
Christoph Bernhardt Ernst Biersack

Multimedia applications such as Video On-Demand, Tele-Shopping, or Distance Learning require a storage facility for audio and video data, called a video server. All these applications are very demanding in terms of storage capacity, storage bandwidth, and transmission bandwidth. A video server must also meet the requirements that stem from the continuous nature of audio and video. It must guarant...

Journal: CoRR 2017
Juncheng Li Yun Wang Joseph Szurley Florian Metze Samarjit Das

The lack of strong labels has severely limited the ability of state-of-the-art fully supervised audio tagging systems to scale to larger datasets. Meanwhile, audio-visual learning models based on unlabeled videos have been successfully applied to audio tagging, but they are inevitably resource hungry and require a long time to train. In this work, we propose a light-weight, multimodal framework for env...

2000
Norbert Strobel Sascha Spors Rudolf Rabenstein

Object localization based on audio and video information is important for the analysis of dynamic scenes such as video conferences or traffic situations. In this paper, we view the dynamic audio-video object localization problem as a joint recursive estimation problem. It is solved using a decentralized Kalman filter fusing both audio and video position estimates. To better take into account...

2013
Stephan Werner Judith Liebetrau Thomas Sporer

It is well known that the perception of the position of audio and video stimuli is not independent. In general, video dominates the position if the position offset between audio and video is small. Most previous work focused on natural listening conditions and position offsets between audio and video in the horizontal plane. There is little research concerning offsets in the vertical direction and ...

2004
Simon Lucey Sridha Sridharan Vinod Chandran

The adaptive fusion of video and audio is one of the fundamental pursuits of audio-visual speech recognition (AVSR). In this paper the use of a high dimensional secondary classifier on the word likelihood scores from both the audio and video modalities is investigated for the purposes of adaptive fusion. Results are presented that lie above or equal to the boundary of catastrophic fusion acros...

Journal: International Journal of Engineering & Technology 2018

2001
Martin Heckmann Frédéric Berthommier Kristian Kroschel

In this paper we present a system for audio-visual speech recognition based on a hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) approach. To set up the system it was necessary to record a new audio-visual database. We will describe the recording and labeling of the database. The fusion of audio and video data is a key aspect of the paper. Three conditions, when only the audio or ...

2001
Martin Heckmann

In this paper we present a system for audio-visual speech recognition based on a hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) approach. To set up the system it was necessary to record a new audio-visual database. We will describe the recording and labeling of the database. The fusion of audio and video data is a key aspect of the paper. Three conditions, when only the audio or ...

2007
Girija Chetty Michael Wagner

In this paper, we propose a spatiotemporal person authentication approach based on multilevel fusion of 3D face biometric information with audio and visual speech information. The proposed approach combines the information from three audio-video based modules, namely: audio, visual speech, and 3D face and performs tri-module fusion in an automatic, unsupervised and adaptive manner, by adapting ...
