audio and video products

Detecting Semantic Concepts from Video Using Temporal Gradients and Audio Classification

2003

Mika Rautiainen Tapio Seppänen Jani Penttilä Johannes Peltola

In this paper we describe new methods to detect semantic concepts from digital video based on audible and visual content. Temporal Gradient Correlogram captures temporal correlations of gradient edge directions from sampled shot frames. Power-related physical features are extracted from short audio samples in video shots. Video shots containing people, cityscape, landscape, speech or instrument...

Full text

A Scalable Video Server: Architecture, Design, and Implementation (November 1994)

1994

Christoph Bernhardt Ernst Biersack

Multimedia applications such as Video On-Demand, Tele-Shopping, or Distance Learning require a storage facility for audio and video data, called a video server. All these applications are very demanding in terms of storage capacity, storage bandwidth, and transmission bandwidth. A video server must also meet the requirements that stem from the continuous nature of audio and video. It must guarant...

Full text

A Light-Weight Multimodal Framework for Improved Environmental Audio Tagging

Journal: CoRR 2017

Juncheng Li Yun Wang Joseph Szurley Florian Metze Samarjit Das

The lack of strong labels has severely limited state-of-the-art fully supervised audio tagging systems from being scaled to larger datasets. Meanwhile, audio-visual learning models based on unlabeled videos have been successfully applied to audio tagging, but they are inevitably resource hungry and require a long time to train. In this work, we propose a light-weight, multimodal framework for env...

Full text

Joint audio-video object localization using a recursive multi-state multi-sensor estimator

2000

Norbert Strobel Sascha Spors Rudolf Rabenstein

Object localization based on audio and video information is important for the analysis of dynamic scenes such as video conferences or traffic situations. In this paper, we view the dynamic audio-video object localization problem as a joint recursive estimation problem. It is solved using a decentralized Kalman filter fusing both audio and video position estimates. To better take into account...

Full text

Vertical Sound Source Localization Influenced by Visual Stimuli

2013

Stephan Werner Judith Liebetrau Thomas Sporer

It is well known that the perception of the position of audio and video stimuli is not independent. In general, video dominates the position if the position offset between audio and video is small. Most previous work focused on natural listening conditions and position offsets between audio and video in the horizontal plane. There is little research concerning offsets in the vertical direction and ...

Full text

Improved Speech Recognition using Adaptive Audio-visual Fusion via a Stochastic Secondary Classifier

2004

Simon Lucey Sridha Sridharan Vinod Chandran

The adaptive fusion of video and audio is one of the fundamental pursuits of audio visual speech recognition (AVSR). In this paper the use of a high dimensional secondary classifier on the word likelihood scores from both the audio and video modalities is investigated for the purposes of adaptive fusion. Results are presented that lie above or equal to the boundary of catastrophic fusion acros...

Full text

Enhancing Data Security Using Audio-Video Steganography

Journal: International Journal of Engineering & Technology 2018

Full text

A hybrid ANN/HMM audio-visual speech recognition system

2001

Martin Heckmann Frédéric Berthommier Kristian Kroschel

In this paper we present a system for audio-visual speech recognition based on a hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) approach. To set up the system it was necessary to record a new audio-visual database. We will describe the recording and labeling of the database. The fusion of audio and video data is a key aspect of the paper. Three conditions, when only the audio or ...

Full text

A Hybrid ANN/HMM Audio-Visual Speech Recognition System

2001

Martin Heckmann

In this paper we present a system for audio-visual speech recognition based on a hybrid Artificial Neural Network/Hidden Markov Model (ANN/HMM) approach. To set up the system it was necessary to record a new audio-visual database. We will describe the recording and labeling of the database. The fusion of audio and video data is a key aspect of the paper. Three conditions, when only the audio or ...

Full text

Spatiotemporal person authentication based on multilevel fusion

2007

Girija Chetty Michael Wagner

In this paper, we propose a spatiotemporal person authentication approach based on multilevel fusion of 3D face biometric information with audio and visual speech information. The proposed approach combines the information from three audio-video based modules, namely audio, visual speech, and 3D face, and performs tri-module fusion in an automatic, unsupervised and adaptive manner, by adapting ...

Full text