audio and video products

Incorporating Audio Cues into Dialog and Action Scene Extraction

2003

Lei Chen Shariq J. Rizvi M. Tamer Özsu

In this paper, we present an approach to extract scenes in video. The approach is top-down and uses video editing rules and audio cues to extract simple dialog and action scenes. The underlying model is a finite state machine coupled with audio cues that are determined using an audio classifier.

Telefonica Research Content-Based Copy Detection TRECVID Submission

2009

Xavier Anguera Pere Obrador Tomasz Adamek David Marimon

This notebook paper describes the systems submitted by Telefonica Research within the MESH team for the video copy detection task in TRECVID 2009. We participated in the video-only, audio-only and audio+video tasks. Our main contribution is the combination (when possible) of audio and video features within the same system by using global features extracted both from the reference videos and t...

The Audio-Video Australian English Speech Data Corpus AVOZES

2004

J. Bruce Millar Roland Göcke

This paper presents the Audio-Video Australian English Speech data corpus AVOZES. It contains recordings of 20 speakers uttering a variety of phrases. The corpus was designed for research on the statistical relationship of audio and video speech parameters with an audio-video (AV) automatic speech recognition (ASR) task in mind, but may be useful for other research tasks. AVOZES is the first pu...

Audio Feature Extraction & Analysis for Scene Classification

1997

Zhu Liu Jincheng Huang Yao Wang Tsuhan Chen

Analysis and classification of the scene content of a video sequence are very important for content-based indexing and retrieval of multimedia databases. In this paper, we report our research on using the associated audio information for video scene classification. We describe several audio features that have been found effective in distinguishing audio characteristics of different scene classe...

Driver Frustration Detection from Audio and Video in the Wild

2016

Irman Abdic Alex Fridman Daniel McDuff Erik Marchi Bryan Reimer Björn W. Schuller

We present a method for detecting driver frustration from both video and audio streams captured during the driver’s interaction with an in-vehicle voice-based navigation system. The video is of the driver’s face when the machine is speaking, and the audio is of the driver’s voice when he or she is speaking. We analyze a dataset of 20 drivers that contains 596 audio epochs (audio clips, with dur...

Characteristics of Streaming Audio and Video Stored on the Internet

2003

Mingzhe Li Mark Claypool Robert Kinicki James Nichols

The increasing power and connectivity of today’s computers have spurred the growth in streaming audio and video available on the Internet through the Web. While there is substantial research characterizing the performance of streaming media and characterizing documents stored on the Internet, there have been few studies characterizing streaming audio and video stored on the Web. We crawled over...

Impairment-Factor-Based Audiovisual Quality Model for IPTV: Influence of Video Resolution, Degradation Type, and Content Type

Journal: EURASIP J. Image and Video Processing 2011

Marie-Neige Garcia Robert Schleicher Alexander Raake

This paper presents an audiovisual quality model for IPTV services. The model estimates the audiovisual quality of standard and high definition video as perceived by the user. The model is developed for applications such as network planning and packet-layer quality monitoring. It mainly covers audio and video compression artifacts and impairments due to packet loss. The quality tests conducted ...

Use of Video and Audio Texts in EFL Listening Test

2015

Ahmet Başal İbrahim Demir

The study aims to discover whether audio or video modality in a listening test is more beneficial to test takers. In this study, the posttest-only control group design was utilized and quantitative data were collected in order to measure participant performances concerning two types of modality (audio or video) in a listening test. The participants, first grade students from an ELT program, wer...

A Synchronization Ground Truth for the Jiku Mobile Video Dataset

2015

Mario Guggenberger Mathias Lux László Böszörményi

This paper introduces and describes a manually generated synchronization ground truth, accurate to the level of the audio sample, for the Jiku Mobile Video Dataset, a dataset containing hundreds of videos recorded by mobile users at different events with drama, dancing and singing performances. It aims at encouraging researchers to evaluate the performance of their audio, video, or multimodal s...

Video-Audio Domain Generalization via Confounder Disentanglement

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence 2023

Existing video-audio understanding models are trained and evaluated in an intra-domain setting, facing performance degeneration in real-world applications where multiple domains and distribution shifts naturally exist. The key to video-audio domain generalization (VADG) lies in alleviating spurious correlations over multi-modal features. To achieve this goal, we resort to causal theory and attribute such correlation to confou...
