Detecting Audio/Video Asynchrony
Abstract
Asynchrony between the audio and video tracks of streaming video can frustrate users, yet it often goes unreported. Providers of on-demand internet streaming media, such as Netflix, reach millions of users and therefore have a strong interest in delivering correctly aligned audio/video content. Our project will attempt to detect A/V misalignment in standard (non-streaming) movie clips. We expect that, if certain time-dependent audio and video features are correlated, the misalignment between the two tracks can be predicted by comparing how these features progress over a segment of the movie clip.
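One way to picture the idea of comparing feature progressions is a lagged correlation search. The sketch below is only an illustration under stated assumptions, not the method of the project: it assumes each track has already been reduced to a single per-frame feature sampled at the same rate (for example audio energy and a frame-difference magnitude, both hypothetical choices), and it assumes the offset is a constant number of frames. The function name `estimate_offset` and the parameter `max_lag` are invented for this example.

```python
import numpy as np

def estimate_offset(audio_feat, video_feat, max_lag):
    """Estimate how many frames the audio feature trails the video feature.

    Assumption: both inputs are 1-D arrays of equal length, sampled at the
    same frame rate. A positive result means the audio is delayed.
    """
    # Standardize both series so the score is a correlation-like value.
    a = (audio_feat - audio_feat.mean()) / (audio_feat.std() + 1e-12)
    v = (video_feat - video_feat.mean()) / (video_feat.std() + 1e-12)
    n = len(a)

    best_lag, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        # Pair each audio sample with the video sample `lag` frames earlier.
        if lag >= 0:
            x, y = a[lag:], v[:n - lag]
        else:
            x, y = a[:n + lag], v[-lag:]
        # Mean product over the overlapping region only.
        score = float(np.dot(x, y)) / len(x)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score
```

Calling `estimate_offset(audio_feat, video_feat, max_lag=20)` returns the candidate lag with the highest score together with that score. A low best score would suggest that the chosen features are not correlated enough to support a prediction, which is the condition the abstract makes its expectation depend on.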
Related resources
Audio-visual anticipatory coarticulation modeling by human and machine
The phenomenon of anticipatory coarticulation provides a basis for the observed asynchrony between the acoustic and visual onsets of phones in certain linguistic contexts. This type of asynchrony is typically not explicitly modeled in audio-visual speech models. In this work, we study within-word audiovisual asynchrony using manual labels of words in which theory suggests that audio-visual asyn...
Temporal integration for live conversational speech
The difficulty in detecting short asynchronies between corresponding audio and video signals demonstrates the remarkable resilience of the perceptual system when integrating the senses. Thresholds for perceived synchrony vary depending on the complexity, congruency and predictability of the audiovisual event. For instance, asynchrony is typically detected sooner for simple flash and tone combin...
Context-specific effects of musical expertise on audiovisual integration
Ensemble musicians exchange auditory and visual signals that can facilitate interpersonal synchronization. Musical expertise improves how precisely auditory and visual signals are perceptually integrated and increases sensitivity to asynchrony between them. Whether expertise improves sensitivity to audiovisual asynchrony in all instrumental contexts or only in those using sound-producing gestur...
Detecting audio-visual synchrony using deep neural networks
In this paper, we address the problem of automatically detecting whether the audio and visual speech modalities in frontal pose videos are synchronous or not. This is of interest in a wide range of applications, for example spoof detection in biometrics, lip-syncing, speaker detection and diarization in multi-subject videos, and video data quality assurance. In our adopted approach, we investig...
Speech intelligibility derived from asynchronous processing of auditory-visual information
The current study examines the temporal parameters associated with cross-modal integration of auditory-visual information for sentential material (Harvard/IEEE sentences). The speech signal was filtered into 1/3-octave channels, all of which were discarded (in the primary experiment) save for a low-frequency (298-375 Hz) and a high-frequency (4762-6000 Hz) band. The intelligibility of this audi...