captions

Towards Video Captioning with Naming: A Novel Dataset and a Multi-modal Approach

2017

Stefano Pini Marcella Cornia Lorenzo Baraldi Rita Cucchiara

Current approaches for movie description lack the ability to name characters with their proper names, and can only indicate people with a generic “someone” tag. In this paper we present two contributions towards the development of video description architectures with naming capabilities: firstly, we collect and release an extension of the popular Montreal Video Annotation Dataset in which the v...


XGPT: Cross-modal Generative Pre-Training for Image Captioning

Journal: Lecture Notes in Computer Science 2021

In this paper, we propose XGPT, a new method of Cross-modal Generative Pre-Training for Image Captioning that is designed to pre-train text-to-image caption generators through four novel generation tasks, including Adversarial (AIC), Image-conditioned Masked Language Modeling (IMLM), Denoising Autoencoding (IDA), and Text-conditioned Feature Generation (TIFG). As a result, the pre-trained XGPT ca...


Beam-plasma Instability in Strongly Correlated Plasmas

2014

Z. C. Tao

Contents: Acknowledgements; References; Figure Captions; Figures


SenseCam, imagery and bias in memory for wellbeing

2011

Fionnuala C. Murphy Philip J. Barnard Kayleigh A. M. Terry Maria Teresa Carthery-Goulart Emily A. Holmes

Identifying and modifying the negative interpretation bias that characterises depression is central to successful treatment. While accumulating evidence indicates that mental imagery is particularly effective in the modification of emotional bias, this research typically incorporates static and unrelated ambiguous stimuli. SenseCam technology, and the resulting video-like footage, offers an opp...


Understanding of Navy Technical Language via Statistical Parsing

2004

Neil C. Rowe

A key problem in indexing technical information is the interpretation of technical words and word senses, expressions not used in everyday language. This is important for captions on technical images, whose often pithy descriptions can be valuable to decipher. We describe the natural-language processing for MARIE-2, a natural-language information retrieval system for multimedia captions. Our ap...


Partial and Synchronized Caption Generation to Develop Second Language Listening Skill

2014

Maryam Sadat MIRZAEI Yuya AKITA Tatsuya KAWAHARA

Captioning is widely used by second language learners as an assistive tool for listening. However, the use of captions often leads to word-by-word decoding and over-reliance on reading skill rather than improving listening skill. With the purpose of encouraging the learners to listen to the audio instead of merely reading the text, the study introduces a novel technique of captioning, partial a...


Automatic Closed Caption Detection and Filtering in MPEG Videos for Video Structuring

Journal: J. Inf. Sci. Eng. 2006

Duan-Yu Chen Ming-Ho Hsiao Suh-Yin Lee

Video structuring is the process of extracting temporal structural information of video sequences and is a crucial step in video content analysis especially for sports videos. It involves detecting temporal boundaries, identifying meaningful segments of a video and then building a compact representation of video content. Therefore, in this paper, we propose a novel mechanism to automatically pa...


Designing Caption Production Rules Based on Face, Text and Motion Detections

2008

C. Chapdelaine M. Beaulieu L. Gagnon

Producing off-line captions for deaf and hearing-impaired people is a labor-intensive task that can require up to 18 hours of production per hour of film. Captions are placed manually close to the region of interest, but they must avoid masking human faces, text, or any moving objects that might be relevant to the story flow. Our goal is to use image processing techniques to reduce the off-lin...


Improving Accessibility of Transaction-centric Web Objects

2010

Muhammad Asiful Islam Faisal Ahmed Yevgen Borodin Jalal Mahmud I. V. Ramakrishnan

Advances in web technology have considerably widened the Web accessibility divide between sighted and blind users. This divide is especially acute when conducting online transactions, e.g., shopping, paying bills, making travel plans, etc. Such transactions span multiple web pages and require that users find clickable objects (e.g., “add-to-cart” button) which are essential for transaction prog...


Evaluation of Automatic Video Captioning Using Direct Assessment

Journal: CoRR 2017

Yvette Graham George Awad Alan F. Smeaton

We present Direct Assessment, a method for manually assessing the quality of automatically-generated captions for video. Evaluating the accuracy of video captions is particularly difficult because for any given video clip there is no definitive ground truth or correct answer against which to measure. Automatic metrics for comparing automatic video captions against a manual caption such as BLEU ...
