captions

Variational Autoencoder for Deep Learning of Images, Labels and Captions

2016

Yunchen Pu Zhe Gan Ricardo Henao Xin Yuan Chunyuan Li Andrew Stevens Lawrence Carin

A novel variational autoencoder is developed to model images, as well as associated labels or captions. The Deep Generative Deconvolutional Network (DGDN) is used as a decoder of the latent image features, and a deep Convolutional Neural Network (CNN) is used as an image encoder; the CNN is used to approximate a distribution for the latent DGDN features/code. The latent code is also linked to g...


Content-Driven Detection of Cyberbullying on the Instagram Social Network

2016

Haoti Zhong Hao Li Anna Cinzia Squicciarini Sarah Michele Rajtmajer Christopher Griffin David J. Miller Cornelia Caragea

We study detection of cyberbullying in photo-sharing networks, with an eye on developing early-warning mechanisms for the prediction of posted images vulnerable to attacks. Given the overwhelming increase in media accompanying text in online social networks, we investigate use of posted images and captions for improved detection of bullying in response to shared content. We validate our approache...


Enhanced Sports Image Annotation and Retrieval Based Upon Semantic Analysis of Multimodal Cues

2009

Kraisak Kesorn Stefan Poslad

This paper presents a framework for semi-automatic annotation and semantic image retrieval, applied to the sports domain, based upon semantic analysis of both image text captions and visual features of the image. Unstructured text captions of images are analysed in order to extract the concepts and restructure them into a semantic model. SVM classification of the multi-dominant colours and edge...


Overview of Natural Language Processing of Captions for Retrieving Multimedia Data

1992

Eugene J. Guglielmo Neil C. Rowe

This paper briefly describes the current implementation status of an intelligent information retrieval system, MARIE, that employs natural language processing techniques. Descriptive captions are used to identify photographic images concerning various military projects. The captions are parsed to produce a logical form from which nouns and verbs are extracted to form the primary keywords. User ...


AutoCAP: An Automatic Caption Generation System based on the Text Knowledge Power Series Representation Model

2017

M. Takaya S. Aoki T. Miyamoto B. Yao X. Yang L. Lin M. W. Lee

This paper describes Automatic Caption generation for news Articles, an experimental intelligent system that generates presentations in text based on the text knowledge power series representation model. Captions or titles are useful for users who only need information on the main topics of an article. Using current extractive summarization techniques, it is not able to generate a coheren...


Topic-Specific Image Caption Generation

2017

Chang Zhou Yuzhao Mao Xiaojie Wang

Recently, image captioning, which aims to generate a textual description for an image automatically, has attracted researchers from various fields. Encouraging performance has been achieved by applying deep neural networks. Most of these works aim at generating a single caption, which may not be comprehensive, especially for complex images. This paper proposes a topic-specific multi-caption generator, ...


SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set

Journal: CoRR 2017

William Havard Laurent Besacier Olivier Rosec

This paper presents an augmentation of the MSCOCO dataset where speech is added to image and text. Speech captions are generated using text-to-speech (TTS) synthesis resulting in 616,767 spoken captions (more than 600h) paired with images. Disfluencies and speed perturbation are added to the signal in order to sound more natural. Each speech signal (WAV) is paired with a JSON file containing exact ...


Dense video captioning based on local attention

Journal: IET Image Processing 2023

Dense video captioning aims to locate multiple events in an untrimmed video and generate captions for each event. Previous methods experienced difficulties in establishing the multimodal feature relationship between frames and captions, resulting in low accuracy of the generated captions. To address this problem, a novel Dense Video Captioning Model Based on Local Attention (DVCL) is proposed. DVCL employs 2D temporal d...


Learning Visual Representations using Images with Captions

2006

Current methods for learning visual categories work well when a large amount of labeled data is available, but can run into severe difficulties when the number of labeled examples is small. When labeled data is scarce it may be beneficial to use unlabeled data to learn an image representation that is low-dimensional, but nevertheless captures the information required to discriminate between ima...


ASR-based Subtitling of Live TV-programs

2005

Trym Holter Erik Harborg Magne Hallstein Johnsen Torbjørn Svendsen

A system for on-line generation of closed captions (subtitles) for broadcast of live TV-programs is described. During broadcast, a commentator formulates a possibly condensed, but semantically correct version of the original speech. These compressed phrases are recognized by a continuous speech recognizer, and the resulting captions are fed into the teletext system. This application will provid...
