ConvNet Architecture Search for Spatiotemporal Feature Learning

Journal: CoRR 2017

Du Tran, Jamie Ray, Zheng Shou, Shih-Fu Chang, Manohar Paluri

Learning image representations with ConvNets by pretraining on ImageNet has proven useful across many visual understanding tasks including object detection, semantic segmentation, and image captioning. Although any image representation can be applied to video frames, a dedicated spatiotemporal representation is still vital in order to incorporate motion patterns that cannot be captured by appea...

Sequence-to-Sequence Video Captioning with Residual Connected Gated Recurrent Units

Journal: European Journal of Science and Technology 2022

Recurrent neural networks have recently emerged as a useful tool in computer vision and language modeling tasks such as image and video captioning. The main limitation of these networks is preserving the gradient flow as the network gets deeper. We propose a video captioning approach that utilizes residual connections to overcome this limitation and maintain the gradient flow by carrying information through layers from bottom to top with additive features. Exp...

Automated audio captioning: an overview of recent progress and new challenges

Journal: EURASIP Journal on Audio, Speech, and Music Processing 2022

Automated audio captioning is a cross-modal translation task that aims to generate natural language descriptions for given audio clips. This task has received increasing attention with the release of freely available datasets in recent years. The problem has been addressed predominantly by deep learning techniques. Numerous approaches have been proposed, such as investigating different neural network architectures, e...

Image Captioning and Classification of Dangerous Situations

Journal: CoRR 2017

Octavio Arriaga, Paul Plöger, Matias Valdenegro-Toro

Current robot platforms are being employed to collaborate with humans in a wide range of domestic and industrial tasks. These environments require autonomous systems that are able to classify and communicate anomalous situations such as fires, injured persons, car accidents; or generally, any potentially dangerous situation for humans. In this paper we introduce an anomaly detection dataset for...

Learning to Guide Decoding for Image Captioning

2018

Wenhao Jiang, Lin Ma, Xinpeng Chen, Hanwang Zhang, Wei Liu

Recently, significant advances have been made in image captioning, and the encoder-decoder framework has achieved outstanding performance on this task. In this paper, we propose an extension of the encoder-decoder framework by adding a component called the guiding network. The guiding network models the attribute properties of input images, and its output is leveraged to compose the input of the decoder at ...

Automated closed captioning for Russian live broadcasting

2014

Kirill Levin, Irina Ponomareva, Anna Bulusheva, German Chernykh, Ivan Medennikov, Nickolay Merkin, Alexey Prudnikov, Natalia A. Tomashenko

The paper describes a hardware-software system for real-time closed captioning of Russian live TV broadcasts. The use of respeaking technology enabled us to create an ASR system with WER not exceeding 5.5%. Editing closed captions in real time further reduces WER down to 0.2%. In the paper we report some advancements in LMs for a highly inflected language and also in using morphological rescori...

Automated closed-captioning using text alignment

2004

Anthony F. Martone, Cüneyt M. Taskiran, Edward J. Delp

The production of closed captions is an important but expensive process in video broadcasting. We propose a method to generate highly accurate off-line captions efficiently. Our system uses text alignment to synchronize program transcripts obtained for a video program with text produced by an automatic speech recognition (ASR) system. We will also describe the accuracy in both closed-caption te...

Watermark Captioning for Images in E-Governance

2012

Jobin Abraham

Watermarking is well known as a tool for copyright protection of documents. Digital watermarking is also useful for content authentication and tamper detection. Watermarking could be migrated to e-governance to enhance the security of various e-governance applications. Successful e-governance implementation requires that all digital documents issued by the government be protected from illegal attacks ...

Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning

Journal: CoRR 2016

Andrew Shin, Masataka Yamaguchi, Katsunori Ohnishi, Tatsuya Harada

The workflow of extracting features from images using convolutional neural networks (CNN) and generating captions with recurrent neural networks (RNN) has become a de facto standard for the image captioning task. However, since CNN features are originally designed for classification tasks, they are mostly concerned with the main conspicuous element of the image and often fail to correctly convey info...

ADVISE: Symbolism and External Knowledge for Decoding Advertisements

Journal: CoRR 2017

Keren Ye, Adriana Kovashka

In order to convey the most content in their limited space, advertisements embed references to outside knowledge via symbolism. For example, a motorcycle stands for adventure (a positive property the ad wants associated with the product being sold), and a gun stands for danger (a negative property to dissuade viewers from undesirable behaviors). We show how to use symbolic references to better ...