Search results for: captioning order

Number of results: 908879

Journal: :CoRR 2017
Du Tran Jamie Ray Zheng Shou Shih-Fu Chang Manohar Paluri

Learning image representations with ConvNets by pretraining on ImageNet has proven useful across many visual understanding tasks including object detection, semantic segmentation, and image captioning. Although any image representation can be applied to video frames, a dedicated spatiotemporal representation is still vital in order to incorporate motion patterns that cannot be captured by appea...

Journal: :European Journal of Science and Technology 2022

Recurrent neural networks have recently emerged as a useful tool in computer vision and language modeling tasks such as image and video captioning. The main limitation of these networks is preserving the gradient flow as the network gets deeper. We propose a captioning approach that utilizes residual connections to overcome this limitation and maintain the gradient flow by carrying information through layers from bottom to top with additive features. exp...

Journal: :Eurasip Journal on Audio, Speech, and Music Processing 2022

Automated audio captioning is a cross-modal translation task that aims to generate natural language descriptions for given audio clips. This task has received increasing attention with the release of freely available datasets in recent years. The problem has been addressed predominantly with deep learning techniques. Numerous approaches have been proposed, such as investigating different neural network architectures, e...

Journal: :CoRR 2017
Octavio Arriaga Paul Plöger Matias Valdenegro-Toro

Current robot platforms are being employed to collaborate with humans in a wide range of domestic and industrial tasks. These environments require autonomous systems that are able to classify and communicate anomalous situations such as fires, injured persons, car accidents; or generally, any potentially dangerous situation for humans. In this paper we introduce an anomaly detection dataset for...

2018
Wenhao Jiang Lin Ma Xinpeng Chen Hanwang Zhang Wei Liu

Recently, much advance has been made in image captioning, and an encoder-decoder framework has achieved outstanding performance for this task. In this paper, we propose an extension of the encoder-decoder framework by adding a component called guiding network. The guiding network models the attribute properties of input images, and its output is leveraged to compose the input of the decoder at ...

2014
Kirill Levin Irina Ponomareva Anna Bulusheva German Chernykh Ivan Medennikov Nickolay Merkin Alexey Prudnikov Natalia A. Tomashenko

The paper describes a hardware-software system for real-time closed captioning of Russian live TV broadcasts. The use of respeaking technology enabled us to create an ASR system with WER not exceeding 5.5%. Editing closed captions in real time further reduces WER down to 0.2%. In the paper we report some advancements in LMs for a highly inflected language and also in using morphological rescori...

2004
Anthony F. Martone Cüneyt M. Taskiran Edward J. Delp

The production of closed captions is an important but expensive process in video broadcasting. We propose a method to generate highly accurate off-line captions efficiently. Our system uses text alignment to synchronize program transcripts obtained for a video program with text produced by an automatic speech recognition (ASR) system. We will also describe the accuracy in both closed-caption te...

2012
Jobin Abraham

Watermarking is well known as a tool for copyright protection of documents. Digital watermarking is also useful for content authentication and tamper detection. Watermarking could be migrated to e-governance for enhancing security of various e-governance applications. Successful e-governance implementation requires all digital documents issued by the government is protected from illegal attacks ...

Journal: :CoRR 2016
Andrew Shin Masataka Yamaguchi Katsunori Ohnishi Tatsuya Harada

The workflow of extracting features from images using convolutional neural networks (CNN) and generating captions with recurrent neural networks (RNN) has become a de-facto standard for image captioning task. However, since CNN features are originally designed for classification task, it is mostly concerned with the main conspicuous element of the image, and often fails to correctly convey info...

Journal: :CoRR 2017
Keren Ye Adriana Kovashka

In order to convey the most content in their limited space, advertisements embed references to outside knowledge via symbolism. For example, a motorcycle stands for adventure (a positive property the ad wants associated with the product being sold), and a gun stands for danger (a negative property to dissuade viewers from undesirable behaviors). We show how to use symbolic references to better ...
