Search results for: captioning order

Number of results: 908879

2018
Bairui Wang Lin Ma Wei Zhang Wei Liu

In this paper, the problem of describing visual contents of a video sequence with natural language is addressed. Unlike previous video captioning work mainly exploiting the cues of video contents to make a language description, we propose a reconstruction network (RecNet) with a novel encoder-decoder-reconstructor architecture, which leverages both the forward (video to sentence) and backward (...

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, 2019

Journal: Proceedings of the AAAI Conference on Artificial Intelligence, 2019

Journal: Procedia Computer Science, 2019

Journal: IEEE Signal Processing Letters, 2023

State-of-the-art audio captioning methods typically use the encoder-decoder structure with pretrained audio neural networks (PANNs) as encoders for feature extraction. However, the convolution operation used in PANNs is limited in capturing long-time dependencies within an audio signal, thereby leading to potential performance degradation in audio captioning. This letter presents a novel method using graph attention (Grap...

Journal: ACM Transactions on Multimedia Computing, Communications, and Applications, 2021

Video captioning is a challenging task in the field of multimedia processing, which aims to generate informative natural language descriptions/captions to describe video contents. Previous approaches mainly focused on capturing visual information in videos using an encoder-decoder structure to generate captions. Recently, a new encoder-decoder-reconstructor structure was proposed for video captioning, which captured both [...] and [...]. Based on this...

2017
Yujun Lin Song Han William J. Dally

Long Short-Term Memory (LSTM) is widely used to solve sequence modeling problems, for example, image captioning. We found the LSTM cells are heavily redundant. We adopt network pruning to reduce the redundancy of LSTM and introduce sparsity as new regularization to reduce overfitting. We can achieve better performance than the dense baseline while reducing the total number of parameters in LSTM...

2016
Zhuhao Wang Fei Wu Weiming Lu Jun Xiao Xi Li Zitong Zhang Yueting Zhuang

Generally speaking, different persons tend to describe images from various aspects due to their individually subjective perception. As a result, generating the appropriate descriptions of images with both diversity and high quality is of great importance. In this paper, we propose a framework called GroupTalk to learn multiple image caption distributions simultaneously and effectively mimic the...

Journal: Journal of Language and Cultural Education, 2019
