captioning order

Image captioning aims to generate a corresponding description of an image. In recent years, neural encoder-decoder models have been the dominant approaches, in which a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) are used to translate an image into a natural language description. Among these, visual attention mechanisms are widely used to enable deeper understanding through fine-grained analys...

Dual-level Collaborative Transformer for Image Captioning

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence 2021

Descriptive region features extracted by object detection networks have played an important role in the recent advancements of image captioning. However, they are still criticized for lacking contextual information and fine-grained details, which by contrast are the merits of traditional grid features. In this paper, we introduce a novel Dual-Level Collaborative Transformer (DLCT) network to realize complementa...

Semantic-Guided Selective Representation for Image Captioning

Journal: IEEE Access 2023

Grid-based features have been proven to be as effective as region-based features in multi-modal tasks such as visual question answering. However, their application to image captioning encounters two main issues, namely, noisy and fragmented semantics. In this paper, we propose a novel feature selection scheme, with Relation-Aware Selection (RAS) and a Fine-grained Semantic Guidance (FSG) learning strategy. Based on the ...

Towards local visual modeling for image captioning

Journal: Pattern Recognition 2023

In this paper, we study local visual modeling with grid features for image captioning, which is critical for generating accurate and detailed captions. To achieve this target, we propose a Locality-Sensitive Transformer Network (LSTNet) with two novel designs, namely Locality-Sensitive Attention (LSA) and Locality-Sensitive Fusion (LSF). LSA is deployed for intra-layer interaction via the relationship between each grid and its neighbors. It reduces the difficulty of ob...

Video captioning with stacked attention and semantic hard pull

Journal: PeerJ 2021

Video captioning, i.e., the task of generating captions from video sequences, creates a bridge between the Natural Language Processing and Computer Vision domains of computer science. Generating a semantically accurate description is quite complex. Considering the complexity of the problem, the results obtained in recent research works are praiseworthy. However, there is plenty of scope for further investigation. This paper addre...
