From plane to hierarchy: Deformable Transformer for Remote Sensing Image Captioning
Abstract
With the growth of remote sensing images, automatically understanding image content has attracted the interest of many researchers in deep learning for remote sensing images. Inspired by natural image captioning, models with a CNN-RNN backbone supplemented by attention have been widely used in remote sensing image captioning. However, it is inefficient for the current attention layer to simultaneously mine the hidden foreground and background and perform interactive feature learning. Meanwhile, the new mainstream language model has recently surpassed the traditional LSTM in sentence generation. To solve the above problems, in this paper we propose a novel idea: make flat remote sensing images stereoscopic by separating foreground and background. Based on this hierarchical image information, we design a Deformable Transformer equipped with deformable scaled dot-product attention to learn multi-scale features from the foreground and background through its powerful interactive learning ability. Evaluations are conducted on four classic remote sensing image captioning datasets. Compared with state-of-the-art methods, our Transformer variant achieves higher captioning accuracy.
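For orientation, the standard scaled dot-product attention that the abstract's "deformable" variant builds on can be sketched as below. This is only the textbook operation, softmax(QK^T / sqrt(d_k)) V, written in plain Python. The abstract does not specify the deformable mechanism or the foreground/background separation, so neither is shown. The function names and the toy inputs are assumptions made for illustration, not the authors' code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Textbook attention on lists of row vectors (hypothetical helper).

    Computes softmax(Q K^T / sqrt(d_k)) V. This is NOT the paper's
    deformable variant, whose details the abstract does not give.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy input: a zero query attends uniformly to both keys,
# so the output is the mean of the two value vectors.
print(scaled_dot_product_attention([[0.0, 0.0]],
                                   [[1.0, 0.0], [0.0, 1.0]],
                                   [[1.0, 2.0], [3.0, 4.0]]))
# prints [[2.0, 3.0]]
```

In a captioning model the queries, keys, and values would be learned projections of image and text features; here they are hand-written vectors so the arithmetic can be checked by eye.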
Similar resources
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning
The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multistage prediction framework for image captioning, composed of multiple decoders each of which...
Remote Sensing: From Image Processing to Spatio-temporal Processing
This paper gives a brief survey of remote sensing techniques from a viewpoint of pattern recognition and media understanding (PRMU). First we give a brief summary of remote sensing, and then introduce related work on both remote sensing image processing and some unique issues in remote sensing image processing. We moreover point out that the future direction of remote sensing is expected to be ...
Remote Sensing Image Processing
About SYNTHESIs This volume is a printed version of a work that appears in the Synthesis Digital Library of Engineering and Computer Science. Synthesis Lectures provide concise, original presentations of important research and development topics, published quickly, in digital and print formats. For more information visit www.morganclaypool.com SYNTHESIS LECTURES ON IMAGE, VIDEO & MULTIMEDIA PRO...
Learning to Guide Decoding for Image Captioning
Recently, much advance has been made in image captioning, and an encoder-decoder framework has achieved outstanding performance for this task. In this paper, we propose an extension of the encoder-decoder framework by adding a component called guiding network. The guiding network models the attribute properties of input images, and its output is leveraged to compose the input of the decoder at ...
Journal
Journal title: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Year: 2023
ISSN: 2151-1535, 1939-1404
DOI: https://doi.org/10.1109/jstars.2023.3305889