Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning
نویسندگان
چکیده
Observing a set of images and their corresponding paragraph-captions, challenging task is to learn how produce semantically coherent paragraph describe the visual content an image. Inspired by recent successes in integrating semantic topics into this task, paper develops plug-and-play hierarchical-topic-guided image generation framework, which couples extractor with deep topic model guide learning language model. To capture correlations between text at multiple levels abstraction from images, we design variational inference network build mapping features textual captions. generation, learned hierarchical are integrated model, including Long Short-Term Memory (LSTM) Transformer, jointly optimized. Experiments on public datasets demonstrate that proposed models, competitive many state-of-the-art approaches terms standard evaluation metrics, can be used both distill interpretable multi-layer generate diverse We release our code https://github.com/DandanGuo1993/VTCM-based-image-paragraph-caption.git
منابع مشابه
Retinal Image Matching Using Hierarchical Vascular Features
We propose a method for retinal image matching that can be used in image matching for person identification or patient longitudinal study. Vascular invariant features are extracted from the retinal image, and a feature vector is constructed for each of the vessel segments in the retinal blood vessels. The feature vectors are represented in a tree structure with maintaining the vessel segments a...
متن کاملImage Captioning using Visual Attention
This project aims at generating captions for images using neural language models. There has been a substantial increase in number of proposed models for image captioning task since neural language models and convolutional neural networks(CNN) became popular. Our project has its base on one of such works, which uses a variant of Recurrent neural network coupled with a CNN. We intend to enhance t...
متن کاملHierarchical Matching of Cortical Features for Deformable Brain Image Registration
This paper builds upon our previous work on elastic registration, using surface-to-surface mapping. In particular, a methodology for finding a smooth map from one cortical surface to another is presented, using constraints imposed by a number of sulcal and gyral curves. The outer cortical surface is represented by a map from the unit sphere to the surface which is obtained by a deformable surfa...
متن کاملVehicle Logo Recognition Using Image Matching and Textural Features
In recent years, automatic recognition of vehicle logos has become one of the important issues in modern cities. This is due to the unlimited increase of cars and transportation systems that make it impossible to be fully managed and monitored by human. In this research, an automatic real-time logo recognition system for moving cars is introduced based on histogram manipulation. In the proposed...
متن کاملStack-Captioning: Coarse-to-Fine Learning for Image Captioning
The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multistage prediction framework for image captioning, composed of multiple decoders each of which...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: International Journal of Computer Vision
سال: 2022
ISSN: ['0920-5691', '1573-1405']
DOI: https://doi.org/10.1007/s11263-022-01624-6