Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning

نویسندگان

چکیده

Observing a set of images and their corresponding paragraph-captions, challenging task is to learn how produce semantically coherent paragraph describe the visual content an image. Inspired by recent successes in integrating semantic topics into this task, paper develops plug-and-play hierarchical-topic-guided image generation framework, which couples extractor with deep topic model guide learning language model. To capture correlations between text at multiple levels abstraction from images, we design variational inference network build mapping features textual captions. generation, learned hierarchical are integrated model, including Long Short-Term Memory (LSTM) Transformer, jointly optimized. Experiments on public datasets demonstrate that proposed models, competitive many state-of-the-art approaches terms standard evaluation metrics, can be used both distill interpretable multi-layer generate diverse We release our code https://github.com/DandanGuo1993/VTCM-based-image-paragraph-caption.git

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Retinal Image Matching Using Hierarchical Vascular Features

We propose a method for retinal image matching that can be used in image matching for person identification or patient longitudinal study. Vascular invariant features are extracted from the retinal image, and a feature vector is constructed for each of the vessel segments in the retinal blood vessels. The feature vectors are represented in a tree structure with maintaining the vessel segments a...

متن کامل

Image Captioning using Visual Attention

This project aims at generating captions for images using neural language models. There has been a substantial increase in number of proposed models for image captioning task since neural language models and convolutional neural networks(CNN) became popular. Our project has its base on one of such works, which uses a variant of Recurrent neural network coupled with a CNN. We intend to enhance t...

متن کامل

Hierarchical Matching of Cortical Features for Deformable Brain Image Registration

This paper builds upon our previous work on elastic registration, using surface-to-surface mapping. In particular, a methodology for finding a smooth map from one cortical surface to another is presented, using constraints imposed by a number of sulcal and gyral curves. The outer cortical surface is represented by a map from the unit sphere to the surface which is obtained by a deformable surfa...

متن کامل

Vehicle Logo Recognition Using Image Matching and Textural Features

In recent years, automatic recognition of vehicle logos has become one of the important issues in modern cities. This is due to the unlimited increase of cars and transportation systems that make it impossible to be fully managed and monitored by human. In this research, an automatic real-time logo recognition system for moving cars is introduced based on histogram manipulation. In the proposed...

متن کامل

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

The existing image captioning approaches typically train a one-stage sentence decoder, which is difficult to generate rich fine-grained descriptions. On the other hand, multi-stage image caption model is hard to train due to the vanishing gradient problem. In this paper, we propose a coarse-to-fine multistage prediction framework for image captioning, composed of multiple decoders each of which...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: International Journal of Computer Vision

سال: 2022

ISSN: ['0920-5691', '1573-1405']

DOI: https://doi.org/10.1007/s11263-022-01624-6