Multi-Modality Global Fusion Attention Network for Visual Question Answering

نویسندگان
چکیده

برای دانلود باید عضویت طلایی داشته باشید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Dual Attention Network for Visual Question Answering

Visual Question Answering (VQA) is a popular research problem that involves inferring answers to natural language questions about a given visual scene. Recent neural network approaches to VQA use attention to select relevant image features based on the question. In this paper, we propose a novel Dual Attention Network (DAN) that not only attends to image features, but also to question features....

متن کامل

A Question-Focused Multi-Factor Attention Network for Question Answering

Neural network models recently proposed for question answering (QA) primarily focus on capturing the passagequestion relation. However, they have minimal capability to link relevant facts distributed across multiple sentences which is crucial in achieving deeper understanding, such as performing multi-sentence reasoning, co-reference resolution, etc. They also do not explicitly focus on the que...

متن کامل

Differential Attention for Visual Question Answering

In this paper we aim to answer questions based on images when provided with a dataset of question-answer pairs for a number of images during training. A number of methods have focused on solving this problem by using image based attention. This is done by focusing on a specific part of the image while answering the question. Humans also do so when solving this problem. However, the regions that...

متن کامل

Image-Question-Linguistic Co-Attention for Visual Question Answering

Our project focuses on VQA: Visual Question Answering [1], specifically, answering multiple choice questions about a given image. We start by building MultiLayer Perceptron (MLP) model with question-grouped training and softmax loss. GloVe embedding and ResNet image features are used. We are able to achieve near state-of-the-art accuracy with this model. Then we add image-question coattention [...

متن کامل

Hierarchical Question-Image Co-Attention for Visual Question Answering

A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the question. In this paper, we argue that in addition to modeling “where to look” or visual attention, it is equally important to model “what words to listen to” or question attention. We present a novel co-attention model for V...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Electronics

سال: 2020

ISSN: 2079-9292

DOI: 10.3390/electronics9111882