FVQA: Fact-based Visual Question Answering
Authors
Abstract
Visual Question Answering (VQA) has attracted much attention in both the computer vision and natural language processing communities, not least because it offers insight into the relationships between two important sources of information. Current datasets, and the models built upon them, have focused on questions which are answerable by direct analysis of the question and image alone. The set of such questions that require no external information to answer is interesting, but very limited. It excludes questions which require common sense or basic factual knowledge to answer, for example. Here we introduce FVQA (Fact-based VQA), a VQA dataset which requires, and supports, much deeper reasoning. FVQA primarily contains questions that require external information to answer. We thus extend a conventional visual question answering dataset, which contains image-question-answer triplets, through additional image-question-answer-supporting-fact tuples. Each supporting fact is represented as a structural triplet, such as <Cat, CapableOf, ClimbingTrees>.
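To make the dataset structure concrete, the following is a minimal sketch of how one image-question-answer-supporting-fact entry could be represented in Python; the field names and example values are illustrative assumptions, not the format of the released FVQA annotation files.

from dataclasses import dataclass

@dataclass
class SupportingFact:
    # A structural knowledge triplet, e.g. <Cat, CapableOf, ClimbingTrees>.
    subject: str
    relation: str
    obj: str

@dataclass
class FVQAEntry:
    # One image-question-answer-supporting-fact tuple (illustrative fields).
    image: str            # image file name or identifier
    question: str
    answer: str
    fact: SupportingFact  # the external knowledge needed to answer

example = FVQAEntry(
    image="img_0001.jpg",  # hypothetical file name
    question="Which animal in the image is able to climb trees?",
    answer="cat",
    fact=SupportingFact("Cat", "CapableOf", "ClimbingTrees"),
)

Storing the supporting fact alongside each question-answer pair is what distinguishes FVQA from conventional image-question-answer triplet datasets: an answer can be traced back to a specific piece of external knowledge.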
Similar resources
Dynamic Memory Networks for Visual and Textual Question Answering
Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during train...
Explicit Knowledge-based Reasoning for Visual Question Answering
We describe a method for visual question answering which is capable of reasoning about contents of an image on the basis of information extracted from a large-scale knowledge base. The method not only answers natural language questions using concepts not contained in the image, but can provide an explanation of the reasoning by which it developed its answer. The method is capable of answering f...
ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering
We propose a novel attention-based deep learning architecture for the visual question answering (VQA) task. Given an image and an image-related question, VQA returns a natural language answer. Since different questions inquire about the attributes of different image regions, generating correct answers requires the model to have question-guided attention, i.e., the attention on the regions correspond...
Learning Convolutional Text Representations for Visual Question Answering
Visual question answering is a recently proposed artificial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are typically modeled through recurrent neural networks. While the requirement for modeling images is similar to traditional computer vision tasks, such as object ...
Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering
Recently, the Visual Question Answering (VQA) task has gained increasing attention in artificial intelligence. Existing VQA methods mainly adopt the visual attention mechanism to associate the input question with corresponding image regions for effective question answering. The free-form region-based and the detection-based visual attention mechanisms are mostly investigated, with the former one...
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume / Issue: -
Pages: -
Publication date: 2017