Rethinking Data Augmentation for Robust Visual Question Answering

Authors

Abstract

Data Augmentation (DA) — generating extra training samples beyond the original set — has been widely used in today's unbiased VQA models to mitigate language biases. Current mainstream DA strategies are synthetic-based methods, which synthesize new samples by either editing some visual regions/words or re-generating them from scratch. However, these synthetic samples are always unnatural and error-prone. To avoid this issue, a recent work composes augmented samples by randomly pairing pristine images with other human-written questions. Unfortunately, to guarantee that the augmented samples have reasonable ground-truth answers, it manually designs a set of heuristic rules for several question types, which severely limits its generalization ability. To this end, we propose a Knowledge Distillation based Data Augmentation method for VQA, dubbed KDDAug. Specifically, we first relax the requirements on reasonable image-question pairs, so that composition can be easily applied to any question type. Then, we design a knowledge distillation (KD) based answer assignment to generate pseudo answers for all composed image-question pairs, which are robust in both in-domain and out-of-distribution settings. Since KDDAug is a model-agnostic DA strategy, it can be seamlessly incorporated into any VQA architecture. Extensive ablation studies on multiple backbones and benchmarks demonstrate the effectiveness and generalization abilities of KDDAug.

Keywords: VQA; Data augmentation; Knowledge distillation
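The pipeline the abstract describes — compose image-question pairs freely, then let a teacher model assign soft pseudo answers — can be caricatured in a short sketch. All names (`compose_pairs`, `assign_pseudo_answers`, the stand-in teacher) are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch of KD-based pseudo-answer assignment, NOT the
# paper's code: a "teacher" VQA model scores candidate answers for each
# composed image-question pair, and its softmaxed scores become the soft
# pseudo-answer used to supervise the student.
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def compose_pairs(images, questions, n_pairs, seed=0):
    """Randomly pair pristine images with human-written questions
    (no per-question-type heuristic rules needed)."""
    rng = random.Random(seed)
    return [(rng.choice(images), rng.choice(questions)) for _ in range(n_pairs)]

def assign_pseudo_answers(pairs, teacher_logits_fn, answer_vocab):
    """Attach a soft pseudo-answer distribution to every composed pair."""
    augmented = []
    for image, question in pairs:
        probs = softmax(teacher_logits_fn(image, question))
        augmented.append({
            "image": image,
            "question": question,
            "soft_answer": dict(zip(answer_vocab, probs)),
        })
    return augmented

# Toy usage with a stand-in "teacher" that returns fixed logits.
vocab = ["yes", "no", "2"]
pairs = compose_pairs(["img0", "img1"], ["Is it sunny?", "How many dogs?"], 3)
aug = assign_pseudo_answers(pairs, lambda img, q: [2.0, 1.0, 0.5], vocab)
```

In the paper's setting the single stand-in teacher would be replaced by trained VQA models (including debiased ones), so the soft labels stay informative in both in-domain and out-of-distribution evaluation; the sketch only shows the data flow.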


Similar articles

Data Augmentation for Visual Question Answering

Data augmentation is widely used to train deep neural networks for image classification tasks. Simply flipping images can help learning by increasing the number of training images by a factor of two. However, data augmentation in natural language processing is much less studied. Here, we describe two methods for data augmentation for Visual Question Answering (VQA). The first uses existing sema...


Robust Question Answering

A Question Answering (QA) system should provide a short and precise answer to a question in natural language, by searching a large knowledge base consisting of natural language text. The sources of the knowledge base are widely available, for written natural language text is a preferential form of human communication. The information ranges from the more traditional edited texts, for example en...


An Exploration of Data Augmentation and RNN Architectures for Question Ranking in Community Question Answering

The automation of tasks in community question answering (cQA) is dominated by machine learning approaches, whose performance is often limited by the number of training examples. Starting from a neural sequence learning approach with attention, we explore the impact of two data augmentation techniques on question ranking performance: a method that swaps reference questions with their paraphrases...


Investigating Embedded Question Reuse in Question Answering

The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance through reuse of information in the answer to one question to answer another related question. Our analysis shows that a pair of questions in a general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...


Revisiting Visual Question Answering Baselines

Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms designed to support “reasoning”. For multiple-choice VQA, nearly all of these systems train a multi-class classifier on image and question features to predict ...



Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-20059-5_6