Multi-Modal Question-Answering: Questions without Keyboards
Abstract
This paper describes our work to allow players in a virtual world to pose questions without relying on textual input. Our approach is to create enhanced virtual photographs by annotating them with semantic information from the 3D environment’s scene graph. The player can then use these annotated photos to interact with inhabitants of the world through automatically generated queries that are guaranteed to be relevant, grammatical, and unambiguous. While the range of queries is more limited than a text input system would permit, in the gaming environment that we are exploring, these limitations are offset by the practical concerns that make text input inappropriate.
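The mechanism the abstract describes can be sketched roughly as follows: a virtual photograph records which scene-graph entities were visible when it was taken, and question templates keyed to entity categories turn those annotations into a menu of well-formed queries. This is a minimal illustrative sketch, not the paper's implementation; all class names, categories, and templates here are assumptions.

```python
# Hypothetical sketch: annotated virtual photos -> auto-generated queries.
# Entity categories and question templates are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class SceneEntity:
    name: str       # e.g. "smithy"
    category: str   # e.g. "person", "building", "item"

@dataclass
class AnnotatedPhoto:
    # Scene-graph entities visible when the photo was captured
    entities: list

# One template per entity category guarantees each generated
# question is grammatical and unambiguous by construction.
QUERY_TEMPLATES = {
    "person": "Who is {name}?",
    "building": "What happens inside the {name}?",
    "item": "Where can I find the {name}?",
}

def generate_queries(photo: AnnotatedPhoto) -> list:
    """Produce relevant, grammatical questions to pose to an NPC."""
    return [QUERY_TEMPLATES[e.category].format(name=e.name)
            for e in photo.entities
            if e.category in QUERY_TEMPLATES]

photo = AnnotatedPhoto(entities=[SceneEntity("smithy", "building"),
                                 SceneEntity("iron sword", "item")])
print(generate_queries(photo))
# → ['What happens inside the smithy?', 'Where can I find the iron sword?']
```

Because every question is instantiated from a fixed template over entities the player has actually photographed, relevance and grammaticality come for free, at the cost of a smaller query space than free-text input would allow.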
Similar papers
Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks
Visual Question Answering (VQA) has attracted much attention since it offers insight into the relationships between the multi-modal analysis of images and natural language. Most current algorithms are incapable of answering open-domain questions that require reasoning beyond the image contents. To address this issue, we propose a novel framework which endows the model capabili...
FashionAsk: A Multimedia based Question-Answering System
We demonstrate a multimedia-based question-answering system, named FashionAsk, by allowing users to ask questions referring to pictures snapped by mobile devices. Instead of asking verbose questions to depict visual instances, direct pictures are provided as part of the question. The significance of our system is that (1) our system is fully automatic such that no human expert is involved; (2) ...
Survey of Recent Advances in Visual Question Answering
Visual Question Answering (VQA) presents a unique challenge as it requires the ability to understand and encode the multi-modal inputs in terms of image processing and natural language processing. The algorithm further needs to learn how to perform reasoning over this multi-modal representation so it can answer the questions correctly. This paper presents a survey of different approaches propos...
Vidiam: Corpus-based Development of a Dialogue Manager for Multimodal Question Answering
In this chapter we describe the Vidiam project, which concerns the development of a dialogue management system for multi-modal question answering dialogues as it was carried out in the IMIX project. The approach that was followed is data-driven, that is, corpus-based. Since research in Question Answering Dialog for multi-modal information retrieval is still new, no suitable corpora were availab...
Image-Question-Linguistic Co-Attention for Visual Question Answering
Our project focuses on VQA: Visual Question Answering [1], specifically, answering multiple choice questions about a given image. We start by building a MultiLayer Perceptron (MLP) model with question-grouped training and softmax loss. GloVe embeddings and ResNet image features are used. We are able to achieve near state-of-the-art accuracy with this model. Then we add image-question coattention [...