Multi-Modal Question-Answering: Questions without Keyboards

Author

  • Gary Kacmarcik
Abstract

This paper describes our work to allow players in a virtual world to pose questions without relying on textual input. Our approach is to create enhanced virtual photographs by annotating them with semantic information from the 3D environment’s scene graph. The player can then use these annotated photos to interact with inhabitants of the world through automatically generated queries that are guaranteed to be relevant, grammatical and unambiguous. While the range of queries is more limited than a text input system would permit, in the gaming environment that we are exploring these limitations are offset by the practical concerns that make text input inappropriate.
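The abstract stays at the level of the idea, but the mechanism it describes, copying entity records out of the scene graph when a photograph is taken and instantiating question templates from those records, is easy to sketch. The Python below is a hypothetical illustration of that pattern, not the paper's implementation; every class, field, and template in it is an assumption.

    # A minimal sketch (not the paper's code) of an annotated virtual photograph:
    # entity records are copied from the 3D scene graph at capture time, and
    # template-based questions are generated only for entities in the photo.
    from dataclasses import dataclass, field

    @dataclass
    class SceneEntity:
        """Semantic record taken from the scene graph when the photo is snapped."""
        entity_id: str
        category: str   # e.g. "npc", "building", "item"
        name: str       # unambiguous in-world name
        bbox: tuple     # 2D region the entity occupies in the photograph

    @dataclass
    class AnnotatedPhoto:
        image_path: str
        entities: list = field(default_factory=list)

        def queries_for(self, entity: SceneEntity) -> list:
            """Instantiate question templates for one entity in the photo."""
            templates = {
                "npc": ["Who is {name}?", "Where can I find {name}?"],
                "building": ["What is {name} used for?", "Who owns {name}?"],
                "item": ["What does {name} do?", "Where can I buy {name}?"],
            }
            return [t.format(name=entity.name)
                    for t in templates.get(entity.category, [])]

    # Usage: the player taps the innkeeper in the photo and picks a question.
    photo = AnnotatedPhoto("screenshots/market.png")
    photo.entities.append(
        SceneEntity("npc_017", "npc", "the innkeeper", (120, 40, 220, 180)))
    print(photo.queries_for(photo.entities[0]))

Because each question is produced from a template filled only with the names of entities actually present in the photograph, the output is relevant, grammatical, and unambiguous by construction, which is the trade-off against free text input that the abstract describes.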


Similar articles

Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks

Visual Question Answering (VQA) has attracted much attention since it offers insight into the relationship between multi-modal analysis of images and natural language. Most current algorithms are incapable of answering open-domain questions that require reasoning beyond the image contents. To address this issue, we propose a novel framework which endows the model capabili...


FashionAsk: A Multimedia based Question-Answering System

We demonstrate a multimedia-based question-answering system, named FashionAsk, by allowing users to ask questions referring to pictures snapped by mobile devices. Instead of asking verbose questions to depict visual instances, direct pictures are provided as part of the question. The significance of our system is that (1) our system is fully automatic such that no human expert is involved; (2) ...


Survey of Recent Advances in Visual Question Answering

Visual Question Answering (VQA) presents a unique challenge as it requires the ability to understand and encode the multi-modal inputs in terms of image processing and natural language processing. The algorithm further needs to learn how to perform reasoning over this multi-modal representation so it can answer the questions correctly. This paper presents a survey of different approaches propos...


Vidiam: Corpus-based Development of a Dialogue Manager for Multimodal Question Answering

In this chapter we describe the Vidiam project, which concerns the development of a dialogue management system for multi-modal question answering dialogues as it was carried out in the IMIX project. The approach that was followed is data-driven, that is, corpus-based. Since research in Question Answering Dialog for multi-modal information retrieval is still new, no suitable corpora were availab...


Image-Question-Linguistic Co-Attention for Visual Question Answering

Our project focuses on VQA: Visual Question Answering [1], specifically, answering multiple-choice questions about a given image. We start by building a multilayer perceptron (MLP) model with question-grouped training and softmax loss. GloVe embeddings and ResNet image features are used. We are able to achieve near state-of-the-art accuracy with this model. Then we add image-question co-attention [...
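The excerpt names the ingredients of the baseline (an MLP over GloVe question embeddings and ResNet image features, trained with a softmax loss over grouped answer choices) without showing how they fit together. The PyTorch snippet below is a rough reconstruction of such a multiple-choice scorer; it is not the authors' code, and the feature dimensions, hidden size, dropout, and number of choices are assumptions.

    # Hypothetical multiple-choice VQA baseline: score each (image, question,
    # answer) triple with an MLP, then apply a softmax loss over the choices.
    import torch
    import torch.nn as nn

    IMG_DIM, TXT_DIM, N_CHOICES = 2048, 300, 4   # assumed ResNet / GloVe sizes

    class MLPScorer(nn.Module):
        def __init__(self, hidden: int = 1024):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(IMG_DIM + 2 * TXT_DIM, hidden),
                nn.ReLU(),
                nn.Dropout(0.5),
                nn.Linear(hidden, 1),   # scalar score per answer choice
            )

        def forward(self, img, question, answers):
            # img: (B, IMG_DIM); question: (B, TXT_DIM); answers: (B, N_CHOICES, TXT_DIM)
            img_q = torch.cat([img, question], dim=-1)
            img_q = img_q.unsqueeze(1).expand(-1, answers.size(1), -1)
            scores = self.net(torch.cat([img_q, answers], dim=-1))  # (B, N_CHOICES, 1)
            return scores.squeeze(-1)                               # logits over choices

    # Toy batch with random tensors standing in for ResNet / GloVe features.
    model = MLPScorer()
    img = torch.randn(8, IMG_DIM)
    question = torch.randn(8, TXT_DIM)
    answers = torch.randn(8, N_CHOICES, TXT_DIM)
    labels = torch.randint(0, N_CHOICES, (8,))
    loss = nn.CrossEntropyLoss()(model(img, question, answers), labels)
    loss.backward()

The cross-entropy over the per-choice logits implements the softmax loss over grouped answer candidates that the excerpt mentions.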




Publication date: 2005