CS229 Final Project: Language Grounding in Minecraft with Gated-Attention Networks
Abstract
A key question in language understanding is the problem of language grounding: how do symbols such as words get their meaning? We examine this question in the setting of gameplay. To perform tasks and challenges specified by natural language instructions, agents need to extract semantically meaningful representations of language and map them both to the visual elements of their scene and to actions in the environment; this is often referred to as task-oriented language grounding. In this project, we propose an end-to-end trainable neural architecture that maps raw visual observations and text input directly to actions for instruction execution. The model combines image and text representations using Gated-Attention mechanisms and learns a policy for executing the natural language instruction using Stein variational policy gradients. We evaluate our method on the task of retrieving items in rooms and mazes in the Minecraft environment and show improvements over supervised learning and common reinforcement learning algorithms.
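The abstract names Gated-Attention as the step that fuses the image and text representations. The following is a minimal sketch, assuming the common formulation in which the instruction embedding passes through a linear layer with a sigmoid to produce one gate per convolutional channel, and each image feature map is then scaled elementwise by its gate. The function names, shapes and toy values are hypothetical, plain lists stand in for tensors, and the convolutional and recurrent encoders that would produce the inputs are not shown.

```python
import math
from typing import List

Vector = List[float]
FeatureMaps = List[List[List[float]]]  # indexed as [channel][row][column]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so this cannot overflow
    return e / (1.0 + e)


def attention_vector(text_emb: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Map the instruction embedding to one gate per image channel.

    weights has one row per channel; gate_c = sigmoid(weights[c] . text_emb + bias[c]).
    """
    if len(weights) != len(bias):
        raise ValueError("weights and bias must have one entry per channel")
    gates = []
    for w_row, b in zip(weights, bias):
        if len(w_row) != len(text_emb):
            raise ValueError("each weight row must match the text embedding size")
        z = sum(w * x for w, x in zip(w_row, text_emb)) + b
        gates.append(sigmoid(z))
    return gates


def gated_attention(image_feats: FeatureMaps, gates: Vector) -> FeatureMaps:
    """Multiply every value in channel c by gate c, i.e. a Hadamard product
    with the gate vector broadcast over the spatial dimensions."""
    if len(image_feats) != len(gates):
        raise ValueError("need exactly one gate per image channel")
    return [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(image_feats, gates)
    ]


if __name__ == "__main__":
    # Hypothetical toy sizes: 2 channels of 2x2 features, 3-dim instruction embedding.
    image_feats = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]],
    ]
    text_emb = [0.5, -1.0, 2.0]
    # Zero weights and bias make every gate sigmoid(0) = 0.5.
    weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    bias = [0.0, 0.0]
    gates = attention_vector(text_emb, weights, bias)
    print(gates)  # [0.5, 0.5]
    print(gated_attention(image_feats, gates)[0])  # [[0.5, 1.0], [1.5, 2.0]]
```

Gating per channel rather than per pixel lets the instruction choose which feature detectors pass through while leaving the spatial layout intact for the downstream policy network. In a full model the weights and bias would be learned jointly with the policy rather than fixed by hand as in this toy example.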
Similar resources
Gated-Attention Architectures for Task-Oriented Language Grounding
To perform tasks specified by natural language instructions, autonomous agents need to extract semantically meaningful representations of language and map them to visual elements and actions in the environment. This problem is called task-oriented language grounding. We propose an end-to-end trainable neural architecture for task-oriented language grounding in 3D environments which assumes no prio...
CS229 Final Project: Sentiment Analysis of Tweets: Baselines and Neural Network Models
The goal of sentiment analysis is to classify text samples according to their overall positivity or negativity. We refer to the positivity or negativity of a text sample as its polarity. In this project, we investigate three-class sentiment classification of Twitter data where the labels are “positive”, “negative”, and “neutral”. We explore a number of questions in relation to the sentiment ana...
Self-view Grounding Given a Narrated 360° Video
Narrated 360° videos are typically provided in many touring scenarios to mimic real-world experience. However, previous work has shown that smart assistance (i.e., providing visual guidance) can significantly help users to follow the Normal Field of View (NFoV) corresponding to the narrative. In this project, we aim at automatically grounding the NFoVs of a 360° video given subtitles of the nar...
Detecting Temporal Relations of Events in Short Narratives CS229 Fall 2016 Final Project Report
Event detection and temporal classification have long been a fundamental goal in NLP. Recent studies have highlighted the challenges that modern approaches to this task face, particularly when addressing both detection and classification together. Here, we use a new annotated corpus, StoryCloze, to train classifiers capable of classifying relationships spanning the length of a short narrative. O...
The Necessity of Separating Control and Logic When Grounding Language using Neuroevolution
In this research we analyze the task of evolving a neural network to understand simple English commands. By “understand” we mean that the final agent will perform tasks and interact with objects in its world as instructed by the experimenter. The lexicon and grammar are kept small in this experiment. This type of work where semantics are based on an agent’s perceptions and actions is referred to ...