Learning to generalize to new compositions in image understanding
Authors
Abstract
Recurrent neural networks have recently been used for learning to describe images using natural language. However, it has been observed that these models generalize poorly to scenes that were not observed during training, possibly because they depend too strongly on the statistics of the text in the training data. Here we propose to describe images using short structured representations, aiming to capture the crux of a description. These structured representations allow us to tease out and evaluate separately two types of generalization: standard generalization to new images with similar scenes, and generalization to new combinations of known entities. We compare two learning approaches on the MS-COCO dataset: a state-of-the-art recurrent network based on an LSTM (Show, Attend and Tell), and a simple structured prediction model on top of a deep network. We find that the structured model generalizes to new compositions substantially better than the LSTM, achieving ∼7 times the accuracy in predicting structured representations. By providing a concrete method to quantify generalization for unseen combinations, we argue that structured representations and compositional splits are a useful benchmark for image captioning, and advocate compositional models that capture linguistic and visual structure.
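To make the idea of a compositional split concrete, the following is a minimal sketch and not the authors' procedure. It assumes each image is labelled with a (subject, relation, object) triplet as its structured representation. The function name `compositional_split`, the greedy strategy, and the toy data are all hypothetical. The goal is a test set whose triplets never occur in training, while every individual word still occurs in training in the same position.

```python
from collections import Counter

def compositional_split(samples, test_fraction=0.3):
    """Greedy compositional split (illustrative assumption, not the paper's method).

    samples: list of (image_id, (subject, relation, object)).
    Returns (train, test) where no test triplet appears in train, but every
    word of a test triplet still appears in train at the same position.
    """
    triplet_counts = Counter(triplet for _, triplet in samples)

    # Number of distinct train triplets that use each (position, word) pair.
    support = Counter()
    for triplet in triplet_counts:
        for pos, word in enumerate(triplet):
            support[(pos, word)] += 1

    test_triplets = set()
    budget = test_fraction * len(samples)
    moved = 0
    for triplet in sorted(triplet_counts):
        if moved >= budget:
            break
        # Move a triplet to test only if each of its words remains covered
        # by at least one other triplet that stays in train.
        if all(support[(pos, w)] > 1 for pos, w in enumerate(triplet)):
            test_triplets.add(triplet)
            moved += triplet_counts[triplet]
            for pos, w in enumerate(triplet):
                support[(pos, w)] -= 1

    train = [s for s in samples if s[1] not in test_triplets]
    test = [s for s in samples if s[1] in test_triplets]
    return train, test

# Toy data (hypothetical image ids and triplets).
samples = [
    ("img1", ("man", "riding", "horse")),
    ("img2", ("man", "riding", "bike")),
    ("img3", ("girl", "riding", "horse")),
    ("img4", ("girl", "holding", "bike")),
    ("img5", ("man", "holding", "bike")),
    ("img6", ("girl", "riding", "bike")),
]
train, test = compositional_split(samples)
```

A model evaluated on `test` here has seen every entity and relation during training but none of the test combinations, which is the second type of generalization the abstract describes. A standard random split by image would measure only the first type.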
Similar resources
Investigate Diagnostic validity of the third edition of the new Woodcock-Johnson Cognitive Ability Scale in Learning Disabled Students in Ahvaz city
The purpose of this study was to investigate the diagnostic validity of the Woodcock-Johnson III Tests of Cognitive Abilities for learning disabilities in Ahvaz city. The statistical population of this study includes all male and female students with learning disabilities from the first to fifth grade of elementary school in Ahvaz. In the academic year 2012-2013, from the state and non-governmental centers, the indi...
Image alignment via kernelized feature learning
Machine learning is an application of artificial intelligence that is able to automatically learn and improve from experience without being explicitly programmed. The primary assumption for most of the machine learning algorithms is that the training set (source domain) and the test set (target domain) follow from the same probability distribution. However, in most of the real-world application...
Generalization to New Compositions of Known Entities in Image Understanding
Recurrent neural networks can be trained to describe images with natural language, but it has been observed that they generalize poorly to new scenes at test time. Here we provide an experimental framework to quantify their generalization to unseen compositions. By describing images using short structured representations, we tease apart and evaluate separately two types of generalization: (1) g...
A Qualitative Investigation Into Conceptual Understanding at Iranian Elementary Schools
This study attempts to root out some of the causes of absence of conceptual understanding in elementary level particularly in math subject and the factors that cause this deficiency. As a body of researchers, we use a quasi-form of methodology qualitatively designed by which we give a pre-test and post-test to our participants (randomly selected teachers and students), through some open-ended q...
Cluster-Based Image Segmentation Using Fuzzy Markov Random Field
Image segmentation is an important task in image processing and computer vision which attracts many researchers' attention. Pixels in an image carry two kinds of information: statistical and structural, which refer to the feature value of pixel data and local correlation of pixel data, respectively. Markov random field (MRF) is a tool for modeling statistical and structural inf...
Journal title:
- CoRR
Volume abs/1608.07639 (issue not listed)
Pages -
Publication date 2016