Learning to generalize to new compositions in image understanding

Authors

  • Yuval Atzmon
  • Jonathan Berant
  • Vahid Kazemi
  • Amir Globerson
  • Gal Chechik
Abstract

Recurrent neural networks have recently been used for learning to describe images using natural language. However, it has been observed that these models generalize poorly to scenes that were not observed during training, possibly because they depend too strongly on the statistics of the text in the training data. Here we propose to describe images using short structured representations, aiming to capture the crux of a description. These structured representations allow us to tease out and evaluate separately two types of generalization: standard generalization to new images with similar scenes, and generalization to new combinations of known entities. We compare two learning approaches on the MS-COCO dataset: a state-of-the-art recurrent network based on an LSTM (Show, Attend and Tell), and a simple structured prediction model on top of a deep network. We find that the structured model generalizes to new compositions substantially better than the LSTM, achieving roughly 7 times the LSTM's accuracy in predicting structured representations. By providing a concrete method to quantify generalization to unseen combinations, we argue that structured representations and compositional splits are a useful benchmark for image captioning, and we advocate compositional models that capture linguistic and visual structure.
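The idea of a compositional split mentioned in the abstract can be illustrated with a minimal sketch. It assumes, for illustration only, that each structured representation is a (subject, relation, object) triplet; the function name `compositional_split`, the toy samples, and the held-out set are hypothetical and are not taken from the paper. The split keeps a held-out triplet in the test set only if each of its parts already appears in the training set, so the test measures new combinations of known entities rather than unknown words.

```python
def compositional_split(samples, held_out):
    """Split (image_id, triplet) pairs so that test triplets are unseen
    in training while each of their components is seen in training.

    Assumption: a triplet is (subject, relation, object); this format is
    chosen for illustration and is not prescribed by the abstract.
    """
    # Training set: every sample whose triplet is not held out.
    train = [(img, t) for img, t in samples if t not in held_out]

    # Collect the vocabulary seen in training, separately for each slot.
    seen = [set(), set(), set()]
    for _, triplet in train:
        for slot, token in zip(seen, triplet):
            slot.add(token)

    # Test set: held-out triplets whose parts are all known from training.
    test = [
        (img, t)
        for img, t in samples
        if t in held_out and all(tok in slot for slot, tok in zip(seen, t))
    ]
    return train, test


# Toy data (hypothetical).
samples = [
    ("img1", ("man", "ride", "horse")),
    ("img2", ("woman", "ride", "bike")),
    ("img3", ("man", "hold", "bike")),
    ("img4", ("woman", "ride", "horse")),
    ("img5", ("dog", "chase", "cat")),
]
held_out = {("woman", "ride", "horse"), ("dog", "chase", "cat")}

train, test = compositional_split(samples, held_out)
print(len(train), test)
```

In this toy example, `img5` is dropped from the test set because "dog", "chase", and "cat" never occur in training, whereas `img4` is kept because "woman", "ride", and "horse" each occur in some training triplet, just never together.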


Similar resources

Investigate Diagnostic validity of the third edition of the new Woodcock-Johnson Cognitive Ability Scale in Learning Disabled Students in Ahvaz city

The purpose of this study was to investigate the validity of the Woodcock-Johnson III Tests of Cognitive Abilities for diagnosing learning disabilities in Ahvaz city. The statistical population of this study includes all male and female students with learning disabilities from the first to fifth grade of elementary school in Ahvaz. In the academic year 2012-2013, from the state and non-governmental centers, the indi...


Image alignment via kernelized feature learning

Machine learning is an application of artificial intelligence that is able to automatically learn and improve from experience without being explicitly programmed. The primary assumption for most of the machine learning algorithms is that the training set (source domain) and the test set (target domain) follow from the same probability distribution. However, in most of the real-world application...


Generalization to New Compositions of Known Entities in Image Understanding

Recurrent neural networks can be trained to describe images with natural language, but it has been observed that they generalize poorly to new scenes at test time. Here we provide an experimental framework to quantify their generalization to unseen compositions. By describing images using short structured representations, we tease apart and evaluate separately two types of generalization: (1) g...


A Qualitative Investigation Into Conceptual Understanding at Iranian Elementary Schools

This study attempts to identify some of the causes of the lack of conceptual understanding at the elementary level, particularly in mathematics, and the factors behind this deficiency. As a body of researchers, we use a quasi-form of qualitatively designed methodology in which we give a pre-test and post-test to our participants (randomly selected teachers and students), through some open-ended q...


Cluster-Based Image Segmentation Using Fuzzy Markov Random Field

Image segmentation is an important task in image processing and computer vision that attracts the attention of many researchers. The pixels in an image carry two kinds of information: statistical and structural information, which refer to the feature values of pixel data and the local correlation of pixel data, respectively. Markov random field (MRF) is a tool for modeling statistical and structural inf...



Journal:
  • CoRR

Volume abs/1608.07639

Publication date 2016