Describing Common Human Visual Actions in Images

نویسندگان

Matteo Ruggero Ronchi

Pietro Perona

چکیده

Which common human actions and interactions are recognizable in monocular still images? Which involve objects and/or other people? How many is a person performing at a time? We address these questions by exploring the actions and interactions that are detectable in the images of the MS COCO dataset. We make two main contributions. First, a list of 140 common ‘visual actions’, obtained by analyzing the largest on-line verb lexicon currently available for English (VerbNet) and human sentences used to describe images in MS COCO. Second, a complete set of annotations for those ‘visual actions’, composed of subject-object and associated verb, which we call COCO-a (a for ‘actions’). COCO-a is larger than existing action datasets in terms of number of actions and instances of these actions, and is unique because it is data-driven, rather than experimenter-biased. Other unique features are that it is exhaustive, and that all subjects and objects are localized. A statistical analysis of the accuracy of our annotations and of each action, interaction and subject-object combination is provided.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

RONCHI AND PERONA: DESCRIBING COMMON HUMAN VISUAL ACTIONS IN IMAGES 1 Describing Common Human Visual Actions in Images

متن کامل

The Rhetorical - Aesthetic Approach to Constructing the Relation between Images and Visual Inventions with Global Politics

Images and photos play an important role in our understanding of domestic and international events. Today we are living in the age of the visualization of politics. The images are vague, rhetorical, and aesthetic components of political and social phenomena and can give them a beautiful or detestable structure. In the digital age, images in and of themselves can define our structure and vision ...

متن کامل

Describing Images using Inferred Visual Dependency Representations

The Visual Dependency Representation (VDR) is an explicit model of the spatial relationships between objects in an image. In this paper we present an approach to training a VDR Parsing Model without the extensive human supervision used in previous work. Our approach is to find the objects mentioned in a given description using a state-of-the-art object detector, and to use successful detections...

متن کامل

Grounding Action Descriptions in Videos

Recent work has shown that the integration of visual information into text-based models can substantially improve model predictions, but so far only visual information extracted from static images has been used. In this paper, we consider the problem of grounding sentences describing actions in visual information extracted from videos. We present a general purpose corpus that aligns high qualit...

متن کامل

Reduced-Reference Image Quality Assessment based on saliency region extraction

In this paper, a novel saliency theory based RR-IQA metric is introduced. As the human visual system is sensitive to the salient region, evaluating the image quality based on the salient region could increase the accuracy of the algorithm. In order to extract the salient regions, we use blob decomposition (BD) tool as a texture component descriptor. A new method for blob decomposition is propos...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2015

Describing Common Human Visual Actions in Images

نویسندگان

چکیده

منابع مشابه

RONCHI AND PERONA: DESCRIBING COMMON HUMAN VISUAL ACTIONS IN IMAGES 1 Describing Common Human Visual Actions in Images

The Rhetorical - Aesthetic Approach to Constructing the Relation between Images and Visual Inventions with Global Politics

Describing Images using Inferred Visual Dependency Representations

Grounding Action Descriptions in Videos

Reduced-Reference Image Quality Assessment based on saliency region extraction

عنوان ژورنال:

اشتراک گذاری