Sherlock: Scalable Fact Learning in Images
Authors
Abstract
The human visual system is capable of learning an unbounded number of facts from images, including not only objects but also their attributes, actions, and interactions. Such uniform understanding of visual facts has not received enough attention. Existing visual recognition systems are typically modeled differently for each fact type, such as objects, actions, and interactions. We propose a setting in which all these facts can be modeled simultaneously, with the capacity to understand an unbounded number of facts in a structured way. The training data comes as structured facts in images, including (1) objects, (2) attributes, (3) actions, and (4) interactions. Each fact has a language view (e.g., <boy, playing>) and a visual view (an image). We show that learning visual facts in a structured way enables not only a uniform but also a generalizable visual understanding. We propose and investigate recent, strong approaches from the multiview learning literature and also introduce a structured embedding model. We applied the investigated methods to several datasets that we augmented with structured facts, and to a large-scale dataset of more than 202,000 facts and 814,000 images. Our results show the advantage of relating facts through their structure, as the proposed model does, compared to the baselines.
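A structured fact, with its language view and visual view, can be pictured as a small data structure. The sketch below is a minimal illustration under our own assumptions, not the paper's implementation: the class name `StructuredFact`, its fields, the example facts, and the image path are all hypothetical, and the visual view is reduced to a file path. It also assumes a fact is a tuple of up to three slots (subject, predicate, object), so attributes and actions share the same two-slot shape and would need an extra label to be told apart.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StructuredFact:
    """Hypothetical container for one structured visual fact.

    Language view: the tuple <subject[, predicate[, object]]>.
    Visual view: reduced here to an image path (an assumption).
    """
    subject: str
    predicate: Optional[str] = None
    obj: Optional[str] = None
    image_path: Optional[str] = None

    def __post_init__(self) -> None:
        # An object slot without a predicate slot does not form a valid fact.
        if self.obj is not None and self.predicate is None:
            raise ValueError("an object slot requires a predicate slot")

    @property
    def slots(self) -> Tuple[str, ...]:
        # Only the filled slots, kept in <S, P, O> order.
        return tuple(s for s in (self.subject, self.predicate, self.obj) if s is not None)

    @property
    def order(self) -> int:
        # Assumed mapping: 1 = object, 2 = attribute or action, 3 = interaction.
        return len(self.slots)

    def language_view(self) -> str:
        # Render the language view in the <..., ...> notation used in the text.
        return "<" + ", ".join(self.slots) + ">"


if __name__ == "__main__":
    # Hypothetical example facts; only <boy, playing> appears in the abstract.
    facts = [
        StructuredFact("boy"),
        StructuredFact("boy", "playing", image_path="images/0001.jpg"),
        StructuredFact("boy", "riding", "horse"),
    ]
    for fact in facts:
        print(fact.order, fact.language_view())
    # 1 <boy>
    # 2 <boy, playing>
    # 3 <boy, riding, horse>
```

Counting filled slots gives one structure for all four fact types, in the spirit of the uniform setting the abstract describes; how the paper actually encodes and embeds these tuples is not shown here.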
Similar resources
Sherlock: Modeling Structured Knowledge in Images
We study scalable and uniform understanding of facts in images. Existing visual recognition systems are typically modeled differently for each fact type such as objects, actions, and interactions. We propose a setting where all these facts can be modeled simultaneously with a capacity to understand an unbounded number of facts in a structured way. The training data comes as structured facts in ima...
From Benedict Cumberbatch to Sherlock Holmes: Character Identification in TV series without a Script (Nagrani and Zisserman)
The goal of this paper is the automatic identification of characters in TV and feature film material. In contrast to standard approaches to this task, which rely on the weak supervision afforded by transcripts and subtitles, we propose a new method requiring only a cast list. This list is used to obtain images of actors from freely available sources on the web, providing a form of partial super...
Scalable Image Annotation by Summarizing Training Samples into Labeled Prototypes
As the number of images increases, it is essential to provide fast search methods and intelligent filtering of images. To handle images in large datasets, relevant tags are assigned to each image to describe its content. Automatic Image Annotation (AIA) aims to automatically assign a group of keywords to an image based on its visual content. AIA frameworks have two main sta...
Intelligent scalable image watermarking robust against progressive DWT-based compression using genetic algorithms
Image watermarking refers to the process of embedding an authentication message, called a watermark, into the host image to uniquely identify the ownership. In this paper, a novel, intelligent, scalable, robust wavelet-based watermarking approach is proposed. The proposed approach employs a genetic algorithm to find nearly optimal positions to insert the watermark. The embedding positions coded as chr...