Learning Structured Semantic Embeddings for Visual Recognition
Authors
Abstract
Numerous embedding models have recently been explored to incorporate semantic knowledge into visual recognition. Existing methods typically focus on minimizing the distance between corresponding images and texts in the embedding space but do not explicitly optimize the underlying structure. Our key observation is that modeling the pairwise image-image relationship improves the discrimination ability of the embedding model. In this paper, we propose structured discriminative and difference constraints to learn visual-semantic embeddings. First, we exploit the discriminative constraints to capture the intra- and inter-class relationships of image embeddings. The discriminative constraints encourage separability for image instances of different classes. Second, we align the difference vector between a pair of image embeddings with that of the corresponding word embeddings. The difference constraints help regularize image embeddings to preserve the semantic relationships among word embeddings. Extensive evaluations demonstrate the effectiveness of the proposed structured embeddings for single-label classification, multi-label classification, and zero-shot recognition.
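The abstract does not give the loss formulas, so the following is only a minimal sketch of how the two constraints could be written, not the authors' formulation. It assumes a contrastive-style hinge with a hypothetical `margin` parameter for the discriminative constraint, and a squared Euclidean mismatch between image-pair and word-pair difference vectors for the difference constraint. The function names, the toy embeddings, and the choice to skip same-class pairs in the difference term are all illustrative assumptions.

```python
import math


def sub(a, b):
    """Element-wise difference of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def sq_norm(v):
    """Squared Euclidean norm of a vector."""
    return sum(x * x for x in v)


def discriminative_loss(img_emb, labels, margin=1.0):
    """Assumed contrastive form: same-class pairs are pulled together,
    different-class pairs are pushed at least `margin` apart."""
    total, pairs = 0.0, 0
    n = len(img_emb)
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.sqrt(sq_norm(sub(img_emb[i], img_emb[j])))
            if labels[i] == labels[j]:
                total += dist ** 2
            else:
                total += max(0.0, margin - dist) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


def difference_loss(img_emb, labels, word_emb):
    """Assumed form: squared mismatch between the difference vector of an
    image pair and the difference vector of their class word embeddings."""
    total, pairs = 0.0, 0
    n = len(img_emb)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                continue  # assumption: only cross-class pairs are aligned
            d_img = sub(img_emb[i], img_emb[j])
            d_word = sub(word_emb[labels[i]], word_emb[labels[j]])
            total += sq_norm(sub(d_img, d_word))
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy data (hypothetical): two classes with 2-D word embeddings.
word_emb = {"cat": [0.0, 0.0], "dog": [2.0, 0.0]}
img_emb = [[0.0, 0.0], [0.0, 0.0], [2.0, 0.0]]
labels = ["cat", "cat", "dog"]

print(discriminative_loss(img_emb, labels, margin=1.0))
print(difference_loss(img_emb, labels, word_emb))
```

In this toy case the image embeddings already sit exactly on their class word embeddings, so both terms are zero. Moving the "dog" image closer to the "cat" images would raise the difference term, and also the discriminative term once the distance falls below the margin.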
Similar resources
VSE++: Improved Visual-Semantic Embeddings
We present a new technique for learning visual-semantic embeddings for cross-modal retrieval. Inspired by the use of hard negatives in structured prediction, and ranking loss functions used in retrieval, we introduce a simple change to common loss functions used to learn multi-modal embeddings. That, combined with fine-tuning and the use of augmented data, yields significant gains in retrieval p...
Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs
We consider the problem of zero-shot recognition: learning a visual classifier for a category with zero training examples, using only the word embedding of the category and its relationship to other categories for which visual data are provided. The key to dealing with the unfamiliar or novel category is to transfer knowledge obtained from familiar classes to describe the unfamiliar class. In this...
Gaussian Visual-Linguistic Embedding for Zero-Shot Recognition
An exciting outcome of research at the intersection of language and vision is that of zero-shot learning (ZSL). ZSL promises to scale visual recognition by borrowing distributed semantic models learned from linguistic corpora and turning them into visual recognition models. However, the popular word-vector DSM embeddings are relatively impoverished in their expressivity as they model each word as...
The Effect of Using Visual Aids, Semantic Elaboration, and Visual Aids plus Semantic Elaboration on Iranian Learners' Vocabulary Learning
This study investigated the effect of using visual aids, semantic elaboration, and visual aids plus semantic elaboration on the Iranian EFL learners' vocabulary learning. To conduct the study, the researchers assigned 49 elementary learners to three homogeneous groups according to their proficiency level. Then, a pre-test of Paribakht and Wesche's Vocabulary Knowledge Scale was given to each gr...
Latent semantic learning with structured sparse representation for human action recognition
This paper proposes a novel latent semantic learning method for extracting high-level latent semantics from a large vocabulary of abundant mid-level features (i.e. visual keywords) with structured sparse representation, which can help to bridge the semantic gap in the challenging task of human action recognition. To discover the manifold structure of mid-level features, we develop a graph-based...
Journal title:
- CoRR
Volume abs/1706.01237, Issue (not listed)
Pages -
Publication date 2017