Entropy guided attention network for weakly-supervised action localization

Abstract

One major challenge of Weakly-supervised Temporal Action Localization (WTAL) is to handle diverse backgrounds in videos. To model background frames, most existing methods treat them as an additional action class. However, because background frames usually do not share common semantics, squeezing all the different background frames into a single class hinders network optimization. Moreover, the network would be confused and tends to fail when tested on videos with unseen background frames. To address this problem, we propose an Entropy Guided Attention Network (EGA-Net) that treats background frames as out-of-domain samples. Specifically, we design a two-branch module, where a domain branch detects whether a frame is an action by learning a class-agnostic attention map, and an action branch recognizes the action category of the frame by learning a class-specific attention map. By aggregating the two attention maps to model the joint domain-class distribution of frames, our EGA-Net can handle varying backgrounds. To train the class-agnostic attention map with only video-level class labels, we propose an Entropy Guided Loss (EGL), which employs entropy as the supervision signal to distinguish action from background. Moreover, we propose a Global Similarity Loss (GSL) to enhance the action-specific attention map via the action class center. Extensive experiments on the THUMOS14, ActivityNet1.2 and ActivityNet1.3 datasets demonstrate the effectiveness of our EGA-Net.
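The abstract does not give the formulas behind the two-branch aggregation or the entropy signal, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the class-agnostic attention is a sigmoid over one per-frame score, the class-specific map is a softmax over per-frame class scores, and the joint domain-class distribution is their product with one extra slot for background. The function names (`joint_distribution`, `normalized_entropy`) and the toy scores are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def joint_distribution(class_logits, domain_logits):
    """Combine the two branches per frame (assumed form, not from the paper).

    Returns, for each frame, [P(action, class 0), ..., P(action, class C-1),
    P(background)], which sums to 1 by construction.
    """
    joint = []
    for logits, d in zip(class_logits, domain_logits):
        in_domain = sigmoid(d)        # class-agnostic attention: is this frame an action?
        per_class = softmax(logits)   # class-specific attention: which action is it?
        joint.append([in_domain * p for p in per_class] + [1.0 - in_domain])
    return joint


def normalized_entropy(class_logits):
    """Entropy of each frame's class distribution, scaled to [0, 1].

    Assumption: a frame whose class distribution is close to uniform (high
    entropy) is more likely background, so this value could serve as a
    pseudo-label for the domain branch. Requires at least two classes.
    """
    result = []
    for logits in class_logits:
        probs = softmax(logits)
        h = -sum(p * math.log(p) for p in probs if p > 0.0)
        result.append(h / math.log(len(probs)))
    return result


# Toy example: frame 0 is confidently class 0, frame 1 has no class preference.
class_logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
domain_logits = [2.0, -2.0]

joint = joint_distribution(class_logits, domain_logits)
frame_entropy = normalized_entropy(class_logits)
```

In this toy setup the second frame has a uniform class distribution, so its normalized entropy is the maximum and its background slot in the joint distribution is larger than that of the first frame. How EGL actually converts entropy into a training signal is not stated in the abstract.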

Related resources

Attention Networks for Weakly Supervised Object Localization

We consider the problem of weakly supervised learning for object localization. Given a collection of images with image-level annotations indicating the presence/absence of an object, our goal is to localize the object in each image. We propose a neural network architecture called the attention network for this problem. Given a set of candidate regions in an image, the attention network first co...

Towards Weakly-Supervised Action Localization

This paper presents a novel approach for weakly-supervised action localization, i.e., that does not require per-frame spatial annotations for training. We first introduce an effective method for extracting human tubes by combining a state-of-the-art human detector with a tracking-by-detection approach. Our tube extraction leverages the large amount of annotated humans available today and outper...

Weakly Supervised Action Localization by Sparse Temporal Pooling Network

We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video usin...

C-WSL: Count-guided Weakly Supervised Localization

We introduce a count-guided weakly supervised localization (C-WSL) framework with per-class object count as an additional form of image-level supervision to improve weakly supervised localization (WSL). C-WSL uses a simple count-based region selection algorithm to select high-quality regions, each of which covers a single object instance at training time, and improves WSL by training with the se...

ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization

We aim to localize objects in images using image-level supervision only. Previous approaches to this problem mainly focus on discriminative object regions and often fail to locate precise object boundaries. We address this problem by introducing two types of context-aware guidance models, additive and contrastive models, that leverage their surrounding context regions to improve localization. T...

Journal

Journal title: Pattern Recognition

Year: 2022

ISSN: 1873-5142, 0031-3203

DOI: https://doi.org/10.1016/j.patcog.2022.108718