Guess where? Actor-supervision for spatiotemporal action localization
Authors
Abstract
Similar resources
Guess Where? Actor-Supervision for Spatiotemporal Action Localization
This paper addresses the problem of spatiotemporal localization of actions in videos. Compared to leading approaches, which all learn to localize based on carefully annotated boxes on training video frames, we adhere to a weakly-supervised solution that only requires a video class label. We introduce an actor-supervised architecture that exploits the inherent compositionality of actions in term...
Actor-independent action search using spatiotemporal vocabulary with appearance hashing
Human actions in movies and sitcoms usually capture semantic cues for story understanding, which offer a novel search pattern beyond the traditional video search scenario. However, there are great challenges to achieve action-level video search, such as global motions, concurrent actions, and actor appearance variances. In this paper, we introduce a generalized action retrieval framework, which...
The multi-item localization (MILO) task: measuring the spatiotemporal context of vision for action.
We describe a new multi-item localization task that can be used to probe the temporal and spatial contexts of search-like behaviors. A sequence of four target letters (e.g., E, F, G, and H) was presented among four distractor letters. Observers located the targets in order. Both retrospective and prospective components of performance were examined. The retrospective component was assessed by ha...
The multi-item localization (MILO) task: Measuring the spatiotemporal context of vision for action
This article introduces a new task for exploring the sequential selection of multiple target items during search-like behavior. This multi-item localization (MILO) task differs in a number of respects from traditional visual search paradigms and, in particular, places a strong emphasis on the temporal, as well as the spatial, aspects of behavior. We will begin by describing the novel features of...
Spatiotemporal Residual Networks for Video Action Recognition
Two-stream Convolutional Networks (ConvNets) have shown strong performance for human action recognition in videos. Recently, Residual Networks (ResNets) have arisen as a new technique to train extremely deep architectures. In this paper, we introduce spatiotemporal ResNets as a combination of these two approaches. Our novel architecture generalizes ResNets for the spatiotemporal domain by intro...
Journal
Journal title: Computer Vision and Image Understanding
Year: 2020
ISSN: 1077-3142
DOI: 10.1016/j.cviu.2019.102886