Hierarchical Multi-scale Attention Networks for action recognition

نویسندگان

  • Shiyang Yan
  • Jeremy S. Smith
  • Wenjin Lu
  • Bailing Zhang
چکیده

Recurrent Neural Networks (RNNs) have been widely used in natural language processing and computer vision. Among them, the Hierarchical Multi-scale RNN (HM-RNN), a kind of multi-scale hierarchical RNN proposed recently, can learn the hierarchical temporal structure from data automatically. In this paper, we extend the work to solve the computer vision task of action recognition. However, in sequence-to-sequence models like RNN, it is normally very hard to discover the relationships between inputs and outputs given static inputs. As a solution, attention mechanism could be applied to extract the relevant information from input thus facilitating the modeling of inputoutput relationships. Based on these considerations, we propose a novel attention network, namely Hierarchical Multi-scale Attention Network (HM-AN), by combining the HM-RNN and the attention mechanism and apply it to action recognition. A newly proposed gradient estimation method for stochastic neurons, namely Gumbel-softmax, is exploited to implement the temporal boundary detectors and the stochastic hard attention mechanism. To amealiate the negative effect of sensitive temperature of the Gumbel-softmax, an adaptive temperature training method is applied to better the system performance. The experimental results demonstrate the improved effect of HM-AN over LSTM with attention on the vision task. Through visualization of what have been learnt by the networks, it can be observed that both the attention regions of images and the hierarchical temporal structure can be captured by HM-AN.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

MLCA: A Multi-Level Clustering Algorithm for Routing in Wireless Sensor Networks

Energy constraint is the biggest challenge in wireless sensor networks because the power supply of each sensor node is a battery that is not rechargeable or replaceable due to the applications of these networks. One of the successful methods for saving energy in these networks is clustering. It has caused that cluster-based routing algorithms are successful routing algorithm for these networks....

متن کامل

Artificial vision by multi-layered neural networks: Neocognitron and its advances

The neocognitron is a neural network model proposed by Fukushima (1980). Its architecture was suggested by neurophysiological findings on the visual systems of mammals. It is a hierarchical multi-layered network. It acquires the ability to robustly recognize visual patterns through learning. Although the neocognitron has a long history, modifications of the network to improve its performance ar...

متن کامل

Incorporating ecological knowledge into ecoinformatics: An example of modeling hierarchically structured aquatic communities with neural networks

Article history: Received 15 February 2005 Received in revised form 29 July 2005 Accepted 9 August 2005 The field of ecoinformatics is concerned with gaining a greater understanding of complex ecological systems. Many ecoinformatic tools, including artificial neural networks (ANNs), can shed important insights into the complexities of ecological data through pattern recognition and prediction; ...

متن کامل

Cortex-inspired Recurrent Networks for Developmental Visual Attention and Recognition

Cortex-inspired Recurrent Networks for Developmental Visual Attention and Recognition By Matthew Luciw It is unknown how the brain self-organizes its internal wiring without a holisticallyaware central controller. How does the brain develop internal object representations for a massive number of objects? How do such representations enable tightly intertwined attention and recognition in the pre...

متن کامل

Pillar Networks for action recognition

Image understanding using deep convolutional network has reached human-level performance, yet a closely related problem of video understanding especially, action recognition has not reached the requisite level of maturity. We combine multi-kernels based support-vector-machines (SVM) with a multi-stream deep convolutional neural network to achieve close to state-of-the-art performance on a 51-cl...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Sig. Proc.: Image Comm.

دوره 61  شماره 

صفحات  -

تاریخ انتشار 2018