Real-time Action Recognition by Spatiotemporal Semantic and Structural Forests

Authors

Tsz-Ho Yu

Tae-Kyun Kim

Roberto Cipolla

Abstract

This paper presents a novel real-time action recogniser that utilises both local appearance and structural information. Our method recognises actions continuously in real time while achieving accuracy comparable to state-of-the-art methods. Run-time speed is of vital importance in real-world action recognition systems, but existing methods seldom take computational complexity fully into consideration: a class label is assigned only after an entire query video has been analysed, or a large lookahead is required to recognise an action. In addition, the "bag of words" (BOW) model has proven effective for action recognition [5]. However, the standard BOW model ignores the spatiotemporal relationships among feature descriptors, which are useful for describing actions. Addressing these challenges, we present a novel approach for action recognition. The major contributions include the following:

Efficient Spatiotemporal Codebook Learning: We extend the use of semantic texton forests (STFs) [6] from 2D image segmentation to spatiotemporal analysis. As well as being much faster than a traditional flat codebook such as k-means clustering, STFs achieve accuracy comparable to that of existing approaches. STFs are ensembles of random decision trees that textonise input video patches into semantic textons. Since only a small number of simple features are used to traverse the trees, STFs are extremely fast to evaluate. With multiple decision trees, they also serve as a powerful discriminative codebook. Figure 1 illustrates how visual codewords are generated using STFs in the proposed method.

Combined Structural and Appearance Information: We propose a richer description of features, so that actions can be classified in very short video sequences. Based on [3], we introduce the pyramidal spatiotemporal relationship match (PSRM) to encapsulate both local appearance and structural information efficiently. Subsequences are sampled from an input video in short intervals (e.g. ≤ 10 frames).
After spatiotemporal interest points are localised, the trained STFs assign visual codewords to the features. A set of pairwise spatiotemporal associations is designed to capture the structural relationships among features (i.e. pairwise distances along the space-time axes). All possible pairs in the bag of features are analysed by the association rules and stored in a 3-D histogram. PSRM leverages the properties of semantic trees and pyramid match kernels. Multiple pyramidal histograms are then combined to classify a query video. Figure 2 illustrates how the relationship histograms are constructed and matched using PSRM. For each tree in the STFs, a three-dimensional histogram is constructed according to the spatiotemporal structure of the features (see Figure 2 (left)). The tree's hierarchical structure offers a time-efficient way to perform the pyramid match kernel [1] for codeword matching (Figure 2 (right)).

Enhanced Efficiency and Combined Classification: Several techniques are employed to improve recognition speed and accuracy. A novel spatiotemporal interest point detector, called V-FAST, is designed based on the FAST 2D corner detector [2]. The recognition accuracy is enhanced by adaptively combining PSRM and the bag of semantic textons (BOST) method [6]: the k-means forest classifier is learned using PSRM as a matching kernel. The task of action recognition is performed separately ...

[Figure labels: Feature Extraction (spatiotemporal relationship match of visual codewords from the semantic texton forest); Feature Matching (the pyramid match kernel is utilised to match the histograms).]
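As a rough illustration of the relationship histogram and pyramid matching described in the abstract, the following Python sketch builds a 3-D histogram over codeword pairs and matches two such histograms level by level. It is not the authors' implementation. The function names, the use of a single Euclidean space-time distance as the only association, the bin count and distance range, and the assumption that codewords are the leaves of a complete binary tree (so that leaves 2k and 2k+1 share a parent) are all simplifications chosen here for illustration.

```python
import numpy as np

def relationship_histogram(codewords, positions, n_codewords, n_bins=4, max_dist=32.0):
    """3-D histogram over (codeword_i, codeword_j, quantised space-time distance).

    codewords: leaf index per feature; n_codewords is assumed to be a power of two.
    positions: array of shape (n_features, 3) holding (x, y, t) per feature.
    """
    hist = np.zeros((n_codewords, n_codewords, n_bins))
    n = len(codewords)
    for i in range(n):
        for j in range(i + 1, n):          # every unordered pair of features
            dist = np.linalg.norm(positions[i] - positions[j])
            b = min(int(dist / max_dist * n_bins), n_bins - 1)
            a, c = sorted((codewords[i], codewords[j]))
            hist[a, c, b] += 1
    return hist

def coarsen(hist):
    """Merge sibling leaves (2k, 2k+1) into their parent along both codeword axes."""
    k = hist.shape[0] // 2
    return hist.reshape(k, 2, k, 2, hist.shape[2]).sum(axis=(1, 3))

def pyramid_match(h1, h2):
    """Weighted sum of the new histogram-intersection matches found at each level."""
    score, prev, weight = 0.0, 0.0, 1.0
    while True:
        inter = np.minimum(h1, h2).sum()   # histogram intersection at this level
        score += weight * (inter - prev)   # count only matches not seen at finer levels
        prev = inter
        if h1.shape[0] == 1:               # reached the root of the (assumed) tree
            break
        h1, h2 = coarsen(h1), coarsen(h2)
        weight /= 2.0                      # coarser matches count for less
    return score

# Two toy "videos", each with two features at the same positions.
pos = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
ha = relationship_histogram([0, 2], pos, n_codewords=4)
hb = relationship_histogram([1, 3], pos, n_codewords=4)
print(pyramid_match(ha, ha), pyramid_match(ha, hb))
```

In this toy case the two feature sets share no leaf codewords, so they only match one level up the tree and receive half the weight of an exact match. A real system would build one such histogram per tree and combine the scores, as the abstract describes.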

Similar resources

Understanding the semantic principles of a political map

The attempt to recognize phenomena and affairs has always been a concern of the human mind, which has constantly sought to complete this knowledge. Correct recognition is achieved when the real nature of phenomena is clear to man. Phenomena are based on their own philosophical foundations and, therefore, understanding them requires perceiving these philosophical foundations and using...

Design and Implementation of a Real-Time System for Vehicle License Plate Detection and Recognition in Video Images

Automatic Number Plate Recognition (ANPR) is a popular topic in the field of image processing and has been studied from different aspects since the early 1990s. There are many challenges in this field, including fast-moving vehicles, different viewing angles and distances from the camera, complex and unpredictable backgrounds, poor-quality images, the existence of multiple plates in the scene, va...

Developing a Semantic Similarity Judgment Test for Persian Action Verbs and Non-action Nouns in Patients With Brain Injury and Determining its Content Validity

Objective: Evidence from brain trauma suggests that the two grammatical categories of noun and verb are processed in different regions of the brain due to differences in the complexity of grammatical and semantic information processing. Studies have shown that verbs belonging to different semantic categories lead to neural activity in different areas of the brain, and action verb processing is r...

Feature integration with random forests for real-time human activity recognition

This paper presents an approach for real-time human activity recognition. Three different kinds of features (flow, shape, and a keypoint-based feature) are applied in activity recognition. We use random forests for feature integration and activity classification. A forest is created for each feature and acts as a weak classifier. The international classification of functioning, disability a...

Action Change Detection in Video Based on HOG

Background and Objectives: Action recognition, the process of labeling an unknown action in a query video, is a challenging problem due to event complexity, variations in imaging conditions, and intra- and inter-individual action variability. A number of solutions have been proposed to solve the action recognition problem. Many of these frameworks suppose that each video sequence includes only one ...

Publication date: 2010
