Evaluating Lookup-Based Monocular Human Pose Tracking on the HumanEva Test Data
Abstract
This work evaluates several lookup-based methods for recovering three-dimensional human pose from monocular video sequences. The methods themselves are largely described elsewhere [1, 2], although the work presented here incorporates a few minor enhancements. The primary contribution is an evaluation of the results on a data set with ground truth available, which allows quantitative comparison with other techniques. Methods that rely on silhouettes produced by background subtraction tend to serve as a “straw man” relative to the current state of the art; many recently proposed techniques work without background subtraction and cite this as one of their advantages. Without disputing such reasonable claims, this work seeks to push background-subtraction methods as far as possible. The goal is to provide a challenging baseline against which the performance of various alternatives may be assessed.
1 Background Subtraction
Given the commitment to background subtraction made here, the quality of the extracted silhouettes strongly affects the result. Although the reconstruction methods can cope to some extent with noisy silhouettes, for the strongest comparison the silhouettes should be nearly error-free. Fortunately, recent work has demonstrated that graph-based techniques can extract high-quality silhouettes both reliably and quickly in most cases [5]. The foreground segmentation adopted here is based upon a different implementation with a similar philosophy, as detailed in the items below.
• The trimmed mean gives a robust Gaussian model of the background color at each pixel. For clips where the subject moves around sufficiently, the background model can be estimated directly from the action video.
• For a given frame, the number of standard deviations from the background color model at each pixel guides the foreground segmentation. For color images, separate models are developed for hue, saturation, and value at each pixel. A linear combination of the results of the three models weights each one according to its reliability. (The hue channel shows greater noise even after normalizing by the standard deviation, and is consequently weighted less than the other two.)
• To mitigate shadows, the model forgives luminosity decreases of up to τs from the computed background luminosity. This accounts for possible darkening due to shadows, and typically improves the segmentation where the subject’s feet meet the floor. Occasionally, it may improperly label foreground regions as background if their color is slightly darker than the background region they occlude.
• The minimum cut on a graph constructed from the image gives the foreground segmentation. Both four- and eight-connected neighbor edges are included in the graph, with weaker links to diagonal neighbors so that the solution favors neither straight nor diagonal boundaries. The graph omits neighbor edges where gradients appear in the frame that are not present in the background image. This strategy biases the foreground segmentation to follow object boundaries in the image.
Figure 1 shows an example of a segmented frame.
Figure 1: Example of typical foreground segmentation result. The precise boundaries and separation of body parts make further pose recovery steps easier. Nevertheless, errors can occur where there is poor contrast between the subject and the background, as in the darker portion of the left shoe.
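The per-pixel part of the background model described in the first three items can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated here: the function names, the trim fraction, the channel weights, the value of `tau_s`, and the floor on the standard deviation are all hypothetical, and the text does not say exactly how the shadow tolerance enters the computation (here a darkening is simply reduced by `tau_s` before it is normalized). The sketch also treats hue as an ordinary number and ignores its circularity.

```python
from statistics import mean, pstdev


def trimmed_stats(samples, trim_fraction=0.1, min_std=1e-3):
    """Mean and standard deviation of one pixel channel over time,
    computed after discarding the lowest and highest samples.

    trim_fraction and min_std are illustrative assumptions.
    """
    ordered = sorted(samples)
    k = int(len(ordered) * trim_fraction)  # samples dropped from each tail
    kept = ordered[k:len(ordered) - k]
    # Floor the deviation so a perfectly static pixel does not divide by zero.
    return mean(kept), max(pstdev(kept), min_std)


def foreground_score(pixel, model, weights=(0.2, 0.4, 0.4), tau_s=0.1):
    """Weighted number of standard deviations between an HSV pixel and
    its background model.

    pixel: (h, s, v); model: three (mean, std) pairs in the same order.
    The weights (hue weighted least) and tau_s are assumed values.
    """
    score = 0.0
    for channel, (value, (mu, sigma), weight) in enumerate(
        zip(pixel, model, weights)
    ):
        diff = abs(value - mu)
        if channel == 2 and value < mu:
            # Shadow tolerance (assumed form): forgive darkening up to tau_s.
            diff = max(0.0, (mu - value) - tau_s)
        score += weight * diff / sigma
    return score
```

A pixel that briefly shows the subject (an outlier in its history) barely moves the trimmed estimate, which is why the model can be learned from the action clip itself when the subject moves enough. The score grows with the evidence that a pixel is foreground, and a later stage decides the labeling.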
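The minimum-cut step from the last item can be illustrated on a small grid of per-pixel scores. The sketch below is an assumption-laden example rather than the solver used in this work: the terminal capacities derived from a `threshold`, the smoothness weight `smooth`, the diagonal weight of `smooth / sqrt(2)`, and the `skip_edge` hook (standing in for the rule that omits neighbor edges at gradients absent from the background image) are all hypothetical, and the max-flow routine is a plain Edmonds-Karp search chosen for brevity.

```python
import math
from collections import defaultdict, deque

EPS = 1e-9  # residual capacities below this are treated as saturated


def min_cut_segmentation(score, threshold=3.0, smooth=1.0,
                         skip_edge=lambda p, q: False):
    """Label each pixel of a 2-D score grid as foreground (True) or
    background (False) with a minimum cut. All weights are assumptions."""
    rows, cols = len(score), len(score[0])
    cap = defaultdict(lambda: defaultdict(float))  # residual capacities
    src, snk = "source", "sink"

    def link(u, v, c_uv, c_vu):
        cap[u][v] += c_uv
        cap[v][u] += c_vu

    for r in range(rows):
        for c in range(cols):
            p = (r, c)
            # Terminal links: scores above the threshold pull toward the
            # source (foreground), scores below it toward the sink.
            excess = score[r][c] - threshold
            link(src, p, max(excess, 0.0), 0.0)
            link(p, snk, max(-excess, 0.0), 0.0)
            # Four-connected (right, down) and diagonal neighbors; the
            # diagonal links are weaker by a factor of sqrt(2).
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                q = (r + dr, c + dc)
                if (0 <= q[0] < rows and 0 <= q[1] < cols
                        and not skip_edge(p, q)):
                    w = smooth / math.hypot(dr, dc)
                    link(p, q, w, w)

    def bfs():
        """Breadth-first search from the source over residual edges."""
        parent = {src: None}
        queue = deque([src])
        while queue and snk not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > EPS and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    parent = bfs()
    while snk in parent:  # augment along shortest paths until none remain
        path, v = [], snk
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= flow
            cap[w][u] += flow
        parent = bfs()
    # Pixels still reachable from the source lie on the foreground side.
    return [[(r, c) in parent for c in range(cols)] for r in range(rows)]
```

With `smooth=0.0` the result reduces to thresholding each pixel independently; a positive `smooth` removes isolated above-threshold pixels while keeping compact regions, which is the behavior the neighbor edges are meant to provide. A production system would likely use a dedicated max-flow library instead of this search.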
Similar resources
Recognition-Based Motion Capture and the HumanEva II Test Data
Quantitative comparison of algorithms for human motion capture has been hindered by the lack of standard benchmarks. The development of the HumanEva I & II test sets provides an opportunity to assess the state of the art by evaluating existing methods on the new standardized test videos. This paper presents a comprehensive evaluation of a monocular recognition-based pose recovery algorithm on ...
Evaluating Recognition-Based Motion Capture on HumanEva II Test Data
The advent of the HumanEva standardized motion capture data sets has enabled quantitative evaluation of motion capture algorithms on comparable terms. This paper measures the performance of an existing monocular recognition-based pose recovery algorithm on select HumanEva data, including all the HumanEva II clips. The method uses a physically-motivated Markov process to connect adjacent frames...
Region Based 3-D Pose Tracking with Occlusions
Despite great progress achieved in 3-D pose tracking during the past years, occlusions and self-occlusions are still an open issue. This is particularly true in silhouette-based tracking where even visible parts cannot be tracked as long as they do not affect the object silhouette. Multiple cameras or motion priors can overcome this problem. However, multiple cameras or appropriate training dat...
Evaluating Example-based Pose Estimation: Experiments on the HumanEva Sets
We present an example-based approach to pose recovery, using histograms of oriented gradients as image descriptors. Tests on the HumanEva-I and HumanEva-II data sets provide us insight into the strengths and limitations of an example-based approach. We report mean relative 3D errors of approximately 65 mm per joint on HumanEva-I, and 175 mm on HumanEva-II. We discuss our results using single an...
Silhouette lookup for monocular 3D pose tracking
Computers should be able to detect and track the articulated 3-D pose of a human being moving through a video sequence. Incremental tracking methods often prove slow and unreliable, and many must be initialized by a human operator before they can track a sequence. This paper describes a simple yet effective algorithm for tracking articulated pose, based upon looking up observations (such as bod...