Semi - supervised Learning of Instance - level Recognition from Video
نویسندگان
چکیده
Over the recent years, data-driven approaches that take advantage of the availability of large manually-annotated datasets have proven effective at visual perception tasks. We witnessed particularly rapid improvements in the image classification task, where machines now surpass human-level performance. However, the progress in certain instancelevel recognition tasks, such as object segmentation and human pose estimation, has not been as rapid. In part, the rate of improvement has been affected by the difficulty of manually annotating very large amounts of data for these tasks. In this work we explore if semi-supervised learning approaches that utilise unlabelled video data can provide a reasonably effective alternative to the manually annotated data for instance-level recognition. Central to our approaches is the idea that good models should consistently predict the same label for different pose and view variations of the same object instance. We employ a hard example mining heuristic to find video frames in which the model makes mistakes and correct them by combining the information from the remaining video frames. By noting that video can be seen just like a source of transformation, we generalise our approach to unlabelled images and apply it to the human pose estimation task. The resulting technique, which we call keypoint data distillation (KDD), is simple and very effective. Using a collection of unlabelled video frames, we show that the Mask R-CNN model combined with KDD achieves state-of-the-art results on the COCO Keypoint Challenge, outperforming all other entries by a significant margin.
منابع مشابه
Recognition of Visual Events using Spatio-Temporal Information of the Video Signal
Recognition of visual events as a video analysis task has become popular in machine learning community. While the traditional approaches for detection of video events have been used for a long time, the recently evolved deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...
متن کاملInstance-level Semisupervised Multiple Instance Learning
Multiple instance learning (MIL) is a branch of machine learning that attempts to learn information from bags of instances. Many real-world applications such as localized content-based image retrieval and text categorization can be viewed as MIL problems. In this paper, we propose a new graph-based semi-supervised learning approach for multiple instance learning. By defining an instance-level g...
متن کاملLearning Pain from Action Unit Combinations: A Weakly Supervised Approach via Multiple Instance Learning
Patient pain can be detected highly reliably from facial expressions using a set of facial muscle-based action units (AUs) defined by the Facial Action Coding System (FACS). A key characteristic of facial expression of pain is the simultaneous occurrence of pain-related AU combinations, whose automated detection would be highly beneficial for efficient and practical pain monitoring. Existing ge...
متن کاملWatch and Learn: Semi-Supervised Learning for Object Detectors From Video
We present a semi-supervised approach that localizes multiple unknown object instances in long videos. We start with a handful of labeled boxes and iteratively learn and label hundreds of thousands of object instances. We propose criteria for reliable object detection and tracking for constraining the semi-supervised learning process and minimizing semantic drift. Our approach does not assume e...
متن کاملMultiview Hessian regularized logistic regression for action recognition
With the rapid development of social media sharing, people often need to manage the growing volume of multimedia data such as large scale video classification and annotation, especially to organize those videos containing human activities. Recently, manifold regularized semi-supervised learning (SSL), which explores the intrinsic data probability distribution and then improves the generalizatio...
متن کامل