Synthesizing Scenes for Instance Detection
Abstract
Object detection models have made significant progress in recent years. A major impediment to rapidly deploying these models for instance detection is the lack of large annotated datasets. For example, one is unlikely to find a large labeled dataset containing the object instances in a particular kitchen. Brute-force data collection would require substantial manual effort for each new environment with new instances. In this thesis, we explore three methods to tackle this problem. First, we show how object tracking in videos can propagate bounding box annotations from one frame to subsequent frames. Next, we show how 3D reconstruction can be used to produce annotations for object detection and pose estimation. Finally, we present a novel approach for generating synthetic scenes with annotations for instance detection. Our key insight is that ensuring only patch-level realism provides enough training signal for current object detection models. A naive way of doing this produces pixel artifacts that lead to poor performance in the trained models. We show how to make detectors ignore these artifacts during training and how to generate data that yields performance competitive with real data. Our results show that we outperform existing synthesis approaches, and that the complementary information in our synthetic data, when combined with real data, improves performance by more than 10 AP points on benchmark datasets.
Similar resources
Synthesizing Training Data for Object Detection in Indoor Scenes
Detection of objects in cluttered indoor environments is one of the key enabling functionalities for service robots. The best performing object detection approaches in computer vision exploit deep Convolutional Neural Networks (CNN) to simultaneously detect and categorize the objects of interest in cluttered scenes. Training of such models typically requires large amounts of annotated training ...
Online multiple people tracking-by-detection in crowded scenes
Multiple people detection and tracking is a challenging task in real-world crowded scenes. In this paper, we have presented an online multiple people tracking-by-detection approach with a single camera. We have detected objects with deformable part models and a visual background extractor. In the tracking phase we have used a combination of support vector machine (SVM) person-specific classifie...
Real-time video event detection in crowded scenes using MPEG derived features: A multiple instance learning approach
This paper presents an investigation into event detection in crowded scenes, where the event of interest co-occurs with other activities and only binary labels at the clip level are available. The proposed approach incorporates a fast feature descriptor from the MPEG domain, and a novel multiple instance learning (MIL) algorithm using sparse approximation and random sensing. MPEG motion vectors...
Human Detection and Tracking Using Particle Filters
This is a novel approach for people detection and tracking in a particle filtering framework. The algorithm uses people detectors and online-trained, instance-specific classifiers as a graded observation model. It robustly tracks moving people in complex scenes, does not rely on background modeling, and operates entirely in 2D.
A Hierarchical Visual Saliency Model for Character Detection in Natural Scenes
Visual saliency models have been introduced to the field of character recognition for detecting characters in natural scenes. Researchers believe that characters have different visual properties from their non-character neighbors, which make them salient. Under this assumption, characters should respond well to computational models of visual saliency. However, in some situations, characters belo...
Publication date: 2017