Supplementary Material: Displets: Resolving Stereo Ambiguities using Object Knowledge
Abstract
In this supplementary document, we first show how disparities relate to the 3D planes used in our representation. Next, we provide an illustration of the sampled displets for a single test image and visualize the influence of the displets on their associated superpixels. For reproducibility, we also list the learned parameter settings which we used throughout all our experiments and show how performance changes with respect to these parameters. Finally, we show additional qualitative results on the KITTI validation set. On our project website (http://www.cvlibs.net/projects/displets/) we also provide a supplementary video demonstrating the semi-convex hull optimization process which we use to simplify the CAD models.

1. Plane Representation

In our formulation, each superpixel is represented as a random variable $\mathbf{n}_i \in \mathbb{R}^3$ describing a plane in 3D ($\mathbf{n}_i^T \mathbf{x} = 1$ for points $\mathbf{x} \in \mathbb{R}^3$ on the plane). We denote the disparity of plane $\mathbf{n}_i$ at pixel $\mathbf{p} = (u, v)^T$ by $\omega(\mathbf{n}_i, \mathbf{p})$. In the following, we show how $\omega(\mathbf{n}_i, \mathbf{p})$ can be derived for a (rectified) pinhole stereo camera with known intrinsics and extrinsics:

$$u = f\,\frac{x}{z} + c_u \;\Rightarrow\; x = (u - c_u)\,\frac{z}{f}, \qquad v = f\,\frac{y}{z} + c_v \;\Rightarrow\; y = (v - c_v)\,\frac{z}{f}, \qquad d = \frac{f L}{z}$$

Substituting $x$ and $y$ into the plane equation $1 = x\,n_x + y\,n_y + z\,n_z$ yields

$$\begin{aligned}
1 &= (u - c_u)\,\frac{z}{f}\,n_x + (v - c_v)\,\frac{z}{f}\,n_y + z\,n_z \\
\frac{1}{z} &= (u - c_u)\,\frac{n_x}{f} + (v - c_v)\,\frac{n_y}{f} + n_z \\
d &= (u - c_u)\,L\,n_x + (v - c_v)\,L\,n_y + f L\,n_z \\
d &= L n_x\,u + L n_y\,v + f L n_z - c_u L n_x - c_v L n_y \qquad (1) \\
d &= a\,u + b\,v + c \qquad (2)
\end{aligned}$$

Here, $f, c_u, c_v$ denote the intrinsic camera calibration parameters and $L$ is the baseline.
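To make the derivation concrete, the following short Python sketch (our own illustration, not code from the paper; the function name and the KITTI-like calibration values are assumptions) evaluates the disparity $\omega(\mathbf{n}_i, \mathbf{p})$ induced by a plane via the coefficients of Eq. (2):

```python
def plane_to_disparity(n, u, v, f, cu, cv, L):
    """Disparity omega(n, p) at pixel p = (u, v) induced by the plane
    n^T x = 1, for a rectified stereo pair (Eqs. (1)-(2) above).

    n         -- (nx, ny, nz), plane parameters
    f, cu, cv -- focal length and principal point
    L         -- stereo baseline
    """
    nx, ny, nz = n
    # Coefficients of the affine disparity map d = a*u + b*v + c (Eq. (2))
    a = L * nx
    b = L * ny
    c = f * L * nz - cu * L * nx - cv * L * ny
    return a * u + b * v + c

# Illustrative, KITTI-like calibration values (assumptions, not from the paper)
f, cu, cv, L = 721.5, 609.6, 172.9, 0.54
n = (0.0, 0.606, 0.0)  # road-like plane roughly 1.65 m below the camera
print(plane_to_disparity(n, u=500.0, v=300.0, f=f, cu=cu, cv=cv, L=L))
```

Note that the disparity of a plane is affine in the pixel coordinates, with $a = L n_x$, $b = L n_y$ and $c = f L n_z - c_u L n_x - c_v L n_y$, so evaluating it per pixel is trivial once the plane is fixed.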
2. Illustration of Displets

Fig. 1 visualizes a subset of the sampled displets for a single test image which serve as input to our CRF. Wrong displet proposals are eliminated during inference if they do not agree with the observations or overlap with other displets.

Figure 1: Illustration of Displets. This figure shows a subset of sampled displets for an image by overlaying the corresponding disparity maps on top of the image regions they have been proposed for.

3. Displet Parameters

The influence of a displet on its associated superpixels is determined by a penalty function $\lambda_{ki}$. We take $\lambda_{ki}$ as a weighted sigmoid function of the distance transform

$$\lambda_{ki} = \lambda_\theta\,\frac{1}{1 + \exp(\lambda_a - \lambda_b\,\mathrm{DT}_{ki})}$$

where $\mathrm{DT}_{ki}$ denotes the Euclidean distance transform of superpixel $i$ with respect to the boundary of displet $k$. The model parameters $\lambda_\theta$, $\lambda_a$ and $\lambda_b$ are learned from training data as described in the next section. In Fig. 2, we visualize $\lambda_{ki}$ for a couple of displets and the associated superpixels. Less transparency indicates a higher penalty, i.e., we allow more deviation at the displet boundaries. In contrast, $\lambda_{ki}$ takes a very large value ($\lambda_\theta$) in the center of the displet (see Section 4), effectively approximating a hard constraint ($\lambda_{ki} = \infty$). Furthermore, we take the fitness score $\kappa_k$ as the displet log-likelihood ($-E_{\hat{\Omega}}$) in order to favor high-quality displets. In practice, we subtract the largest fitness score of all displets originating from the same object proposal region $O$ in order to calibrate the scores across different regions.

Figure 2: Illustration of Displet Influence $\lambda_{ki}$. We penalize superpixels which do not agree ($\mathbf{n}_i \neq \hat{\mathbf{n}}_{k,z_i}$) with active displets ($d_k = 1$). Less transparency indicates a higher penalty, i.e., we allow more deviation at the displet boundaries.
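The text above fully specifies the sigmoid but leaves the exact form of $\mathrm{DT}_{ki}$ implicit. Below is a minimal sketch of one plausible reading (our own assumptions, not the paper's code: displets and superpixels are given as boolean masks, $\mathrm{DT}_{ki}$ is taken as the mean distance of the superpixel's pixels to the displet boundary, and SciPy's Euclidean distance transform is used):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def displet_penalty(displet_mask, superpixel_mask,
                    lam_theta=6e4, lam_a=7.90, lam_b=0.82):
    """Weighted sigmoid of the distance transform:
    lambda_ki = lam_theta / (1 + exp(lam_a - lam_b * DT_ki)).

    displet_mask, superpixel_mask -- boolean HxW arrays.
    DT_ki is approximated here as the mean distance (in pixels) of the
    superpixel's pixels to the displet boundary -- one plausible reading
    of the distance transform described above.
    """
    # distance_transform_edt measures the distance of each nonzero pixel
    # to the nearest zero, i.e., to the displet boundary/exterior
    dist_inside = distance_transform_edt(displet_mask)
    dt_ki = dist_inside[superpixel_mask].mean()
    return lam_theta / (1.0 + np.exp(lam_a - lam_b * dt_ki))
```

With the learned values from Table 1 ($\lambda_a = 7.90$, $\lambda_b = 0.82$), the penalty stays small near the displet boundary, crosses $\lambda_\theta/2$ around $\mathrm{DT} \approx 9.6$ pixels, and saturates toward $\lambda_\theta$ deeper inside, matching the soft-to-hard behavior described above.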
4. Parameter Settings

Table 1 lists the parameter values we used throughout our experiments. The parameters in the first table are obtained using block coordinate descent on 50 randomly selected images from the KITTI training set, as described in the paper submission. The values of the parameters in the second table have been determined empirically. Furthermore, Fig. 3 shows the sensitivity of our model to the main parameters for both CNN and SGM features. For this experiment, we vary one parameter at a time while setting all other parameters to their optimal values listed in Table 1. We observe that our method is not very sensitive to small changes in most of the parameters. The performance in reflective regions, however, depends strongly on the displet parameters, as the induced sigmoid function defines the contribution of the inferred displets to the individual superpixels.

Table 1: Model Parameters. Here, we show the values of the learned parameters in our model.

Learned parameters (block coordinate descent):

    Parameter                                Value
    Unary Threshold (τ1)                     6.98
    Pairwise Boundary Threshold (τ2)         3.40
    Pairwise Normal Threshold (τ3)           0.06
    Pairwise Boundary Weight (θ1)            1.50
    Pairwise Normal Weight (θ2)              586.87
    Displet Weight (θ3)                      1.53
    Occlusion Weight                         1.00
    Displet Consistency Weight (λθ)          6 × 10^4
    Displet Consistency Sigmoid (λa, λb)     [7.90, 0.82]

Empirically determined parameters:

    Parameter                        Value
    Number of Particles              30
    Number of Superpixels            1000
    Number of Iterations             40
    TRW-S Number of Iterations       200
    Particle Standard Deviations     [0.5, 0.5, 5]

Figure 3: Sensitivity to Parameters. This figure shows the change in performance while varying one parameter at a time and keeping all others fixed at their optimal values. Errors are with respect to an error threshold of 3 pixels. (a) Reflective Regions. (b) All Regions. [Each panel plots the error (%) of CNN-Out-All, CNN-Out-Noc, SGM-Out-All and SGM-Out-Noc while sweeping one parameter: the unary threshold, the pairwise boundary/normal thresholds and weights, the occlusion weight, the displet weight, the displet consistency weight and sigmoid parameters (a, b), and the number of particles.]
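For completeness, the one-parameter-at-a-time protocol behind Fig. 3 can be summarized in a few lines (a sketch under our own assumptions: `evaluate_error` is a placeholder for running the full pipeline on the validation split, the parameter names are ours, and the sweep range is read off the axis of the first panel; none of this is the authors' code):

```python
# Optimal (learned) values from Table 1 -- subset, for illustration only
optimal = {
    "unary_threshold": 6.98,
    "pairwise_boundary_threshold": 3.40,
    "pairwise_normal_threshold": 0.06,
    "displet_weight": 1.53,
}

def evaluate_error(params):
    """Placeholder: run the full pipeline with `params` on the
    validation images and return the error (%) at a 3-pixel threshold."""
    raise NotImplementedError

def sensitivity_sweep(name, values):
    """Vary one parameter over `values`, keeping all others optimal."""
    errors = []
    for v in values:
        params = dict(optimal)  # reset all parameters to the optimum
        params[name] = v        # perturb a single parameter
        errors.append(evaluate_error(params))
    return errors

# e.g., the unary-threshold sweep in Fig. 3 samples 0.48 through 13.48:
# sensitivity_sweep("unary_threshold", [0.48, 3.08, 5.68, 8.28, 10.88, 13.48])
```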