Monocular depth estimation has been gaining momentum in recent years. Despite significant advances in this task, the inherent difficulty of reliably capturing contextual cues from RGB images means that it remains challenging to accurately predict depth for scenes with complicated and cluttered spatial arrangements of objects. Instead of naively utilizing the primary features of a single image, in this paper we propose a hierarch...