Abstract Can our video understanding systems perceive objects when a heavy occlusion exists in scene? To answer this question, we collect large-scale dataset called OVIS for occluded instance segmentation, that is, to simultaneously detect, segment, and track instances scenes. consists of 296k high-quality masks from 25 semantic categories, where object occlusions usually occur. While human vis...