Most existing video tasks related to “human” focus on the segmentation of salient humans, ignoring unspecified others in video. Few studies have focused segmenting and tracking all humans a complex video, including pedestrians other states (e.g., seated, riding, or occluded). In this paper, we propose novel framework, abbreviated as HVISNet, that segments tracks presented people given videos ba...