In this work, we present SeqFormer for video instance segmentation. SeqFormer follows the principle of the vision transformer, which models relationships among frames. Nevertheless, we observe that a stand-alone query suffices to capture a time sequence of instances in a video, but attention should be computed on each frame independently. To achieve this, SeqFormer locates an instance in each frame and aggregates temporal information to learn powerfu...