Video object detection is challenging in the presence of appearance deterioration in certain video frames. Therefore, it is a natural choice to aggregate temporal information from other frames of the same video into the current frame. However, ROI Align, one of the core procedures in video detectors, still extracts features for proposals from a single-frame feature map, so the extracted ROI features lack temporal information from the video. In this work, we consider...
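The limitation can be made concrete with a minimal single-channel sketch of ROI Align. It assumes a torchvision-style sampling convention (pixel centers at integer coordinates, no half-pixel offset), a 2x2 output grid and 2x2 bilinear sampling points per bin. The names `roi_align` and `naive_temporal_roi_align` are hypothetical. The second function is only a naive same-box average across frames, included to illustrate "aggregating temporal information". It is not the method proposed in this work.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]  # single-channel H x W feature map of ONE frame
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in feature-map coords


def bilinear(fmap: FeatureMap, y: float, x: float) -> float:
    """Bilinearly interpolate fmap at continuous (y, x); 0 far outside the map."""
    h, w = len(fmap), len(fmap[0])
    if y < -1.0 or y > h or x < -1.0 or x > w:
        return 0.0
    # Clamp to the valid index range before interpolating.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    return ((1 - ly) * (1 - lx) * fmap[y0][x0] + (1 - ly) * lx * fmap[y0][x1]
            + ly * (1 - lx) * fmap[y1][x0] + ly * lx * fmap[y1][x1])


def roi_align(fmap: FeatureMap, box: Box, out_size: int = 2,
              samples: int = 2) -> List[List[float]]:
    """ROI Align for one proposal: every sample comes from a single frame."""
    x1, y1, x2, y2 = box
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            acc = 0.0
            # Average a regular grid of bilinear samples inside bin (i, j).
            for sy in range(samples):
                for sx in range(samples):
                    y = y1 + i * bin_h + (sy + 0.5) * bin_h / samples
                    x = x1 + j * bin_w + (sx + 0.5) * bin_w / samples
                    acc += bilinear(fmap, y, x)
            row.append(acc / (samples * samples))
        out.append(row)
    return out


def naive_temporal_roi_align(fmaps: List[FeatureMap], box: Box,
                             out_size: int = 2,
                             samples: int = 2) -> List[List[float]]:
    """Hypothetical baseline: average ROI Align over frames at the SAME box.

    Not the operator proposed in this work; it ignores object motion.
    """
    per_frame = [roi_align(f, box, out_size, samples) for f in fmaps]
    n = len(per_frame)
    return [[sum(p[i][j] for p in per_frame) / n for j in range(out_size)]
            for i in range(out_size)]


if __name__ == "__main__":
    box = (0.0, 0.0, 2.0, 2.0)
    degraded = [[0.0] * 4 for _ in range(4)]   # e.g. a motion-blurred frame
    neighbour = [[2.0] * 4 for _ in range(4)]  # a clearer neighbouring frame
    print(roi_align(degraded, box))                             # single frame only
    print(naive_temporal_roi_align([degraded, neighbour], box))  # uses both frames
```

In this toy setting, ROI Align on the degraded frame returns only zeros no matter how informative the neighbouring frame is, while any temporal aggregation can recover a nonzero response. A fixed-box average is the simplest such aggregation and breaks down once the object moves between frames.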