Dense video captioning aims to generate a text description for each of a series of events in an untrimmed video. The task can be divided into two sub-tasks: event detection and captioning. Unlike previous works that tackle the sub-tasks separately, recent works have focused on enhancing the association between them. However, designing such interactions is not trivial due to large differences in their...