This article introduces the task of visual named entity discovery in videos without the need for task-specific supervision or external knowledge sources. Assigning specific names to entities (e.g., faces, scenes, or objects) in video frames is a long-standing challenge. Commonly, this problem is addressed as a supervised learning objective by manually annotating the entities with labels. To bypass the annotation burden of this setup,...