The temporal action segmentation task segments videos temporally and predicts labels for all frames. Fully supervising such a model requires dense frame-wise annotations, which are expensive tedious to collect. This work is the first propose Constituent Action Discovery (CAD) framework that only video-wise high-level complex activity label as supervision segmentation. proposed approach automati...