Semantic action recognition aims to classify actions based on their associated semantics, and can be applied in video captioning and human-machine interaction. In this paper, the problem is addressed by jointly learning multiple pose lexicons for body parts. Specifically, visual models are learnt, one model per part, each characterising the likelihood of an observed frame being generated from hidden poses. Moreo...