Skeleton-based human action recognition has attracted great interest in recent years, as skeleton data have been shown to be robust to illumination changes, body scales, dynamic camera views, and complex backgrounds. Nevertheless, effectively encoding the latent information underlying 3D skeleton data is still an open problem. In this work, we propose a novel Spatial-Temporal Transformer network (ST-TR) which ...