This paper studies various deep learning models for word-level lip-reading technology, one of the tasks in supervised video classification. Several public datasets have been published in this research field. However, few studies have investigated techniques using multiple datasets. This paper evaluates four publicly available datasets, namely Lip Reading in the Wild (LRW), OuluVS, CUAVE, and Speech Scene by Smart Device (SSSD), whic...