An automatic speech recognition (ASR) performance has greatly improved with the introduction of convolutional neural network (CNN) or long-short term memory (LSTM) for acoustic modeling. Recently, a convolutional LSTM (CLSTM) has been proposed to directly use convolution operation within the LSTM blocks and combine the advantages of both CNN and LSTM structures into a single architecture. Thi...