Due to the over-parameterization of neural networks, many model compression methods based on pruning and quantization have emerged. They are remarkable at reducing model size, parameter count, and computational complexity. However, most models compressed by such methods require support from special hardware or software, which increases deployment cost. Moreover, these methods are mainly used for classification tasks and are rarely directly...
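
As a minimal sketch (not the method of this paper), the two compression families mentioned above can be illustrated with PyTorch's built-in utilities; the layer sizes and the 50% sparsity level are arbitrary assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small hypothetical classifier used only for demonstration.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# Pruning: zero out the 50% smallest-magnitude weights of the first layer.
prune.l1_unstructured(model[0], name="weight", amount=0.5)
prune.remove(model[0], "weight")  # make the induced sparsity permanent

# Quantization: convert Linear layers to dynamic int8 kernels.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Note: the pruned weights are zero but still stored densely, and the int8
# model needs a runtime back-end with quantized kernels -- the kind of
# special hardware/software support the text refers to.
print(quantized)
```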