Abstract
To alleviate the practical constraints of deploying deep neural networks (DNNs) on edge devices, quantization is widely regarded as a promising technique. It reduces resource requirements such as computational power and storage space by quantizing the weights and/or activation tensors of a DNN into lower bit-width fixed-point numbers, resulting in quantized neural networks (QNNs). While it has been empirically...