Natural language understanding (NLU) models often rely on dataset biases rather than the intended task-relevant features to achieve high performance on specific datasets. As a result, these models perform poorly on datasets outside the training distribution. Some recent studies address this issue by reducing the weights of biased samples during the training process. However, these methods still encode biased latent features in representations and ne...