Adversarial Logit Pairing
Abstract
In this paper, we develop improved techniques for defending against adversarial examples at scale. First, we implement the state of the art version of adversarial training at unprecedented scale on ImageNet and investigate whether it remains effective in this setting—an important open scientific question (Athalye et al., 2018). Next, we introduce enhanced defenses using a technique we call logit pairing, a method that encourages logits for pairs of examples to be similar. When applied to clean examples and their adversarial counterparts, logit pairing improves accuracy on adversarial examples over vanilla adversarial training; we also find that logit pairing on clean examples only is competitive with adversarial training in terms of accuracy on two datasets. Finally, we show that adversarial logit pairing achieves the state of the art defense on ImageNet against PGD white box attacks, with an accuracy improvement from 1.5% to 27.9%. Adversarial logit pairing also successfully damages the current state of the art defense against black box attacks on ImageNet (Tramèr et al., 2018), dropping its accuracy from 66.6% to 47.1%. With this new accuracy drop, adversarial logit pairing ties with Tramèr et al. (2018) for the state of the art on black box attacks on ImageNet.
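The pairing idea is straightforward to express as a training loss. Below is a minimal PyTorch sketch of adversarial logit pairing with an L-infinity PGD attacker; the hyperparameters (eps, alpha, steps, pairing_weight) are illustrative placeholders rather than the paper's settings, and the paper additionally trains on mixed clean/adversarial minibatches rather than the adversarial batch alone.

```python
# Sketch of adversarial logit pairing (ALP); hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD: random start, then iterated signed-gradient ascent."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def alp_loss(model, x, y, pairing_weight=0.5):
    """Adversarial training loss plus a logit-pairing penalty."""
    x_adv = pgd_attack(model, x, y)
    logits_clean, logits_adv = model(x), model(x_adv)
    ce = F.cross_entropy(logits_adv, y)             # adversarial cross-entropy
    pairing = F.mse_loss(logits_adv, logits_clean)  # pull paired logits together
    return ce + pairing_weight * pairing
```

For the "clean examples only" variant (clean logit pairing), the same penalty would instead be applied to logits of random pairs of clean examples, with no attack step.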
Similar resources
Wasserstein Distributional Robustness and Regularization in Statistical Learning
A central question in statistical learning is to design algorithms that not only perform well on training data, but also generalize to new and unseen data. In this paper, we tackle this question by formulating a distributionally robust stochastic optimization (DRSO) problem, which seeks a solution that minimizes the worst-case expected loss over a family of distributions that are close to the em...
Intriguing Properties of Adversarial Examples
It is becoming increasingly clear that many machine learning classifiers are vulnerable to adversarial examples. In attempting to explain the origin of adversarial examples, previous studies have typically focused on the fact that neural networks operate on high dimensional data, they overfit, or they are too linear. Here we argue that the origin of adversarial examples is primarily due to an i...
Learning to Discover Cross-Domain Relations with Generative Adversarial Networks
While humans easily recognize relations between data from different domains without any supervision, learning to automatically discover them is in general very challenging and needs many ground-truth pairs that illustrate the relations. To avoid costly pairing, we address the task of discovering cross-domain relations given unpaired data. We propose a method based on generative adversarial netw...
A Learning and Masking Approach to Secure Learning
Deep Neural Networks (DNNs) have been shown to be vulnerable against adversarial examples, which are data points cleverly constructed to fool the classifier. Such attacks can be devastating in practice, especially as DNNs are being applied to ever increasing critical tasks like image recognition in autonomous driving. In this paper, we introduce a new perspective on the problem. We do so by fir...
Revisiting Classifier Two-sample Tests
The goal of two-sample tests is to assess whether two samples, S_P ∼ P and S_Q ∼ Q, are drawn from the same distribution. Perhaps intriguingly, one relatively unexplored method to build two-sample tests is the use of binary classifiers. In particular, construct a dataset by pairing the n examples in S_P with a positive label, and by pairing the m examples in S_Q with a negative label. If the null h...
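The construction is concrete even in this excerpt, so here is a minimal sketch of a classifier two-sample test under the usual recipe: label the pooled samples by origin, train a binary classifier on half, and check whether held-out accuracy beats chance. The function name and the logistic-regression choice are illustrative assumptions, not necessarily this paper's exact protocol.

```python
# Sketch of a classifier two-sample test: under the null P = Q, held-out
# accuracy is Binomial(n_test, 1/2) / n_test, so we can test it directly.
import numpy as np
from scipy.stats import binomtest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def classifier_two_sample_test(sp, sq, seed=0):
    X = np.vstack([sp, sq])
    y = np.concatenate([np.ones(len(sp)), np.zeros(len(sq))])  # origin labels
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, random_state=seed, stratify=y)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    n_correct = int((clf.predict(X_te) == y_te).sum())
    # One-sided binomial test: is accuracy significantly above chance?
    return binomtest(n_correct, len(y_te), p=0.5, alternative="greater").pvalue

# Example: two Gaussian samples with shifted means should give a small p-value.
rng = np.random.default_rng(0)
p = classifier_two_sample_test(rng.normal(0, 1, (500, 2)),
                               rng.normal(0.5, 1, (500, 2)))
```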