Efficient Label Contamination Attacks Against Black-Box Learning Models

Authors

  • Mengchen Zhao
  • Bo An
  • Wei Gao
  • Teng Zhang
Abstract

Label contamination attack (LCA) is an important type of data poisoning attack in which an attacker manipulates the labels of training data to make the learned model beneficial to the attacker. Existing work on LCA assumes that the attacker has full knowledge of the victim learning model, whereas the victim model is usually a black box to the attacker. In this paper, we develop a Projected Gradient Ascent (PGA) algorithm to compute LCAs on a family of empirical risk minimizations and show that an attack on one victim model can also be effective on other victim models. This makes it possible for the attacker to design an attack against a substitute model and transfer it to a black-box victim model. Based on this observed transferability, we develop a defense algorithm to identify the data points that are most likely to be attacked. Empirical studies show that PGA significantly outperforms existing baselines and that linear learning models are better substitute models than nonlinear ones.
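
To make the attack idea concrete, the following is a minimal sketch of a projected-gradient-ascent label-flipping attack against a differentiable substitute model. It assumes a least-squares (ridge) substitute, for which the trained weights are an explicit function of the training labels, and an illustrative attacker objective that pushes attacker-chosen points toward attacker-desired labels; the function names, the objective, the step size, and the projection scheme are assumptions for illustration, not the paper's exact formulation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def pga_label_attack(X, y, X_val, t_val, budget=5, steps=200, lr=0.1, lam=1.0):
        """Flip at most `budget` training labels (values in {-1, +1}) so that a
        least-squares substitute retrained on the poisoned labels scores the
        points in X_val toward the attacker's desired labels t_val."""
        n, d = X.shape
        A = np.linalg.inv(X.T @ X + lam * np.eye(d))       # (X^T X + lam*I)^{-1}
        y0 = y.astype(float)
        y_rel = y0.copy()                                   # relaxed labels in [-1, 1]
        for _ in range(steps):
            w = A @ (X.T @ y_rel)                           # substitute weights w*(y)
            s = sigmoid(t_val * (X_val @ w))                # attacker "satisfaction" per target
            # Chain rule: d(objective)/dy = X A X_val^T (t_val * (1 - s)),
            # since w*(y) = A X^T y is linear in the relaxed labels.
            grad = X @ (A @ (X_val.T @ (t_val * (1.0 - s))))
            y_rel = np.clip(y_rel + lr * grad, -1.0, 1.0)   # ascent step + box projection
            # Budget projection: keep only the `budget` largest deviations from y.
            keep = np.argsort(-np.abs(y_rel - y0))[budget:]
            y_rel[keep] = y0[keep]
        return np.where(y_rel >= 0, 1, -1)                  # round back to {-1, +1}

In the black-box setting studied in the paper, labels computed by such a routine on the substitute would then be used to poison the training set of the unknown victim model, relying on the transferability discussed above.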

Similar Articles

Decision-Based Adversarial Attacks: Reliable Attacks Against Black-Box Machine Learning Models

Many machine learning algorithms are vulnerable to almost imperceptible perturbations of their inputs. So far it was unclear how much risk adversarial perturbations carry for the safety of real-world machine learning applications because most methods used to generate such perturbations rely either on detailed model information (gradient-based attacks) or on confidence scores such as class proba...

Exploring the Space of Black-box Attacks on Deep Neural Networks

Existing black-box attacks on deep neural networks (DNNs) so far have largely focused on transferability, where an adversarial instance generated for a locally trained model can “transfer” to attack other learning models. In this paper, we propose novel Gradient Estimation black-box attacks for adversaries with query access to the target model’s class probabilities, which do not rely on transfe...
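
As a rough illustration of the gradient-estimation idea, the sketch below approximates the gradient of a class probability with coordinate-wise finite differences, assuming query access to a `predict_proba(x)`-style function that returns a vector of class probabilities; the interface, step sizes, and the single FGSM-style update are assumptions for illustration, not the cited paper's exact (query-efficient) method.

    import numpy as np

    def estimate_gradient(predict_proba, x, true_class, delta=1e-3):
        """Estimate d P(true_class | x) / dx with central finite differences.
        Costs two queries per input coordinate."""
        grad = np.zeros_like(x, dtype=float)
        for i in range(x.size):
            e = np.zeros_like(x, dtype=float)
            e.flat[i] = delta
            grad.flat[i] = (predict_proba(x + e)[true_class]
                            - predict_proba(x - e)[true_class]) / (2.0 * delta)
        return grad

    def gradient_estimation_attack(predict_proba, x, true_class, eps=0.1):
        """One FGSM-style step that lowers the estimated probability of the true class."""
        grad = estimate_gradient(predict_proba, x, true_class)
        return x - eps * np.sign(grad)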

The best defense is a good offense: Countering black box attacks by predicting slightly wrong labels

Black-box attacks on machine learning models occur when an attacker, despite having no access to the inner workings of a model, can successfully craft an attack by means of model theft. The attacker will train their own substitute model that mimics the model to be attacked. The substitute can then be used to design attacks against the original model, for example by means of adversarial samples. We...
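
A minimal sketch of this substitute-model workflow follows, assuming the black box exposes only a hypothetical `query(x) -> label` interface; the logistic-regression substitute and the single FGSM-style perturbation are illustrative choices, not the specific construction proposed in the cited work.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def build_substitute(query, X_probe):
        """Label attacker-chosen probe inputs via the black box, then fit a local copy."""
        y_stolen = np.array([query(x) for x in X_probe])
        return LogisticRegression().fit(X_probe, y_stolen)

    def craft_adversarial(substitute, x, eps=0.1):
        """Perturb x against the substitute's decision direction (FGSM on a linear
        model), hoping the perturbation transfers to the original black box."""
        w = substitute.coef_.ravel()
        predicted_positive = substitute.predict(x.reshape(1, -1))[0] == substitute.classes_[1]
        step = -np.sign(w) if predicted_positive else np.sign(w)
        return x + eps * step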

Hardening Deep Neural Networks via Adversarial Model Cascades

Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples: malicious inputs which are crafted by the adversary to induce the trained model to produce erroneous outputs. This vulnerability has inspired a lot of research on how to secure neural networks against these kinds of attacks. Although existing techniques increase the robustness of the models against whitebox att...

Publication date: 2017