Discovering Unknown Unknowns of Predictive Models

Authors

  • Himabindu Lakkaraju
  • Ece Kamar
  • Eric Horvitz
Abstract

Predictive models are widely used in domains ranging from judiciary and healthcare to autonomous driving. As we increasingly rely on these models for high-stakes decisions, identifying and characterizing their unexpected failures in the real world is critical. We categorize errors of a predictive model as: known unknowns and unknown unknowns [3]. Known unknowns are those data points for which the model makes low confidence predictions and errs, whereas unknown unknowns correspond to those points where the model is highly confident about its predictions, but is actually wrong. Since the model lacks awareness of such unknown unknowns, approaches developed for addressing known unknowns (e.g., active learning) cannot be used for discovering unknown unknowns. Unknown unknowns primarily occur when the data used for training a predictive model is not representative of the samples encountered during test time, i.e., when the model is deployed in the wild. This mismatch could be a result of biases in the collection of training data or differences between the train and test distributions due to temporal, spatial or other factors such as a subtle shift in task definition.
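The distinction drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the `Prediction` record, the `categorize` helper, and the 0.65 confidence threshold are all hypothetical names and values chosen for illustration, and the abstract does not specify where the line between low and high confidence falls.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    predicted: int      # label assigned by the model
    confidence: float   # model's confidence in that label, in [0, 1]
    actual: int         # true label, e.g. obtained from an oracle


def categorize(p: Prediction, threshold: float = 0.65) -> str:
    """Classify one prediction; `threshold` is an assumed cutoff, not from the paper."""
    if p.predicted == p.actual:
        return "correct"
    # An error made with high confidence is an unknown unknown;
    # an error made with low confidence is a known unknown.
    if p.confidence >= threshold:
        return "unknown unknown"
    return "known unknown"


preds = [
    Prediction(predicted=1, confidence=0.95, actual=1),
    Prediction(predicted=1, confidence=0.55, actual=0),
    Prediction(predicted=0, confidence=0.92, actual=1),
]
labels = [categorize(p) for p in preds]
print(labels)
```

Note that the true label is needed to tell the two error types apart from correct predictions, which is why confidence alone cannot surface unknown unknowns.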


Similar resources

Identifying Unknown Unknowns in the Open World: Representations and Policies for Guided Exploration

Predictive models deployed in the world may assign incorrect labels to instances with high confidence. Such errors or unknown unknowns are rooted in model incompleteness, and typically arise because of the mismatch between training data and the cases seen in the open world. As the models are blind to such errors, input from an oracle is needed to identify these failures. In this paper, we formu...


Beat the Machine: Challenging Workers to Find the Unknown Unknowns

We present techniques for gathering data that expose errors of automatic predictive models. In certain common settings, traditional methods for evaluating predictive models tend to miss rare-but-important errors—most importantly, rare cases for which the model is confident of its prediction (but wrong). In this paper we present a system that, in a game-like setting, asks humans to identify case...


Development of Database Assisted Structure Identification (DASI) Methods for Nontargeted Metabolomics

Metabolite structure identification remains a significant challenge in nontargeted metabolomics research. One commonly used strategy relies on searching biochemical databases using exact mass. However, this approach fails when the database does not contain the unknown metabolite (i.e., for unknown-unknowns). For these cases, constrained structure generation with combinatorial structure generato...


Standard Addition Connected to Selective Zone Discovering for Quantification in the Unknown Mixtures

The univariate calibration method is a simple, cheap and easy-to-use procedure in analytical chemistry. A univariate analysis will be successful if a selective signal can be found for the analyte(s). In this work, two simple ways were used to find the selective signals: the spectral ratio plot (SRP) and the loading plot (LP). Both of them were able to discover the selective regions in the recorded data set...


Comparing the Predictive Power of an Artificial Neural Network and Multiple Logistic Regression in Distinguishing Diabetic Patients with Retinopathy from Those without Retinopathy

Background: Diabetes mellitus is a highly prevalent disease in the population and, if not controlled, causes complications and irreparable damage to the eye and can lead to blindness. The goal of this study is to investigate the predictive power of the multiple logistic regression model and the Artificial Neural Network Multi-layer Perceptron (MLP) in distinguishing patients with and without diabetic...




Publication date: 2016