Solving the "false positives" problem in fraud prediction
Abstract
In this paper, we present an approach based on automated feature engineering that dramatically reduces false positives in fraud prediction. False positives plague the fraud prediction industry. It is estimated that only 1 in 5 transactions declared as fraud is actually fraudulent, and that roughly 1 in every 6 customers has had a valid transaction declined in the past year. To address this problem, we use the Deep Feature Synthesis algorithm to automatically derive behavioral features from the historical data of the card associated with a transaction. We generate 237 features (>100 behavioral patterns) for each transaction and use a random forest to learn a classifier. We tested our machine learning model on data from a large multinational bank and compared it to the bank's existing solution. On unseen data comprising 1.852 million transactions, we reduced false positives by 54% and provided savings of 190K euros. We also assess how to deploy this solution and whether it necessitates streaming computation for real-time scoring. We found that our solution maintains similar benefits even when historical features are computed only once every 7 days.
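To make the idea of card-level behavioral features concrete, the following is a minimal hand-written sketch, not the Deep Feature Synthesis algorithm used in the paper. It assumes hypothetical field names (`card_id`, `timestamp`, `amount`) and computes a few aggregates for each transaction using only that card's earlier transactions.

```python
from collections import defaultdict
from statistics import mean


def add_card_history_features(transactions):
    """Return one feature dict per transaction, built from the card's earlier history.

    `transactions` is a list of dicts with hypothetical keys:
    card_id, timestamp (seconds), amount.
    """
    history = defaultdict(list)  # card_id -> earlier (timestamp, amount) pairs
    rows = []
    # Process in time order so each row only sees strictly earlier transactions.
    for tx in sorted(transactions, key=lambda t: t["timestamp"]):
        past = history[tx["card_id"]]
        amounts = [amount for _, amount in past]
        avg = mean(amounts) if amounts else 0.0
        rows.append({
            "card_id": tx["card_id"],
            "timestamp": tx["timestamp"],
            "amount": tx["amount"],
            "prior_count": len(past),
            "prior_mean_amount": avg,
            # How unusual is this amount relative to the card's past spending?
            "amount_over_prior_mean": tx["amount"] / avg if avg else 0.0,
            "seconds_since_last": tx["timestamp"] - past[-1][0] if past else None,
        })
        past.append((tx["timestamp"], tx["amount"]))
    return rows


# Illustrative data only; not taken from the paper.
example = [
    {"card_id": "A", "timestamp": 0, "amount": 20.0},
    {"card_id": "A", "timestamp": 3600, "amount": 40.0},
    {"card_id": "B", "timestamp": 4000, "amount": 5.0},
    {"card_id": "A", "timestamp": 7200, "amount": 300.0},
]
features = add_card_history_features(example)
```

In a pipeline like the one the abstract describes, rows of this kind would be passed to a classifier such as a random forest. Refreshing the history on a schedule rather than per transaction would correspond to the periodic recomputation the abstract mentions.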
Similar resources
Finding the needle: A risk-based ranking of product listings at online auction sites for non-delivery fraud prediction
Non-delivery fraud is a recurring problem at online auction sites: false sellers that list nonexistent products just to receive payments and afterwards disappear, possibly repeating the swindle with another identity. In our work we identified a set of publicly available features related to listings, sellers and product categories, and built a machine learning system for fraud prediction taking ...
Improved Procedure for Screening Expression Libraries for Novel Autoantigens
The standard method for immunoscreening of a cDNA expression library is time-consuming because of the production of a large proportion of false positives during the first and second round of screening. This problem is more important when a sensitive chemiluminescence detection system is used. Due to the high sensitivity of the detection system, there is a need to avoid false posi...
Early Warnings of Plan Failure, False Positives and Envelopes: Experiments and a Model
We analyze a tradeoff between early warnings of plan failures and false positives. In general, a decision rule that provides earlier warnings will also produce more false positives. Slack time envelopes are decision rules that warn of plan failures in our Phoenix system. Until now, they have been constructed according to ad hoc criteria. In this paper we show that good performance under differe...
Towards accurate transcription start site prediction: a modelling approach
Promoter prediction in bacteria is a classical bioinformatics problem, where available methods for regulatory element detection exhibit a very high number of false positives. We here argue that accurate transcription start site (TSS) prediction is a complex problem, where available methods for sequence motif discovery are not in itself well adopted for solving the problem. We here instead propo...
Using Self-organizing Maps for Binary Classification with Highly Imbalanced Datasets
Highly imbalanced datasets occur in domains like fraud detection, fraud prediction, and clinical diagnosis of rare diseases, among others. These datasets are characterized by the existence of a prevalent class (e.g. legitimate sellers) while the other is relatively rare (e.g. fraudsters). Although small in proportion, the observations belonging to the minority class can be of a crucial importan...
Journal title:
- CoRR
Volume abs/1710.07709
Publication date 2017