I’m not a human: Breaking the Google reCAPTCHA
Abstract
Since their inception, captchas have been widely used for preventing fraudsters from performing illicit actions. Nevertheless, economic incentives have resulted in an arms race, where fraudsters develop automated solvers and, in turn, captcha services tweak their design to break the solvers. Recent work, however, presented a generic attack that can be applied to any text-based captcha scheme. Fittingly, Google recently unveiled the latest version of reCaptcha. The goal of their new system is twofold: to minimize the effort for legitimate users, while requiring tasks that are more challenging to computers than text recognition. ReCaptcha is driven by an “advanced risk analysis system” that evaluates requests and selects the difficulty of the captcha that will be returned. Users may be required to click a checkbox, or to solve a challenge by identifying images with similar content. In this paper, we conduct a comprehensive study of reCaptcha, and explore how the risk analysis process is influenced by each aspect of the request. Through extensive experimentation, we identify flaws that allow adversaries to effortlessly influence the risk analysis, bypass restrictions, and deploy large-scale attacks. Subsequently, we design a novel low-cost attack that leverages deep learning technologies for the semantic annotation of images. Our system is extremely effective, automatically solving 70.78% of the image reCaptcha challenges, while requiring only 19 seconds per challenge. We also apply our attack to the Facebook image captcha and achieve an accuracy of 83.5%. Based on our experimental findings, we propose a series of safeguards and modifications for limiting the scalability and accuracy of our attacks. Overall, while our study focuses on reCaptcha, our findings have wide implications: as the semantic information conveyed via images is increasingly within the realm of automated reasoning, the future of captchas relies on the exploration of novel directions.
Similar resources
Breaking reCAPTCHA: A Holistic Approach via Shape Recognition
CAPTCHAs are small puzzles which should be easily solvable by human beings but hard to solve for computers. They build a security cornerstone of the modern Internet service landscape, deployed in essentially any kind of login service, allowing to distinguish authorized human beings from automated attacks. One of the most popular and successful systems today is reCAPTCHA. As many other systems, ...
Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks
Recognizing arbitrary multi-character text in unconstrained natural photographs is a hard problem. In this paper, we address an equally hard sub-problem in this domain viz. recognizing arbitrary multi-digit numbers from Street View imagery. Traditional approaches to solve this problem typically separate out the localization, segmentation, and recognition steps. In this paper we propose a unifie...
Geo-reCAPTCHA: Crowdsourcing large amounts of geographic information from earth observation data
The reCAPTCHA concept provides a large amount of valuable information for various applications. First, it provides security, e.g. for a form on a website, by means of a test that only a human could solve. Second, the effort of the user for this test is used to generate additional information, e.g. digitisation of books or identification of house numbers. In this work, we present a concept for a...
The Abuse Sharing Economy: Understanding the Limits of Threat Exchanges
The underground commoditization of compromised hosts suggests a tacit capability where miscreants leverage the same machine—subscribed by multiple criminal ventures—to simultaneously profit from spam, fake account registration, malicious hosting, and other forms of automated abuse. To expedite the detection of these commonly abusive hosts, there are now multiple industrywide efforts that aggreg...
Audio Based Recaptcha
The twenty-first century is filled with many new gadgets and technological innovations. The society is getting digitalized with every passing hour. Various speech to text converters are digitalizing the audio files but the main obstacles is noise which halts the progress of the converters. Another important thing is that they can't recognize accents of all the people. So the efficiency of these...