Captcha. Hacking and protection methods
Cross-post from our computer security blog
Earlier, we wrote about captcha vulnerabilities on the Kyivstar and Beeline websites. Now we want to present our research on breaking and protecting captchas.
# Captcha theory
In 1950, Alan Turing published the paper "Computing Machinery and Intelligence" in the philosophical journal Mind. The paper describes a test for telling a person from a computer. We will not go into the details of the test here; you can read about it on Wikipedia.
So, a captcha is one implementation of the Turing test, used to determine whether the user of a system is a person or a computer. In other words, a captcha protects a site from spam bots that leave messages in guest books, comments, and so on.
There are thousands of captcha implementations. Most of them are an image containing text that a person has to read and type in.
# Captcha vulnerabilities
Let us answer one question right away: why does anyone look for vulnerabilities in captchas? The most basic goal is spam. Where there is spam, there is advertising, and advertising is money. Spammers write bots that mimic human actions and mass-advertise their clients on forums (in posts and private messages), in guest books, mail gateways, and other services. To prevent bulk messages, developers use a captcha. It can be an open-source solution or an in-house, custom one. With any of these, programmers can make mistakes. This often depends on the programmer's experience, but not always: sometimes the captcha itself is implemented correctly, and the weakness lies in the server software configuration or in other factors.
Now let's look at a few methods that can be used to circumvent captcha:
1. Programmer errors
The most common weakness is in the technical implementation of the captcha itself. The programmer may overlook subtle points in how the result is processed. For example, the captcha text may be transmitted to the client in the session, or the image file name or URL may contain that very text, as in http://markitup.com/Captcha.ashx?txt=G-SG (the PageRank of this site is 5).
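One way to avoid this class of mistake is to keep the expected answer only on the server and give the client nothing but an opaque, single-use token. The following is a minimal sketch under that assumption, not a production implementation: the `CaptchaStore` class, its method names, and the in-memory dictionary are all hypothetical.

```python
import hmac
import secrets
import time


class CaptchaStore:
    """Hypothetical in-memory store: opaque token -> expected answer."""

    def __init__(self, ttl_seconds=120):
        self.ttl_seconds = ttl_seconds
        self._pending = {}  # token -> (answer, expiry timestamp)

    def issue(self, answer, now=None):
        """Keep the answer server-side and return a random token for the client."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(16)  # carries no information about the answer
        self._pending[token] = (answer, now + self.ttl_seconds)
        return token

    def verify(self, token, user_input, now=None):
        """Check one attempt; the token is consumed whether or not it succeeds."""
        now = time.time() if now is None else now
        entry = self._pending.pop(token, None)  # single use: no replay, no retries
        if entry is None:
            return False
        answer, expires_at = entry
        if now > expires_at:
            return False
        expected = answer.lower().encode()
        supplied = user_input.strip().lower().encode()
        return hmac.compare_digest(expected, supplied)
```

With this design the image would be requested by token (for example a hypothetical `/captcha.png?id=<token>`), so neither the URL nor anything else sent to the client reveals the text, and a failed attempt forces a new captcha to be issued.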
2. Optical character recognition
Optical character recognition (OCR) is the electronic conversion of images of characters and letters into computer-editable text.
The simplest form of OCR is reference (template) matching: each digit or letter in the image is compared against a set of stored reference images.
Vulnerabilities of this kind are rare, but they do turn up even on large sites. One example is the captcha vulnerability on the Beeline website.
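To show why fixed, undistorted glyphs are weak, the reference comparison can be sketched as follows. This is a toy illustration, not the method used against any particular site: the 3x5 bitmaps, the `TEMPLATES` table, and the function names are hypothetical, and the sketch assumes equally wide characters with no rotation or overlap.

```python
# Hypothetical 3x5 reference bitmaps: '#' is ink, '.' is background.
TEMPLATES = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def pixel_matches(glyph, template):
    """Count the positions where the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the label of the reference bitmap closest to the glyph."""
    return max(TEMPLATES, key=lambda label: pixel_matches(glyph, TEMPLATES[label]))


def read_fixed_width(rows, width=3):
    """Cut an image into equally wide glyphs and recognize each one."""
    count = len(rows[0]) // width
    glyphs = [[row[i * width:(i + 1) * width] for row in rows] for i in range(count)]
    return "".join(recognize(glyph) for glyph in glyphs)
```

Anything that breaks the fixed-width assumption (rotation, overlapping characters, varying fonts) defeats this naive comparison, which is why such distortions matter in a captcha.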
3. Bad idea
The essence of the problem: the captcha is rendered very well (with noise and so on), but it consists of, say, only three digits or letters. In that case the answer can simply be found by brute force. This approach is quite effective when a botnet is used, that is, a network of mindless zombie machines that keep guessing the content of the image.
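The arithmetic behind this can be sketched in a few lines. The function names are hypothetical, and the estimate assumes that every guess is made at random against a freshly issued captcha, so attempts are independent.

```python
def keyspace(alphabet_size, length):
    """Number of distinct answers for a captcha of the given length."""
    return alphabet_size ** length


def success_probability(alphabet_size, length, attempts):
    """Chance that at least one of `attempts` independent random guesses is right."""
    p_single = 1 / keyspace(alphabet_size, length)
    return 1 - (1 - p_single) ** attempts
```

A three-digit captcha has only `keyspace(10, 3) == 1000` possible answers, so `success_probability(10, 3, 1000)` is roughly 0.63. Six characters drawn from 36 letters and digits raise the keyspace to `36 ** 6 == 2176782336`.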
4. The human mind
This method consists of paying people to solve captchas. For example, there are many domestic services that pay up to $0.10 per solved captcha. This is a real threat because it is hard to defend against. An entire industry is built on this method.
There are many different ways to implement a captcha. We will not describe them all; instead, we focus on ready-made solutions and a few tips.
When developing a captcha, the first priority is not how it looks but how hard it is for automated bots. Examples are the captchas used by Google and Bigmir-Internet in their projects.
Secure captchas need to be carefully thought out. It is essential to use various kinds of text distortion (for images with digits and letters) and robust processing on the server side.
We also suggest using ready-made secure solutions for adding a captcha to your website, blog, or service. One of them is reCaptcha. It is not an entirely new technology, and it is built on a client-server model.
It is quite difficult to bypass, yet easy for the user. The user is prompted to enter two words (or a set of numbers). The entered values are sent through a dedicated API to the reCaptcha server, where they are checked, and the result (correct or incorrect input) is returned to your script for handling.
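The server-side half of that exchange might look like the sketch below. It is only an illustration and makes one important assumption: it targets the present-day reCAPTCHA `siteverify` endpoint with its `secret`, `response`, and `remoteip` form fields, which postdates the two-word version described here. The helper names `http_post` and `is_human` and the injectable `post` parameter are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the current verification endpoint, not the original two-word API.
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def http_post(url, fields):
    """POST form fields and return the response body as text."""
    data = urllib.parse.urlencode(fields).encode()
    with urllib.request.urlopen(url, data=data, timeout=10) as response:
        return response.read().decode()


def is_human(secret_key, client_response, remote_ip=None, post=http_post):
    """Ask the verification server whether the submitted captcha was solved."""
    fields = {"secret": secret_key, "response": client_response}
    if remote_ip:
        fields["remoteip"] = remote_ip
    try:
        result = json.loads(post(VERIFY_URL, fields))
    except (OSError, ValueError):
        return False  # fail closed on network errors or a malformed reply
    return isinstance(result, dict) and result.get("success") is True
```

The check has to run on the server: the secret key never reaches the browser, and the form is processed only if `is_human(...)` returns `True`.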
ReCaptcha is very easy to integrate into popular blog and site engines, and libraries are available for various programming languages.
Today there are many types of captcha. Whatever form a Turing test implementation takes, we cannot be sure of its reliability. Therefore, you should either use ready-made solutions or engage security experts who can analyze your implementation and give a security assessment.
Authors: Chernysh Vadim and Rybalko Dmitry, Glaive Security Group