MintEye CAPTCHA solution in 31 lines of code without even opening a picture
Inspired by the article “MintEye CAPTCHA Solution in 23 lines of code”, and wanting to understand edge-detection methods such as the Sobel and Canny operators more deeply, I decided to reproduce the algorithm described in that article myself.
Having quickly sketched a script that downloads a set of “experimental” images from the MintEye website, I was ready to open my favorite IDE and start experimenting with “high technology”. But looking at the directory with the downloaded images, I noticed a very interesting pattern.
All images (in JPEG format, which is very important) related to one captcha had the same size in bytes!
“But this is impossible!” I thought. To explain why, let me briefly recall the main ideas behind the JPEG compression algorithm. For a more detailed description, see Wikipedia.
The JPEG algorithm compresses the original image in the following sequence:
- The color model is converted: the source image goes from the RGB color space to the YCbCr color space. After this step the image consists of a brightness channel (Y) and two “color difference” channels, Cb and Cr;
- The Cb and Cr color channels are downsampled, as a rule simply halved in size. Because the human eye is more sensitive to changes in brightness than in color, this gives a significant reduction in data size with very little loss of quality;
- Each channel is divided into so-called “coding blocks”. In the most common case a “coding block” is an 8x8 pixel block, i.e. 64 values, which are later serialized into a linear array by traversing the block along a cunning zigzag trajectory resembling a “snake”;
- A discrete cosine transform is applied to each “coding block”. I will not go into details, but after this transform the block turns, roughly speaking, into a set of coefficients closely tied to the amount of fine detail and smooth color transitions in the original image;
- Quantization is performed. This is where the “compression ratio” or “desired image quality” parameter comes into play. Again very roughly speaking, all coefficients below a certain threshold (determined by the desired compression ratio) are simply zeroed out. The remaining coefficients still allow the original image to be restored with some accuracy. It is this stage that creates the well-known compression artifacts;
- Finally, whatever remains of our poor source image is “squeezed” by a lossless compression algorithm. For JPEG this is almost always Huffman coding.
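The transform and quantization steps can be illustrated with a toy sketch. It is not a JPEG encoder: the class and method names are made up for this illustration, and a single uniform quantization step (`step`) stands in for the 8x8 quantization table a real encoder uses. It only counts how many DCT coefficients of one block survive quantization for a smooth block and for a noisy one.

```java
import java.util.Random;

class DctSketch {
    // Naive 2D DCT-II of an 8x8 block, the transform JPEG applies to each block.
    static double[][] dct8x8(double[][] block) {
        double[][] out = new double[8][8];
        for (int u = 0; u < 8; u++) {
            for (int v = 0; v < 8; v++) {
                double sum = 0;
                for (int x = 0; x < 8; x++) {
                    for (int y = 0; y < 8; y++) {
                        sum += block[x][y]
                                * Math.cos((2 * x + 1) * u * Math.PI / 16)
                                * Math.cos((2 * y + 1) * v * Math.PI / 16);
                    }
                }
                double cu = (u == 0) ? Math.sqrt(0.5) : 1.0;
                double cv = (v == 0) ? Math.sqrt(0.5) : 1.0;
                out[u][v] = 0.25 * cu * cv * sum;
            }
        }
        return out;
    }

    // Counts coefficients that are still non-zero after dividing by `step` and rounding.
    // Assumption: one uniform step instead of a real JPEG quantization table.
    static int survivingCoefficients(double[][] block, double step) {
        int count = 0;
        for (double[] row : dct8x8(block)) {
            for (double c : row) {
                if (Math.round(c / step) != 0) {
                    count++;
                }
            }
        }
        return count;
    }

    // A gentle gradient, shifted by -128 the way JPEG centers samples around zero.
    static double[][] smoothBlock() {
        double[][] b = new double[8][8];
        for (int x = 0; x < 8; x++) {
            for (int y = 0; y < 8; y++) {
                b[x][y] = 100 + 2 * x + 2 * y - 128;
            }
        }
        return b;
    }

    // The same gradient with uniform noise in [-40, 40] added (fixed seed).
    static double[][] noisyBlock() {
        Random rnd = new Random(42);
        double[][] b = smoothBlock();
        for (int x = 0; x < 8; x++) {
            for (int y = 0; y < 8; y++) {
                b[x][y] += rnd.nextInt(81) - 40;
            }
        }
        return b;
    }

    public static void main(String[] args) {
        double step = 16; // arbitrary value chosen for the illustration
        System.out.println("smooth block: " + survivingCoefficients(smoothBlock(), step));
        System.out.println("noisy block:  " + survivingCoefficients(noisyBlock(), step));
    }
}
```

The fewer coefficients survive, the less the lossless stage has to encode, which is why smooth content ends up in smaller files.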
How can knowing the “inner workings” of JPEG help in solving the MintEye CAPTCHA? Knowing all this, it becomes obvious that two different pictures with the same dimensions in pixels, compressed with the same quality settings, will almost certainly have different sizes in bytes! Moreover, the largest file will be the picture with more fine detail and fewer smooth color transitions.
To prove this, let us take good old Lena and run the following experiment (all three images are saved with Photoshop's standard “Save for Web” at quality 40):
Image | Size
Gaussian noise 5% | 10342 bytes
Original | 7184 bytes
Gaussian blur 1.5 px | 4580 bytes
Q.E.D.: all other things being equal, the more noise, the larger the file; the stronger the blur, the smaller the file.
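A similar experiment can be sketched without Photoshop, using the JPEG writer from the standard `javax.imageio` package. Everything here is an assumption made for the illustration: the class and helper names, the synthetic striped image instead of Lena, uniform noise instead of Gaussian noise, and a 5x5 box blur instead of a Gaussian blur. The quality value 0.4 only loosely corresponds to Photoshop's 40.

```java
import java.awt.image.BufferedImage;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Random;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

class JpegSizeSketch {
    // Synthetic grayscale test image: diagonal stripes with sharp edges.
    static BufferedImage stripes(int w, int h) {
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int g = ((x * 3 + y * 5) % 40 < 20) ? 200 : 60;
                img.setRGB(x, y, (g << 16) | (g << 8) | g);
            }
        }
        return img;
    }

    // Adds uniform noise in [-amplitude, +amplitude] to every pixel of a gray image.
    static BufferedImage addNoise(BufferedImage src, int amplitude, long seed) {
        Random rnd = new Random(seed);
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int g = (src.getRGB(x, y) & 0xFF) + rnd.nextInt(2 * amplitude + 1) - amplitude;
                g = Math.max(0, Math.min(255, g));
                out.setRGB(x, y, (g << 16) | (g << 8) | g);
            }
        }
        return out;
    }

    // 5x5 box blur; border pixels are copied unchanged.
    static BufferedImage blur(BufferedImage src) {
        float[] weights = new float[25];
        Arrays.fill(weights, 1f / 25f);
        ConvolveOp op = new ConvolveOp(new Kernel(5, 5, weights), ConvolveOp.EDGE_NO_OP, null);
        return op.filter(src, null);
    }

    // Encodes the image as JPEG in memory and returns the number of bytes produced.
    static int jpegSize(BufferedImage img, float quality) {
        ImageIO.setUseCache(false); // keep everything in memory, no temp files
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(img, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        BufferedImage original = stripes(128, 128);
        System.out.println("noise:    " + jpegSize(addNoise(original, 25, 1L), 0.4f) + " bytes");
        System.out.println("original: " + jpegSize(original, 0.4f) + " bytes");
        System.out.println("blur:     " + jpegSize(blur(original), 0.4f) + " bytes");
    }
}
```

The exact byte counts depend on the encoder, so only the ordering of the three sizes is meaningful here.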
Why, then, are all images of one MintEye CAPTCHA the same size? The secret of the trick turned out to be impossibly primitive: the files are simply padded with zeros up to the size of the largest one!
Having discovered this, I almost immediately came up with a somewhat “impudent” but extremely simple and effective way to solve this, if I may say so, “captcha”. Take a look at these two pictures: on the left is a slightly distorted image, one position to the left of the “correct” one; on the right is the undistorted image, which is the correct answer.
At first glance the left picture is only slightly twisted. But in fact such “twisting”, even by a small angle, noticeably “blurs” sharp edges and fine details. So, given what we know about JPEG compression, a slightly distorted picture should differ in size from the correct one, and be markedly smaller!
To test this bold assumption, I opened the IDE and in just 10 minutes wrote the following:
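The padding is easy to measure: a JPEG stream ends with the end-of-image marker `FF D9`, so the last byte of real data is never zero, and everything after it in these files is filler. A minimal sketch follows; the class name, the helper name and the default path are hypothetical, chosen only for this illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class PaddingSketch {
    // Length of the data once the trailing zero bytes are ignored.
    static int payloadLength(byte[] data) {
        int end = data.length;
        while (end > 0 && data[end - 1] == 0) {
            end--;
        }
        return end;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass a real file name as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "images/01.jpg");
        byte[] data = Files.readAllBytes(file);
        int payload = payloadLength(data);
        System.out.printf("%s: %d bytes on disk, %d bytes of data, %d bytes of padding%n",
                file, data.length, payload, data.length - payload);
    }
}
```

Reading the whole file into memory is fine for images of this size; for large files one would read only the tail.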
import java.io.IOException;
import java.io.RandomAccessFile;
public class MintEye {
    public static void main(String[] args) throws IOException {
        int maxDelta = 0;   // largest drop in padding seen so far
        int correctNum = 0; // zero-based index of the best candidate
        int zeroes = 0;     // trailing zero bytes in the current file
        int prevZeroes = 0; // trailing zero bytes in the previous file
        for (int n = 0; n < 30; n++) {
            if (n > 0)
                prevZeroes = zeroes;
            zeroes = 0;
            try (RandomAccessFile raf = new RandomAccessFile(
                    String.format("images/%1$02d.jpg", n + 1), "r")) {
                long fileLen = raf.length();
                for (long i = fileLen - 1; i >= 0; i--) { // scan backwards from the end
                    raf.seek(i);
                    if (raf.read() != 0)
                        break; // first non-zero byte: the real JPEG data ends here
                    zeroes++;
                }
            }
            int delta = prevZeroes - zeroes; // how much less padding than the previous file
            if (delta > maxDelta) {
                maxDelta = delta;
                correctNum = n;
            }
        }
        System.out.printf("Correct image: %d%n", correctNum + 1);
    }
}
In a nutshell: we go through the pictures (MintEye serves 30 of them), counting the zeros at the end of the current file and comparing that with the number of zeros at the end of the previous one. The file for which this difference is largest is presumably the original, undistorted picture, that is, the correct answer.
The idea turned out to be absolutely right: 100%, 10 out of 10! Every downloaded set of pictures was recognized without a single mistake, without using any image processing library and without even loading the pictures into memory!
As a result, I was once again convinced that sooner or later every tricky captcha gets its own “recognizer”, and my personal “collection of scalps” was replenished with MintEye (which can now safely fold its startup).
Well, and Habrahabr was replenished with this article.
P.S. I know that the code above is far from ideal; in particular, it will not work if the very first picture is the correct one. I was not striving for the ideal, though, but simply wanted to illustrate the idea as briefly as possible.
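One possible way to cover the first-picture case is sketched below. It is a variation for illustration only, not the code used for the results above. It assumes the padding counts have already been collected into an array, and that the jump in padding at the correct image is larger than any other jump in the set. The first image, which has no left neighbour, is compared with its right neighbour instead. The class and method names are made up.

```java
class PeakPickSketch {
    // padding[i] = number of trailing zero bytes in image i (fewer zeros = more real data).
    // Returns the zero-based index of the image with the sharpest drop in padding.
    static int pickCorrect(int[] padding) {
        if (padding.length == 0) {
            throw new IllegalArgumentException("no images");
        }
        int best = 0;
        int bestScore = Integer.MIN_VALUE;
        for (int i = 0; i < padding.length; i++) {
            // Use the left neighbour when there is one, otherwise the right one.
            int neighbour = (i > 0) ? padding[i - 1] : padding[Math.min(1, padding.length - 1)];
            int score = neighbour - padding[i];
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up padding counts for five images; the third one has the least padding.
        int[] padding = {900, 700, 100, 650, 800};
        System.out.printf("Correct image: %d%n", pickCorrect(padding) + 1);
    }
}
```

Keeping the selection separate from the file reading also makes it easy to check on made-up numbers before running it on downloaded sets.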