My first captcha-solving script

Foreword:

I have a friend who runs an online store. From time to time he treats me to rather unusual programming tasks. It all started when, after some thought, he decided that it would be very convenient for his customers to be able to find out where their package currently is (he ships goods from the store via Russian Post). Fortunately, the Russian Post website has a tracking feature.

image

On that page you simply enter the tracking number, and the information about the package appears on screen as a neat table. Without hesitation I armed myself with curl and in a couple of minutes put together a simple script that parsed this information and displayed the last known location of the package. (The status “Arrived at the place of delivery” or “Handing over to the addressee” told the script to send the buyer an SMS saying that the package could be picked up.)


I had barely had time to drink away the money I earned for the script when strange things began to happen at the post office. My script stopped working, because the Russian Post website had added a tricky block: when the session was empty, it redirected the page in such a way that my script went into a loop. By the way, even an ordinary visitor cannot get onto their site on the first attempt.

The solution was to make the script follow the site's redirects (CURLOPT_FOLLOWLOCATION); to be more convincing, I also set CURLOPT_REFERER and CURLOPT_USERAGENT. After the first connection the request could be sent again, and the script went back to retrieving tracking information as usual. I was paid a bonus for my cunning manipulations and calmly moved on to other projects.
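As a rough illustration of that fix, here is a sketch in Python rather than the PHP and curl of the original script. It assumes the session is carried by a cookie, which the post does not state explicitly. The URL, the header values and the helper names are placeholders. urllib follows redirects by default, which plays the role of CURLOPT_FOLLOWLOCATION.

```python
import http.cookiejar
import urllib.request

# Placeholder values: the real tracking URL and headers are not given in the post.
TRACKING_URL = "https://tracking.example/track"
HEADERS = {
    "Referer": "https://tracking.example/",
    "User-Agent": "Mozilla/5.0 (compatible; tracking-script)",
}

def make_client():
    """Return an opener that stores cookies, so a session survives between requests.

    urllib follows HTTP redirects by default, which is the behaviour that
    CURLOPT_FOLLOWLOCATION switches on in curl.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def build_request(url, headers=HEADERS):
    """Build a GET request carrying the Referer and User-Agent headers."""
    return urllib.request.Request(url, headers=headers)

def fetch_twice(opener, url):
    """The first request establishes the session; the second returns the page."""
    opener.open(build_request(url)).read()
    return opener.open(build_request(url)).read()
```

Calling `fetch_twice(opener, TRACKING_URL)` with the opener from `make_client()` would mirror the "connect once, then re-send" sequence described above.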

Chapter 1 - A stab in the back


A month after I delivered the script, the postal workers struck back by putting a simple captcha on that very form. I was asked for help again.
At the time I knew that PHP can read an image pixel by pixel, which makes it possible to teach a script to see and, more importantly, to understand what is shown in a picture. Unfortunately, I had never done this before, but the task was stated clearly, and the store had already come to rely on the script. By the way, the script had reduced failures by 60%. That is very good money, and giving up such a feature would have been foolish, to say the least.

Chapter 2 - Preparing for the battle


First of all, I took a look at the script that serves the captcha image.
image
I saw that $_GET['Id'] contained strange numbers. Unfortunately I did not find any pattern in them, but I did find out that the same picture remains available at the same address for only 2 minutes.
Well, no matter: the captcha is quite simple, with no noise and in a single color.

To start with, I saved about 20 different captchas (with different digits). It turned out that the script that draws the digits changes not only their x and y coordinates but also their size (by 1 to 4 pixels), so I had to teach my script to recognize about 40 different digit shapes.

Now, with a feel for the amount of work ahead, we begin to code.
image
Our captcha is 70px wide and 23px high. We loop over the whole picture and determine the color of each pixel (white = 0, not white = 1), storing the results in an array. To check that I am doing everything correctly, and for clarity, I add a second loop that draws a table and colors each cell accordingly. We save and check.
image
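The pixel scan can be sketched roughly as follows. This is a Python illustration, not the original PHP. It assumes the image has already been decoded into rows of (r, g, b) tuples, and it prints a text grid instead of drawing an HTML table; the function names are made up for the example.

```python
WIDTH, HEIGHT = 70, 23  # captcha size stated in the post
WHITE = (255, 255, 255)

def binarize(pixels):
    """Turn rows of (r, g, b) tuples into rows of 0 (white) and 1 (not white)."""
    return [[0 if px == WHITE else 1 for px in row] for row in pixels]

def render(grid):
    """Draw the grid as text so the result can be checked by eye."""
    return "\n".join("".join("#" if cell else "." for cell in row) for row in grid)

# Tiny synthetic image: a white canvas with a black block 2 pixels wide and 3 high.
pixels = [[WHITE] * WIDTH for _ in range(HEIGHT)]
for y in range(5, 8):
    for x in range(10, 12):
        pixels[y][x] = (0, 0, 0)

grid = binarize(pixels)
print(render(grid))
```

Any pixel that is not pure white counts as 1 here, which matches the "white = 0, not white = 1" rule and is enough for a single-color captcha without noise.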
image
Well, as you can see, everything seems to work. Now I needed to figure out how the script could recognize the digits in the array, treating it as a picture. Maybe I was reinventing the wheel, but I found it more interesting to come up with the logic myself, without digging up information from other sources that would only confuse me.
After a few mugs of coffee, I decided to give the script an anchor point and have it work out which digit is drawn in the picture from which neighboring pixels are filled in.
image
So, taking one anchor point (in this case the top of the digit 1), I counted off a few pixels along the X and Y axes, and if they were black, the script declared the digit to be a one. Running the test, I saw that the script also called the digits 3, 4, 7 and 9 a one, so clearly more verification conditions were needed. I added 9 verification points for each digit, and 3 hours later I ran a script that was supposed to solve the captcha with the digits 70039.
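An anchor-point check of this kind might look something like the Python sketch below (the original was PHP, and its actual point coordinates are not given). A digit is described by a list of (dx, dy) offsets from the anchor. The toy glyphs, templates and names are invented for illustration. The example also shows why too few points misfire: a one-point template matches both glyphs.

```python
def matches(grid, anchor, offsets):
    """True if every (dx, dy) offset from the anchor lands on a filled pixel."""
    ax, ay = anchor
    for dx, dy in offsets:
        x, y = ax + dx, ay + dy
        inside = 0 <= y < len(grid) and 0 <= x < len(grid[0])
        if not inside or grid[y][x] != 1:
            return False
    return True

# Toy 5x5 glyphs (1 = filled), standing in for digits cut out of the captcha.
ONE = [
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]
SEVEN = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

anchor = (2, 0)      # top of the digit, as (x, y)
too_few = [(0, 0)]   # only the anchor itself
vertical_stroke = [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (-1, 1)]

# One point is not enough: both glyphs pass.
print(matches(ONE, anchor, too_few), matches(SEVEN, anchor, too_few))
# More points tell the glyphs apart: only the one passes.
print(matches(ONE, anchor, vertical_stroke), matches(SEVEN, anchor, vertical_stroke))
```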
image
And lo, a miracle! The script confidently solved the first captcha (knowing only the digits 7, 0, 3 and 9). To be sure, I downloaded another captcha containing the same digits, but unfortunately the script failed because the digits differed in height. Glancing at the clock, I decided that I needed more reference points and some way to automate the training.
Since I know JavaScript as well as PHP, I wrote a function that adds a cell to an array of coordinates when the cell is clicked, which let me set as many control points for verification as I wanted.
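The click handler itself was JavaScript, but the data step behind it can be sketched in Python. The sketch assumes that the first clicked cell is treated as the anchor and that the remaining cells are stored as offsets from it; the post does not say how the coordinates were actually stored, and the function name is hypothetical.

```python
def to_offsets(clicked_cells):
    """Convert clicked (x, y) cells into offsets from the first clicked cell.

    Treating the first click as the anchor is an assumption of this sketch.
    """
    ax, ay = clicked_cells[0]
    return [(x - ax, y - ay) for x, y in clicked_cells]

clicks = [(12, 4), (12, 5), (12, 6), (11, 5)]  # cells picked on the table
print(to_offsets(clicks))
```

Storing offsets rather than absolute coordinates means the same set of points can be tested wherever the digit happens to sit in the picture.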
image
The process sped up. Teaching the script one digit took less than a minute, and an hour later the script knew all the digits that Russian Post used to generate its captchas.
image
image
The control points for each digit were neatly stored in a separate file, which could be extended if the need arose.
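One way to keep such a file is sketched below in Python. The JSON format, the file name and the helper names are assumptions, since the post does not describe the real file. Each digit maps to a list of variants, because the same digit appears in several sizes.

```python
import json
import os
import tempfile

def load_templates(path):
    """Read {digit: [variant, ...]} from the file; each variant is a list of (dx, dy)."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return {d: [[tuple(p) for p in v] for v in variants] for d, variants in raw.items()}

def add_variant(path, digit, offsets):
    """Append one more set of control points for a digit and rewrite the file."""
    templates = load_templates(path)
    templates.setdefault(digit, []).append(list(offsets))
    with open(path, "w", encoding="utf-8") as f:
        json.dump(templates, f)

# Hypothetical file in a temporary directory, with two variants of the digit 1.
path = os.path.join(tempfile.mkdtemp(), "digits.json")
add_variant(path, "1", [(0, 0), (0, 1), (-1, 1)])
add_variant(path, "1", [(0, 0), (0, 1), (0, 2), (-1, 1)])
print(len(load_templates(path)["1"]))
```

Adding a new digit shape later is then a single `add_variant` call, with no change to the recognition code.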

Chapter 3 - Retaliation



Having visited the Russian Post site again and downloaded a few more captchas for verification, I satisfied myself that the script solved every one of them correctly, an accuracy of 100%. Not bad for a first attempt!
image
Even I was less sharp-sighted than my script.
image
image
The result was a 45 KB PHP script that accepted the id of a captcha on the Russian Post website and returned the code shown in the picture.
image
I easily hooked my anti-captcha up to the earlier script (the parser), and it worked again!
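How the finished script walked over the picture is not described, so the following Python sketch is only one plausible way to combine the pieces. It scans columns from left to right, treats each filled pixel as a candidate anchor, and tries every stored variant there. The `min_gap` parameter, the toy glyphs and the templates are all invented for the example.

```python
def matches(grid, anchor, offsets):
    """True if every (dx, dy) offset from the anchor lands on a filled pixel."""
    ax, ay = anchor
    for dx, dy in offsets:
        x, y = ax + dx, ay + dy
        if not (0 <= y < len(grid) and 0 <= x < len(grid[0])) or grid[y][x] != 1:
            return False
    return True

def find_digit(grid, x, templates):
    """Return the first digit whose variant matches at a filled pixel in column x."""
    for y in range(len(grid)):
        if grid[y][x] != 1:
            continue
        for digit, variants in templates.items():
            if any(matches(grid, (x, y), v) for v in variants):
                return digit
    return None

def decode(grid, templates, min_gap=3):
    """Scan left to right; after a hit, skip min_gap columns to avoid re-reading it."""
    result, x = [], 0
    while x < len(grid[0]):
        digit = find_digit(grid, x, templates)
        if digit is None:
            x += 1
        else:
            result.append(digit)
            x += min_gap
    return "".join(result)

# Toy picture: a "1" in columns 0-4 and a "7" in columns 6-10.
rows = [
    "..#...#####.",
    ".##.......#.",
    "..#......#..",
    "..#.....#...",
    "..#.....#...",
]
grid = [[1 if ch == "#" else 0 for ch in row] for row in rows]

# Offsets are relative to the leftmost filled pixel of each glyph.
templates = {
    "1": [[(0, 0), (1, -1), (1, 0), (1, 1), (1, 2), (1, 3)]],
    "7": [[(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1), (3, 2), (2, 3), (2, 4)]],
}
print(decode(grid, templates))
```

A scheme like this only works if each variant has enough points to rule out every other digit, which is the same lesson as with the early one-point test for the digit 1.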
The whole thing took about 8 hours and 10 cups of coffee. My friend was incredibly happy and paid me another bonus.

Epilogue



I am sure that Russian Post will soon answer with a new challenge, which I will gladly accept.
The screenshots of the work process were taken with a special program that stamps its logo on some of them; please pay no attention to it.
Let me remind you once again that I did not use anyone else's work in my script: I was more interested in writing it completely from scratch, in a way of my own choosing. Comments along the lines of “There is a bunch of software ..” or “Why reinvent the wheel” will therefore be regarded as a sign that the post was not read attentively.
