alexbagirov November 10, 2014 at 21:28

Breaking the SilkRoad 2.0 captcha

Translation

This article continues my previous post. You asked, so I am publishing it.

To begin with: I was extremely surprised that the code from the first article really did defeat the SilkRoad captcha. People have become seriously interested in the dark web, and, as you know, SilkRoad 2.0 appeared after its predecessor was shut down (although the second one was also closed recently). We'll talk about breaking its captcha below the cut.

The Silk Road vs. SilkRoad 2.0

The Silk Road 2 does not require a captcha to log in. It is needed only for registration. Perhaps it is required somewhere else too, but I only looked at this one page.

I found out that the captcha we cracked in the previous article was generated by a PHP CMS called ExpressionEngine.

SilkRoad 2.0 uses a Rails plugin called simple-captcha. Its original (?) branch has not been maintained since 2008, but some forks have made serious progress since then. I'm not sure which one is used on the site we are interested in, but this one was chosen for our tests.

In short: the SR and SR2 captchas are not alike, but the SR2 version is also trivial. It can very likely be solved with a high success rate (99%+) without machine learning, since all the operations used to produce it are reversible.

First look

CAPTCHA looks pretty good.

Some facts:

No background;
5 characters /\A[A-Z]{5}\z/;
Not a "word", no tricks with dictionaries;
A single line of text;
Unlike SR: characters are not just rotated or flipped, but also skewed.

All the skews look alike, so let's take a closer look.

Warp, right?

Judging by the names of the images, the script is called something like "simple_captcha". Going and getting its source code feels like cheating, but this problem deserves a couple of hours, not a week. Since 90% of the transformations are just ImageMagick distortions, trying to guess the captcha algorithm from the images alone would be unreasonable: having many samples without knowing how they were produced makes the task much harder.

So let's take a quick look at the source, where the ImageMagick operations are immediately visible:

params = ImageHelpers.image_params(SimpleCaptcha.image_style).dup
params << "-size #{SimpleCaptcha.image_size}"
params << "-wave #{amplitude}x#{frequency}"
params << "-gravity \"Center\""
params << "-pointsize #{SimpleCaptcha.point_size}"
params << "-implode 0.2"

As you can see, these operations are easy to reverse. They apply -implode 0.2? Let's apply -implode -0.2!

for i in * ; do
  convert "$i" -implode -0.2 "$i-exploded.png";
done

And take a look at the results of the work done:

Original	Rollback

Even if implode were executed with a random parameter, we could try several values and, using binary search, determine which one suits us best.

Riding the wave

Now the text only waves along the y axis. Yes, that is the only distortion left.

I could stop right now and say that the code from my first article would easily solve many of these captchas, but let's try to change something to get closer to 100% success.

Take a look here:

def distortion(key='low')
  key =
    key == 'random' ?
    DISTORTIONS[rand(DISTORTIONS.length)] :
    DISTORTIONS.include?(key) ? key : 'low'
  case key.to_s
    when 'low' then return [0 + rand(2), 80 + rand(20)]
    when 'medium' then return [2 + rand(2), 50 + rand(20)]
    when 'high' then return [4 + rand(2), 30 + rand(20)]
  end
end

The two randomly generated parameters are passed to the -wave operator as amplitude and frequency. Judging by the ImageMagick documentation, the wave always starts at zero (along the x axis).

Knowing these two parameters and using binary search, we can line the letters up like soldiers in formation.

Since finding two numbers is pretty straightforward, I will skip this snippet and move straight on.
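The author skips this step, so what follows is only a minimal sketch of one way it could be done, not the original code. It assumes the ink pixels are already available as [x, y] pairs, models the wave as a vertical shift of amplitude * sin(2 * PI * x / wavelength) per column (ImageMagick documents the second -wave argument as a wavelength), and tries every candidate pair exhaustively rather than using binary search. The names wave_offset, apply_wave, row_spread and best_unwave are hypothetical; the candidate ranges come from the distortion method above.

```ruby
# Vertical displacement that the sine wave gives to column x.
def wave_offset(x, amplitude, wavelength)
  (amplitude * Math.sin(2 * Math::PI * x / wavelength)).round
end

# Shift every ink pixel ([x, y] pair) by the wave; sign = -1 undoes it.
def apply_wave(pixels, amplitude, wavelength, sign = 1)
  pixels.map { |x, y| [x, y + sign * wave_offset(x, amplitude, wavelength)] }
end

# Flatness score: number of distinct rows containing ink (lower is flatter).
def row_spread(pixels)
  pixels.map { |_, y| y }.uniq.size
end

# Try every (amplitude, wavelength) pair and keep the one that flattens best.
def best_unwave(pixels, amplitudes: 0..5, wavelengths: 30..99)
  candidates = amplitudes.to_a.product(wavelengths.to_a)
  candidates.min_by { |a, w| row_spread(apply_wave(pixels, a, w, -1)) }
end

# Usage: a straight bar is waved, then the search flattens it again.
bar = (0...120).map { |x| [x, 10] }
waved = apply_wave(bar, 4, 40)
amplitude, wavelength = best_unwave(waved)
flattened = apply_wave(waved, amplitude, wavelength, -1)
```

On real text the row count is only a heuristic: several parameter pairs may flatten the line equally well, and any of them is good enough for segmentation.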

Improved segmentation (feature extraction)

Note that this captcha differs from the SR1 captcha in that the spaces between its characters are not uniform. It feels as if kerning is involved. Take a look at the gap between T and J in this example, XCUTJ:

The method we used in the first article no longer works reliably, because it looks for a blank vertical gap between characters. We would get the wrong solution in about 50% of all cases. A more precise algorithm is needed.
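To illustrate why looking for blank vertical gaps breaks down, here is a minimal sketch of that kind of segmentation. It is not the code from the first article: the blank_column_segments name and the representation of the image as strings of '#' (ink) and '.' (background) are assumptions made for the example.

```ruby
# Split an image into column ranges separated by completely blank columns.
def blank_column_segments(rows)
  width = rows.first.size
  inked = (0...width).map { |x| rows.any? { |row| row[x] == '#' } }
  segments = []
  start = nil
  inked.each_with_index do |has_ink, x|
    if has_ink
      start ||= x                 # a new segment begins here
    elsif start
      segments << (start...x)     # a blank column closes the segment
      start = nil
    end
  end
  segments << (start...width) if start
  segments
end

# Two glyphs with a blank column between them are split correctly.
spaced = ['##..##',
          '.#..#.']

# A "T" and a "J" whose hook reaches under the arm of the T:
# the glyphs do not touch, but no column between them is blank.
kerned = ['###..#',
          '.#...#',
          '.#.###']

spaced_segments = blank_column_segments(spaced)
kerned_segments = blank_column_segments(kerned)
```

The kerned pair comes back as a single segment, which is exactly the failure described above.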

Overkill: Marching Squares

This algorithm can separate our objects. (Here are implementations in Ruby and C++ that I wrote a very long time ago.)

Philippe Spies created the best and most fitting illustration of it that I have ever seen. I borrowed his animation:

The gist is that the square walks around the outline of the first object it comes across and returns an array of the points it found. If you combine this with something like the Douglas-Peucker algorithm, you get a polygon. (And here is this algorithm applied in another project.)

The problem is that each character has to be removed as soon as it is found, without resorting to any other method.

That is, we want to remove the character so that, when the marching squares algorithm is restarted, it finds the next character after the one just removed. Alternatively, we could record the coordinates of the object found and start the next search "behind" it, which is harder to implement.

This is quite difficult without a library. By the way, pixel-by-pixel operations in Ruby are very (very, very) slow. Let's look for an easier way.
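For reference, the polygon simplification step mentioned above can be sketched in a few lines. This is a generic recursive Douglas-Peucker implementation, not code from the article or the linked projects; the function names and the representation of points as [x, y] arrays are assumptions.

```ruby
# Perpendicular distance from point p to the line through a and b.
def point_line_distance(p, a, b)
  dx = b[0] - a[0]
  dy = b[1] - a[1]
  length = Math.hypot(dx, dy)
  return Math.hypot(p[0] - a[0], p[1] - a[1]) if length.zero?
  (dy * (p[0] - a[0]) - dx * (p[1] - a[1])).abs / length
end

# Drop every point that lies within epsilon of the simplified outline.
def simplify(points, epsilon)
  return points if points.size < 3
  first = points.first
  last = points.last
  index = 0
  distance = 0.0
  # Find the point farthest from the chord first..last.
  (1...points.size - 1).each do |i|
    d = point_line_distance(points[i], first, last)
    index, distance = i, d if d > distance
  end
  return [first, last] if distance <= epsilon
  # Keep the farthest point and simplify both halves around it.
  left = simplify(points[0..index], epsilon)
  right = simplify(points[index..-1], epsilon)
  left[0...-1] + right
end

# Usage: collinear points along the bottom edge are dropped, the corner stays.
outline = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 5]]
polygon = simplify(outline, 0.5)
```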

Fill method

It will be smarter, faster and easier.

Duplicate the image;
Find the first black pixel, which will be "inside" a character;
Fill from it with white to make that character invisible in the working image;
Find the difference between the original and the image with the character removed;
Repeat until all characters have been extracted.

It looks something like this:

def each_extracted_object(im)
  return enum_for(__method__, im) unless block_given?
  loop do
    xy = first_black_pixel(im)
    break if xy.nil?
    # Save the original
    copy = im.clone
    # Erase it from our working image
    im = im.color_floodfill(xy[0], xy[1], 'white')
    # Exclusion to get the difference, trim and yield
    copy.composite!(im, 0, 0, Magick::ExclusionCompositeOp)
    copy = copy.negate.trim('white')
    # This stuff creates a bit of garbage
    GC.start
    yield copy
  end
end
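The same idea can be tried without RMagick. The following is a minimal sketch under stated assumptions: the image is a plain array of rows holding 1 (ink) and 0 (background), pixels are 4-connected, and instead of diffing two images the flood fill simply returns the pixels it erased. The helper names are hypothetical and the first_black_pixel here is a stand-in, not the author's helper.

```ruby
# Scan column by column so the leftmost ink pixel is found first.
def first_black_pixel(grid)
  width = grid.first.size
  (0...width).each do |x|
    grid.each_index { |y| return [x, y] if grid[y][x] == 1 }
  end
  nil
end

# Flood-fill from (x, y): erase the connected ink and return its pixels.
def erase_component(grid, x, y)
  stack = [[x, y]]
  erased = []
  until stack.empty?
    cx, cy = stack.pop
    next if cy < 0 || cy >= grid.size || cx < 0 || cx >= grid.first.size
    next unless grid[cy][cx] == 1
    grid[cy][cx] = 0
    erased << [cx, cy]
    stack.push([cx + 1, cy], [cx - 1, cy], [cx, cy + 1], [cx, cy - 1])
  end
  erased
end

# Yield each character as a list of [x, y] pixels, left to right.
def each_component(grid)
  return enum_for(__method__, grid) unless block_given?
  work = grid.map(&:dup) # keep the caller's image intact
  while (xy = first_black_pixel(work))
    yield erase_component(work, *xy)
  end
end

# Usage: two separate blobs are extracted one after another.
image = ['1100011',
         '0100010',
         '0000010'].map { |row| row.chars.map(&:to_i) }
components = each_component(image).to_a
```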

Consider step-by-step visual transformations:

Step	Example
Original image (monochrome for simplicity)
Fill with white, starting from the first black pixel found on the left
Find the difference between the original and the image with the erased character
Invert the difference and trim it

The result of the second step is the image we keep working with, and the result of the fourth is an isolated character.

All this is done at a relatively good speed.

Note that this method will not work if two objects to be filled lie on the same vertical line. For an example, see the image with T and J above.

Match Patterns

On the SR1 captcha, we had to use filters to separate the characters from the background. They turned

into

With this captcha we get a set of clean letters right away. After collecting samples from 40 captchas, we got this set:

It was obtained by, say, taking every example of the letter M and overlaying them all, each with an opacity of 1 / number_of_m_examples.

Instead of using a neural network, we simply find the character in this set that has the greatest number of matching pixels with the extracted one (taking the wave into account).
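Overlaying n samples at an opacity of 1/n each amounts to averaging them pixel by pixel. A minimal sketch, assuming every sample of a letter has already been scaled to the same size and is stored as rows of 1 (ink) and 0 (background); the average_template name is hypothetical.

```ruby
# Average several same-sized samples of one letter into a single template.
# Each cell of the result is the share of samples that had ink at that pixel.
def average_template(samples)
  height = samples.first.size
  width = samples.first.first.size
  Array.new(height) do |y|
    Array.new(width) do |x|
      samples.sum { |sample| sample[y][x] }.fdiv(samples.size)
    end
  end
end

# Usage: two tiny 2x2 samples that disagree on one pixel.
sample_a = [[1, 0],
            [1, 1]]
sample_b = [[1, 0],
            [0, 1]]
template = average_template([sample_a, sample_b])
```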

def font_match(im, candidate)
  score = 0
  (0...FONT_HEIGHT).each do |y|
    (0...FONT_WIDTH).each do |x|
      if black?(im.pixel_color(x, y)) == black?(candidate.pixel_color(x, y))
        score += 1
      end
    end
  end
  return score.to_f / (FONT_WIDTH * FONT_HEIGHT)
end
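Picking the winner is then an argmax over the letter set. The sketch below is a self-contained variant that works on plain arrays instead of RMagick images; match_score, classify and the tiny 3x3 templates are assumptions for illustration, not the author's data.

```ruby
# Share of pixels on which the glyph and the template agree.
def match_score(glyph, template)
  total = glyph.size * glyph.first.size
  matches = 0
  glyph.each_with_index do |row, y|
    row.each_with_index { |pixel, x| matches += 1 if pixel == template[y][x] }
  end
  matches.fdiv(total)
end

# Return the letter whose template agrees with the glyph on the most pixels.
def classify(glyph, templates)
  templates.max_by { |_letter, template| match_score(glyph, template) }.first
end

# Usage with two toy templates and a slightly damaged "L".
templates = {
  'I' => [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
  'L' => [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
}
noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]
guess = classify(noisy_l, templates)
```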

With roughly 96% accuracy per character, 0.96 ** 5 means about 81% of captchas are solved.

For comparison, with 40 examples and 3 hours of training, the neural network solved only 45%.
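The arithmetic can be spelled out, assuming that errors on the five characters are independent (an assumption of this sketch, not something measured in the article). The same formula, inverted, shows what per-character accuracy a 95% solve rate would need. Both function names are hypothetical.

```ruby
# Probability of solving a whole captcha when each character is
# recognised independently with the given accuracy.
def captcha_success_rate(per_character_accuracy, length = 5)
  per_character_accuracy**length
end

# Per-character accuracy needed to reach a target whole-captcha rate.
def required_character_accuracy(target_rate, length = 5)
  target_rate**(1.0 / length)
end

whole = captcha_success_rate(0.96)          # roughly 0.815
needed = required_character_accuracy(0.95)  # roughly 0.99
```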

To summarize

Solving bad, low-quality captchas is easy. Using some snippets from the article on SR1, this unrelated captcha was defeated in 3 hours. I am sure that with a smarter approach the solve rate would be above 95%.

I still do not want to publish the full script, since it could be used to attack other applications, namely those that use the simple_captcha gem in Ruby.

I am also curious how useful captchas are in 2014. I have heard that on the Tor network you can get up to a thousand captchas solved for $1.

Despite this, I learned so much about CAPTCHAs that I have enough of this knowledge for life :)

Thanks for the idea of a series of articles, ilusha_sergeevich
