AMDRussia June 9, 2016 at 18:21

Brute force versus passwords

AMD video adapters can be used for more than their intended purpose (games and graphics). Everyone knows that OpenCL can accelerate general-purpose GPU computing, and today we’ll talk about the security issues that come with that impressive computing power. One of the most popular graphics cards for GPGPU is the AMD R9 280X. At a modest price of about 220-230 dollars, it offers three gigabytes of memory and 2048 stream processors based on the GCN 1.0 architecture, for a total of about 3.4 TFLOPS in single precision and about 870 GFLOPS in double precision. Exact performance may vary slightly depending on the vendor's version and the clock speeds set in the BIOS.

For comparison, one notorious video card (the one with “3.5 GB of the declared 4” of memory) costs a hundred dollars more. It delivers an impressive 3.8-4 TFLOPS for 32-bit floating-point numbers, but a laughable ~120-130 GFLOPS for FP64.

Back to GPGPU. Perhaps one video card is not enough for your task, so you install two, three, or even four, since modern motherboards and power supplies allow it. What if that is still not enough? This is where a killer feature of the OpenCL ecosystem, Virtual OpenCL (VCL), comes in: it lets you combine many accelerators installed in several computers into one high-performance cluster.

Virtual OpenCL

VCL is available for free and works with any hardware that supports the OpenCL 1.0 or 1.1 standard. It lets you combine various devices into a single computing network and make its power available to any application that can work with OpenCL.

As an example of the application of such technology, I would like to talk about a monstrous password-guessing farm consisting of 25 AMD GPUs.

Password cracker

Cracking a password by brute force usually comes down to the computing power of the machine doing the search. Even if we ignore all the possible protections against brute-force attacks (such as a captcha, or deleting or encrypting the data after the nth wrong password), going through every standard 8-character password takes a long time. If only lowercase Latin letters are used, there are 26⁸ (208 827 064 576) options to try; if digits, special characters, and mixed case are added, the number of possible combinations exceeds 72⁸ (722 204 136 308 736). Generating 720 trillion passwords may not be that hard in itself, but of course no one stores passwords in plain text: their hashes are stored instead.
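The two keyspace figures above are easy to check. Here is a minimal Python sketch; the helper name `keyspace` is mine and is used only for illustration.

```python
import string


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length


# 26 lowercase Latin letters, 8 characters.
lowercase_only = keyspace(len(string.ascii_lowercase), 8)
# The 72-symbol alphabet (letters in both cases, digits, some specials) from the text.
mixed_alphabet = keyspace(72, 8)

print(f"26^8 = {lowercase_only:,}")
print(f"72^8 = {mixed_alphabet:,}")
print(f"the mixed alphabet is about {mixed_alphabet / lowercase_only:,.0f} times larger")
```

Python integers have arbitrary precision, so the results are exact even for much longer passwords.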

Computing hash collisions (searching for values whose hash matches the one you are looking for) is a much more resource-intensive task than it might seem at first glance; Bitcoin network participants solve a special case of it. Habr user mark_ablov wrote an excellent article about mining bitcoin with pen and paper, in which he went through every stage of the computation in detail and showed how “vulnerable” Bitcoin is to the hardware capabilities of powerful clusters.
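To make the idea concrete, here is a toy version of such a search in Python. It is only a sketch under my own assumptions: it uses SHA-1 from the standard library, a tiny lowercase alphabet, and a made-up three-letter password, and it runs on the CPU. A real GPU tool is organized very differently, but the principle (hash every candidate and compare) is the same.

```python
import hashlib
import itertools
import string
from typing import Optional


def brute_force_sha1(target_hex: str, alphabet: str, max_length: int) -> Optional[str]:
    """Hash every candidate up to max_length and return the one that matches."""
    for length in range(1, max_length + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            digest = hashlib.sha1(candidate.encode("utf-8")).hexdigest()
            if digest == target_hex:
                return candidate
    return None  # nothing in the searched keyspace matched


# Hypothetical "leaked" hash of a three-letter password.
leaked = hashlib.sha1(b"gpu").hexdigest()
print(brute_force_sha1(leaked, string.ascii_lowercase, 3))  # prints "gpu"
```

Even this naive loop gets through all 18,278 candidates of one to three letters almost instantly, which is why the length and the alphabet matter so much.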

Modern passwords are stored in a form that cannot easily be “solved” by building a special-purpose chip, so hardware that works crudely but efficiently takes the stage: devices that deliver a huge number of FP32 / FP64 operations per second. This is where AMD technology, OpenCL, and VCL farms come in handy.

Back when Bitcoin was “mined” on video cards, enthusiasts built special farms out of a large number of accelerators:

A couple of years ago, a similar machine, spread across several server chassis, was shown at a computer security conference in Oslo. In experienced hands, such a box has quite a few uses:

The GPU cluster runs Linux, and the video cards are tied together by VCL, which presents all of them to the host as one large system for executing OpenCL workloads.

The farm can compute up to 350 billion password-hash guesses per second using the NTLM algorithm, which has been used in Microsoft Windows since Windows Server 2003. To search through every eight-character password (the most popular length among both ordinary users and in the corporate segment) containing Latin letters in both cases, digits, and special characters, this monster needs only five and a half hours.
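The worst-case search time is simply the keyspace divided by the guess rate. The sketch below uses the 350 billion guesses per second quoted for NTLM. The 95-symbol alphabet (all printable ASCII characters) is my own assumption, added for comparison; it is not a figure taken from the text.

```python
NTLM_RATE = 350e9  # guesses per second, the figure quoted for the 25-GPU farm


def seconds_to_exhaust(alphabet_size: int, length: int, rate: float) -> float:
    """Worst-case time to try every password of exactly `length` characters."""
    return alphabet_size ** length / rate


alphabets = (
    ("72-symbol alphabet", 72),   # alphabet size used earlier in the text
    ("95 printable ASCII", 95),   # assumption: every printable ASCII character
)
for label, size in alphabets:
    hours = seconds_to_exhaust(size, 8, NTLM_RATE) / 3600
    print(f"{label}: {hours:.2f} hours")
```

With these inputs the 72-symbol keyspace takes a little over half an hour and the 95-symbol one a little over five hours, so the estimate is very sensitive to the alphabet you assume. On average a password is found after about half of the worst-case time.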

As you may have guessed, the main obstacle for an attacker is the exponent, that is, the length of the password. Adding one character to the password increases the complexity by almost two orders of magnitude:

72⁹ = 51 998 697 814 228 992 versus 72⁸ = 722 204 136 308 736. In this case, one extra character multiplies the number of options by 72. Simply put, the longer and more complex your password is, the harder it is to find by brute force.
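The effect of the exponent is easier to see in a table. The following sketch assumes the same 72-symbol alphabet and the 350 billion guesses per second quoted for NTLM, and prints the worst-case search time for lengths from 8 to 13 characters; the constant and function names are illustrative.

```python
ALPHABET_SIZE = 72        # alphabet size used in the text
RATE = 350e9              # NTLM guesses per second quoted for the farm
SECONDS_PER_DAY = 86_400


def worst_case_days(length: int) -> float:
    """Days needed to try every password of exactly `length` characters."""
    return ALPHABET_SIZE ** length / RATE / SECONDS_PER_DAY


for length in range(8, 14):
    print(f"{length:2d} characters: {worst_case_days(length):>20,.2f} days")
```

Each row is 72 times larger than the previous one, so a search that takes minutes at 8 characters takes days at 9 and years well before 13.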

Performance

The capabilities of such a GPU farm are really impressive, and it shows good results even on “heavier” hashing algorithms: MD5 (180 billion guesses per second), SHA1 (63 billion guesses per second), and LM (20 billion guesses per second). The results for the so-called “slow” hash algorithms are also good: bcrypt (05) and sha512crypt reached 71,000 and 364,000 guesses per second, respectively.
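It helps to put these rates side by side. The sketch below takes the per-second figures quoted in this article and shows how much slower each algorithm is than NTLM on the same farm, plus the time needed to try one trillion candidates. The one-trillion batch size is an arbitrary number I picked for illustration.

```python
# Guesses per second on the 25-GPU farm, as quoted in the article.
RATES = {
    "NTLM": 350e9,
    "MD5": 180e9,
    "SHA1": 63e9,
    "LM": 20e9,
    "sha512crypt": 364_000,
    "bcrypt (05)": 71_000,
}

BATCH = 10 ** 12  # arbitrary batch of one trillion candidates


def slowdown_vs_ntlm(name: str) -> float:
    """How many times slower `name` is than NTLM on the same hardware."""
    return RATES["NTLM"] / RATES[name]


for name, rate in RATES.items():
    hours = BATCH / rate / 3600
    print(f"{name:12s} {slowdown_vs_ntlm(name):>12,.0f}x slower, "
          f"{hours:>12,.4f} hours per trillion guesses")
```

The gap between the fast and the “slow” algorithms is roughly six orders of magnitude, which is the whole point of using a deliberately expensive password hash.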

Optimization and scalability

The experiments with the Password Cracker were carried out a long time ago, when VCL was still a fairly “raw” product. The collaboration between the author of this mega-farm and the creators of VCL led to improvements in the load balancer. A special script improved the performance of Hashcat on VCL, so today the code can run not on 25 but on at least 128 GPUs while keeping a linear increase in performance.

In June 2012, Poul-Henning Kamp, the author of the md5crypt() function widely used on FreeBSD and Linux, asked the community to stop using his function. An article about this even appeared on Habr. He was worried about a situation in which an attacker could achieve more than 1 million checks per second on computer hardware available in ordinary stores. The 25-GPU Password Cracker exceeded Poul-Henning Kamp's fears by a factor of 77, and the ability to scale it up five times or more makes hashes even more vulnerable to brute-force search: if a “standard” eight-character password is currently cracked in 6-8 hours, then on 128 GPUs that search could be cut to about an hour.
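The scaling estimate is plain proportional arithmetic. The sketch below assumes perfectly linear scaling, as claimed above, and takes the md5crypt rate as 77 times the 1 million checks per second threshold; the function name is mine.

```python
BASE_GPUS = 25
TARGET_GPUS = 128
MD5CRYPT_RATE_25 = 77 * 1_000_000  # 77 times the 1 million checks/s threshold


def scaled_hours(hours_on_base: float) -> float:
    """Search time on TARGET_GPUS, assuming perfectly linear scaling."""
    return hours_on_base * BASE_GPUS / TARGET_GPUS


speedup = TARGET_GPUS / BASE_GPUS
print(f"speedup: {speedup:.2f}x")
print(f"md5crypt on 128 GPUs: {MD5CRYPT_RATE_25 * speedup / 1e6:.0f} million checks/s")
for hours in (6, 8):
    print(f"{hours} h on 25 GPUs -> {scaled_hours(hours):.2f} h on 128 GPUs")
```

Under this assumption the 6-8 hour search shrinks to roughly 1.2-1.6 hours, in line with the "about an hour" estimate.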

It will not concern me

Perhaps no one will ever try to hack your company, and at home you keep nothing valuable, compromising, or important. But no one is safe from leaks at major companies: relatively recently, LinkedIn lost six and a half million password hashes. If a farm built on AMD video accelerators (rather than professional hardware) were used to “guess” those passwords, about 90% of them could be recovered in a reasonable time.

A long and complex password (provided it is used properly, is not stored in plain text, and so on) is half the protection against such powerful computing systems. Of course, there are other approaches (such as “salted” hashes), but it is not always possible to change an existing algorithm or a product already in use, whereas raising the minimum password length to 13 or 20 characters is as easy as shelling pears.
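For those who can change the storage scheme, here is a minimal sketch of a salted, deliberately slow password hash using only the Python standard library. The choice of PBKDF2-HMAC-SHA256, the 16-byte salt, and the iteration count are my assumptions for illustration, not recommendations taken from this article.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor; tune it for your own hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("password1", salt, stored))                     # False
```

The per-user salt means identical passwords produce different hashes, so every leaked hash has to be attacked separately, and the iteration count multiplies the cost of each guess.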
