Password Distribution

Original author: Dr. Colin Gillespie
  • Translation
The other day I came across some interesting findings from an analysis of account data recently leaked from Sony's servers. I think they are worth sharing.

As you probably know, Sony has lately become a favorite target of hackers, and as a result many of its users' account names and passwords are now circulating on the Internet. Troy Hunt recently carried out a short analysis of these passwords. Here are a few highlights from his post:
  • Of the approximately forty thousand passwords, about a third were vulnerable to a simple dictionary attack.
  • Only one percent of passwords contained non-alphanumeric characters.
  • 93 percent of passwords contained between 6 and 10 characters.


In this post, we explore the roughly 24,000 remaining passwords, those that withstood the dictionary attack.

Character distribution

As Troy notes, the vast majority of passwords contain only one type of character: either all lower case or all upper case. However, things look even worse when we consider how often each character occurs.

There are 78 unique characters in the password database. If the passwords were truly random, each character would occur with probability 1/78 ≈ 0.013. When we calculate the actual character frequencies, however, it is clear that the distribution is far from uniform. The following graph shows the 20 most common characters; the red line marks the expected frequency of 1/78.

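[Figure: relative frequency of the 20 most common password characters; the red line marks the uniform level 1/78]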
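The comparison in the graph can be reproduced with a few lines of Python. This is a minimal sketch, assuming the passwords are available as a plain list of strings; the sample list is hypothetical, not the Sony data. It counts every character, divides by the total number of characters, and sets each relative frequency against the uniform level 1/k, where k is the number of distinct characters (78 in the leaked set).

```python
from collections import Counter


def char_frequencies(passwords):
    """Return (char, relative frequency) pairs, most common first."""
    counts = Counter(ch for pw in passwords for ch in pw)
    total = sum(counts.values())
    return [(ch, n / total) for ch, n in counts.most_common()]


def compare_with_uniform(passwords, top=20):
    """Print the `top` characters next to the uniform level 1/k."""
    freqs = char_frequencies(passwords)
    uniform = 1 / len(freqs)  # k distinct characters -> 1/k each if random
    for ch, p in freqs[:top]:
        side = "above" if p > uniform else "below"
        print(f"{ch!r}: {p:.3f} ({side} uniform {uniform:.3f})")
    return freqs, uniform


if __name__ == "__main__":
    # Hypothetical sample, NOT the leaked data.
    sample = ["password1", "monkey12", "letmein!", "sunshine1"]
    compare_with_uniform(sample, top=5)
```

On real data, `k` would be 78 and the uniform level about 0.013, matching the red line.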


Not surprisingly, the vowels “e”, “a” and “o” are very popular, as are the digits “1”, “2” and “0” (in that order). No upper-case letters make it into the top twenty. We can also plot the cumulative probability of the characters, sorted from most to least common. In this graph, the red dots show the pattern expected if the passwords were truly random.
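[Figure: cumulative probability of characters sorted by frequency; red dots show the pattern expected for truly random passwords]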

It is clear that passwords are not as random as we would like.
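The cumulative curve can be sketched the same way, again assuming a plain list of password strings (the test data is hypothetical). Characters are sorted from most to least common, their relative frequencies are accumulated, and the result is paired with the straight line i/k that uniformly chosen characters would produce.

```python
from collections import Counter
from itertools import accumulate


def cumulative_char_probability(passwords):
    """Cumulative probability of characters sorted from most to least common,
    paired with the curve expected if every character were equally likely."""
    counts = Counter(ch for pw in passwords for ch in pw)
    total = sum(counts.values())
    k = len(counts)
    # Running total of relative frequencies, most common character first.
    observed = list(accumulate(n / total for _, n in counts.most_common()))
    # Uniform choice: the first i characters cover exactly i/k of all draws.
    expected = [(i + 1) / k for i in range(k)]
    return observed, expected
```

Because the characters are sorted by frequency, the observed curve can never fall below the uniform line; the further above it lies, the less random the passwords.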

Character Order

Let's look at where characters appear within a password. For simplicity, we consider only 8-character passwords. The most popular digit is “1”. If its position were random, we would expect it to be spread evenly, with a share of 1/8 = 0.125 at each of the eight positions. Instead, we get (shares for positions 1 through 8):

##Distribution of "1" over eight character passwords
0.06 0.03 0.04 0.04 0.13 0.13 0.22 0.34

So in about 84 percent of cases, the “1” appears in the second half of the password. Clearly, people like to put a one at the end of a password. We see the same picture with the digit “2”:

##Distribution of "2" over eight character passwords
0.05 0.05 0.04 0.05 0.13 0.11 0.30 0.27

And with “!”:

##Distribution of "!" over eight character passwords
#Small sample size here
0.00 0.00 0.00 0.00 0.00 0.11 0.16 0.74

We observe similar patterns with the rest of the alphanumeric characters.
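A position table like the ones above could be computed as follows. This is a sketch under one stated assumption: the shares are normalized over all occurrences of the character in 8-character passwords (a password containing two “1”s counts twice), since the text does not say whether it counts occurrences or passwords. The helper names and test data are hypothetical.

```python
from collections import Counter


def position_distribution(passwords, char, length=8):
    """Share of occurrences of `char` at each position (1..length),
    counting only passwords of exactly `length` characters."""
    counts = Counter()
    for pw in passwords:
        if len(pw) != length:
            continue  # the analysis restricts itself to one length
        for pos, ch in enumerate(pw):
            if ch == char:
                counts[pos] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * length  # character never occurs
    return [counts[pos] / total for pos in range(length)]


def second_half_share(dist):
    """Fraction of occurrences falling in the second half of the password."""
    return sum(dist[len(dist) // 2:])
```

Under a uniform placement every entry would be close to 0.125, and `second_half_share` would be close to 0.5.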



The number of characters needed to guess the password

Suppose we take only the N most popular characters and consider every password that can be built from them. What proportion of the passwords in our sample would that cover? The following graph shows this proportion as a function of N:
[Figure: proportion of passwords covered when only the N most popular characters are used]
To cover 50% of the passwords in the list, we need only the 27 most popular characters. Using just 20 characters covers about 25% of the passwords, and using 31 characters covers 80%. Remember that these are passwords that withstood the dictionary attack.
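One way to compute such a coverage curve is sketched below. It assumes that a password counts as “covered” by the top N characters when every one of its characters is among them, so a password becomes covered exactly when N reaches the rank of its rarest character. Characters with equal counts are ranked in the order `Counter.most_common` returns them, which is an arbitrary tie-break. Function names and test data are hypothetical.

```python
from collections import Counter


def coverage_curve(passwords):
    """For N = 1..k, the fraction of passwords built only from the N most
    common characters (characters ranked by overall frequency)."""
    counts = Counter(ch for pw in passwords for ch in pw)
    rank = {ch: i + 1 for i, (ch, _) in enumerate(counts.most_common())}
    # A password is covered once N reaches the rank of its rarest character.
    needed = Counter(max(rank[ch] for ch in pw) for pw in passwords if pw)
    n_pw = sum(needed.values())
    curve, covered = [], 0
    for n in range(1, len(rank) + 1):
        covered += needed[n]
        curve.append(covered / n_pw)
    return curve


def chars_needed(curve, target):
    """Smallest N whose coverage reaches `target` (e.g. 0.5 for 50%)."""
    for n, share in enumerate(curve, start=1):
        if share >= target:
            return n
    return None
```

With the real data, `chars_needed(curve, 0.5)` would correspond to the 27 characters quoted above.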

Conclusion

Usually, when we calculate the probability of guessing a password, we assume that every character is chosen with the same probability; that is, choosing “e” is as likely as choosing “Z”. This is clearly wrong. In addition, in recent years many systems have forced users to include several types of characters in their passwords, and it is all too easy to simply add a digit at the end. I don't want to go into efficient password-guessing techniques here, but it is clear that plain brute force is not the right approach.
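One rough way to quantify how wrong the equal-probability assumption is, swapped in here as an illustration and not taken from the original analysis, is Shannon entropy per character. Equally likely characters give log2(k) bits each, about 6.29 bits for k = 78; the entropy of the observed character frequencies can only be lower, and it drops as the distribution becomes more skewed. This per-character view still ignores position effects such as the trailing “1”, so it overstates the real strength.

```python
import math
from collections import Counter


def entropy_per_char(passwords):
    """Shannon entropy (bits) of the empirical character distribution."""
    counts = Counter(ch for pw in passwords for ch in pw)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def uniform_entropy(k):
    """Entropy per character if all k characters were equally likely."""
    return math.log2(k)
```

Comparing `entropy_per_char(passwords)` with `uniform_entropy(78)` gives a single number for the gap visible in the frequency plots.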

Personally, I gave up trying to remember passwords a long time ago and simply use a password manager. For example, my WordPress password is longer than 12 characters and consists of completely random digits, letters and special characters. Of course, you then need to keep the password manager itself well protected...
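For contrast with the skewed distributions above, here is a minimal sketch of what “completely random” means in practice: every character is drawn independently and uniformly from the alphabet using Python's standard `secrets` module. This is an illustration only, not the author's password manager; the 94-symbol alphabet and the default length of 16 are assumptions.

```python
import secrets
import string

# Assumed alphabet: letters, digits and punctuation (94 printable ASCII symbols).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def random_password(length=16, alphabet=ALPHABET):
    """Pick every character independently and uniformly with a CSPRNG."""
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Because each position is drawn uniformly, such passwords would sit on the red “random” lines in the graphs above.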

Translator's note: Yes, I too am one of those people who tack ones and exclamation marks onto passwords to get past annoying sites. Sad but true.
