Why Blurring Is a Bad Way to Hide Confidential Information
- Translation
Surely everyone has seen photographs on TV and on the Internet in which people's faces are deliberately blurred to hide their identity. For example, Bill Gates:
For the most part, this works, because there is no convenient way to reverse the blur into a photo detailed enough to recognize a face. So with faces everything is fine. However, many people also resort to blurring confidential numbers and text. I will show why this is a bad idea.
Suppose someone posted a photo of a check or credit card on the Internet for some terrible reason (to prove on a forum that they made a million dollars, to show something funny, to compare the size of something with a credit card, etc.) and blurred the image with the all-too-common mosaic effect to hide the numbers:
Seems safe, because no one can read the numbers? WRONG. There is an attack on this scheme:
Step 1. Get a clean check image
There are two ways to do this. You can either remove the numbers from the published image in a graphics editor, or open an account at the same bank, photograph your own card from the same angle, match the white balance and contrast, and then remove the numbers from your own photo in a graphics editor (this is easier to do in a high-resolution photo).
In our example, of course, this is easy:
Step 2. Iteration
Use a script to iterate over all possible account numbers and render a check image for each, handling groups of digits separately. For example, on VISA cards the digits are printed in groups of 4, so each group can be processed on its own. This requires only 4 × 10,000 = 40,000 images, which a script generates easily.
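As a rough sketch of this counting argument, assuming a number printed as four independent 4-digit groups (the helper name `group_candidates` is hypothetical, not something from the original attack script), the candidates can be enumerated per group rather than per full number:

```python
def group_candidates(digits_per_group=4):
    """Yield every digit string one group could contain: '0000' .. '9999'."""
    for n in range(10 ** digits_per_group):
        yield f"{n:0{digits_per_group}d}"

# Assumption: four groups of four digits, as on the card described above.
groups = 4
per_group = list(group_candidates())
total_images = groups * len(per_group)
print(total_images)  # 40000
```

Each group is rendered and compared on its own, so the work grows with the number of groups instead of multiplying across them.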
Step 3. Blur each image identically to the original
Determine the exact size and pixel offset of the mosaic tiles used to blur the original image (easy), and then apply exactly the same mosaic to each of your generated images. In this case, we see that the blurred image consists of 8x8-pixel mosaic tiles, and the offset is determined by counting from the upper border of the image (not shown):
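One way to estimate the tile size and offset along one axis can be sketched as follows. It assumes the mosaic was saved losslessly, so every tile is perfectly uniform and brightness only changes at tile borders; `tile_geometry` is a hypothetical helper name.

```python
from functools import reduce
from math import gcd

def tile_geometry(line):
    """Estimate (tile_size, offset) from one row or column of mosaic pixels.

    Assumes lossless pixels: values change only at tile borders. Neighboring
    tiles that happen to share a value merely hide one border, which the
    greatest common divisor of the remaining gaps tolerates.
    """
    edges = [i for i in range(1, len(line)) if line[i] != line[i - 1]]
    if len(edges) < 2:
        raise ValueError("need at least two visible tile borders")
    size = reduce(gcd, (b - a for a, b in zip(edges, edges[1:])))
    return size, edges[0] % size

# A made-up row: a 3-pixel partial tile, three full 8-pixel tiles, a 5-pixel rest.
row = [5] * 3 + [9] * 8 + [2] * 8 + [7] * 8 + [1] * 5
print(tile_geometry(row))  # (8, 3)
```

On a JPEG-compressed image the borders are not exact, so a real script would need a tolerance when comparing neighboring pixels.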
Now we iterate over all the images, blurring each one in the same way as the original, and get something like this:
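A minimal sketch of such a mosaic filter, assuming a grayscale image stored as a list of rows of integers and a simple per-tile mean (real editors may round or weight pixels differently; `mosaic` and its parameters are hypothetical names):

```python
def mosaic(img, tile=8, off_x=0, off_y=0):
    """Replace every tile of a grayscale image by its mean brightness.

    The tile grid is shifted so that a tile border passes through
    (off_x, off_y); tiles cut by the image edge are averaged over the
    pixels that remain inside the image.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    start_y = (off_y % tile) - tile if off_y % tile else 0
    start_x = (off_x % tile) - tile if off_x % tile else 0
    for ty in range(start_y, h, tile):
        for tx in range(start_x, w, tile):
            ys = range(max(ty, 0), min(ty + tile, h))
            xs = range(max(tx, 0), min(tx + tile, w))
            block = [img[y][x] for y in ys for x in xs]
            mean = round(sum(block) / len(block))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out

print(mosaic([[0, 10], [20, 30]], tile=2))  # [[15, 15], [15, 15]]
```

The important point is that the same tile size and offset are used for every candidate image as for the published one.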
Step 4. Determine the mosaic brightness vector of each blurred image
What does this mean? Let's take the mosaic version of 0000001 (enlarged):
... and determine the brightness level (0-255) of each mosaic tile, indexing the tiles in some consistent order as $a = [a_1, a_2, \ldots, a_n]$:
In this case, account number 0000001 produces the mosaic brightness vector $a(0000001)$. We find the mosaic brightness vector for every account number in the same way, using a script to blur each image and read off the brightness values. Let $a(x)$ be the mosaic brightness vector as a function of the account number $x$. Then $a(x)_i$ denotes the $i$-th component of the mosaic brightness vector obtained from account number $x$. Above, $x = 0000001$.
Now we do the same for the original check image that we found on the Internet or elsewhere, obtaining a vector that we will call $z = [z_1, z_2, \ldots, z_n]$:
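Reading the brightness vector off an already blurred image could look like the sketch below. It assumes a grayscale image as a list of rows, uniform tiles, and row-by-row indexing; `brightness_vector` is a hypothetical name, and any consistent tile order would do as long as it is the same for every image.

```python
def brightness_vector(mosaic_img, tile=8, off_x=0, off_y=0):
    """Return one brightness value per mosaic tile, row by row.

    Every pixel inside a tile has the same value, so sampling the first
    visible pixel of each tile is enough.
    """
    h, w = len(mosaic_img), len(mosaic_img[0])
    start_y = (off_y % tile) - tile if off_y % tile else 0
    start_x = (off_x % tile) - tile if off_x % tile else 0
    vec = []
    for ty in range(start_y, h, tile):
        for tx in range(start_x, w, tile):
            vec.append(mosaic_img[max(ty, 0)][max(tx, 0)])
    return vec

blurred = [[15, 15, 40, 40],
           [15, 15, 40, 40]]
print(brightness_vector(blurred, tile=2))  # [15, 40]
```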
Step 5. Find the one closest to the original image
Take the mosaic brightness vector of the original image, $z$, and then simply compute the distance $d(x)$ between it and the mosaic brightness vector $a(x)$ of each account number $x$ (after normalizing both):
$$d(x) = \sqrt{\sum_{i=1}^{n} \left( \frac{a(x)_i}{N(a(x))} - \frac{z_i}{N(z)} \right)^2}$$
where $N(a(x))$ and $N(z)$ are the normalization constants given by
$$N(a(x)) = \sqrt{\sum_{i=1}^{n} a(x)_i^2}, \qquad N(z) = \sqrt{\sum_{i=1}^{n} z_i^2}$$
Now just find the smallest $d(x)$. For credit cards, only a small fraction of the possible numbers are valid credit card numbers at all, so there is not much to check here either.
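The distance computation can be sketched in a few lines, assuming the normalization constant is the Euclidean length of the vector and that no vector is all zeros (the names `norm` and `distance` are illustrative):

```python
import math

def norm(v):
    """Normalization constant: Euclidean length of a brightness vector."""
    return math.sqrt(sum(c * c for c in v))

def distance(a, z):
    """Euclidean distance between two brightness vectors after normalization."""
    na, nz = norm(a), norm(z)
    return math.sqrt(sum((ai / na - zi / nz) ** 2 for ai, zi in zip(a, z)))
```

Because both vectors are scaled to unit length first, a uniformly brighter or darker copy of the same mosaic still yields a distance close to zero.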
For example, in our case, we calculate the normalization constants $N(z)$ and $N(a(x))$ for every candidate $x$
and then proceed to calculate the distances $d(x)$:
Might the account number be 1124588, the candidate with the smallest distance?
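Ranking the candidates might be sketched as below. The brightness values are made up for illustration and are not the ones from the figures; `closest` is a hypothetical helper that keeps the few nearest candidates rather than only the single best one.

```python
import heapq
import math

def normalized(v):
    """Scale a brightness vector to unit Euclidean length."""
    n = math.hypot(*v)
    return [c / n for c in v]

def closest(candidates, z, k=3):
    """Return the k (distance, account_number) pairs nearest to z."""
    zn = normalized(z)
    scored = ((math.dist(normalized(a), zn), x) for x, a in candidates.items())
    return heapq.nsmallest(k, scored)

# Made-up brightness vectors, for illustration only.
candidates = {
    "0000001": [210, 180, 95, 240],
    "0000002": [205, 60, 130, 235],
    "1124588": [120, 90, 200, 60],
}
z = [120, 90, 200, 60]
best = closest(candidates, z)
print(best[0][1])  # 1124588
```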
“But you used your own image, which is easy to decipher!”
In the real world we deal with real photos, not contrived examples made in Photoshop. There is text distortion due to camera angle, imperfect alignment, and so on. But this does not prevent someone from determining exactly what the distortion is and writing an appropriate script! In any case, the few candidates with the smallest distances can be kept, especially in the world of credit cards, where numbers are neatly divided into groups of 4 and only 1 in 10 numbers is actually valid, which makes it easy to choose among the most likely candidates.
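The "1 in 10" filter can be sketched as follows, assuming the validity rule meant here is the Luhn checksum used for payment card numbers (the text does not name it, so treat that as an assumption; `luhn_valid` is an illustrative name):

```python
def luhn_valid(number):
    """Check a digit string against the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("4111111111111111"))  # True (a widely used test number)
print(luhn_valid("4111111111111112"))  # False
```

For any fixed first 15 digits, exactly one final digit passes, so the checksum discards nine out of ten candidates that the distance ranking leaves behind.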
To make this work on real photographs, the distance algorithm should be improved. For example, you can rewrite the distance formula above to normalize the standard deviations in addition to the means. You can also process the RGB or HSV values of each mosaic tile independently, and use scripts to shift the text by a few pixels in each direction and compare (which still leaves a perfectly manageable number of comparisons on a fast PC). You can use algorithms similar to existing nearest-neighbor algorithms to make the attack more robust on real photographs.
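One possible reading of the first suggestion is to standardize each vector (subtract its mean, divide by its standard deviation) before measuring the distance. This is a sketch under that assumption, with hypothetical names, and it fails on a perfectly flat vector whose standard deviation is zero:

```python
import math
import statistics

def standardize(v):
    """Shift a brightness vector to zero mean and scale it to unit deviation."""
    mu = statistics.fmean(v)
    sigma = statistics.pstdev(v)
    return [(c - mu) / sigma for c in v]

def robust_distance(a, z):
    """Distance that ignores overall brightness and contrast differences."""
    return math.dist(standardize(a), standardize(z))
```

With this variant, a candidate that differs from the published mosaic only by a brightness offset and a contrast factor still comes out at a distance of approximately zero.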
So yes, I used my own image and tailored it to this case. The algorithm can certainly be improved for real-world use, but I have neither the time nor the desire to improve it, because I am not after your information. One thing is certain, though: the conclusion is very simple. Do not use simple mosaics to blur images. All you do is reduce the amount of information in an image that contains only $\log_2(10^N)$ effective bits of account data (for an $N$-digit number). When you distribute such images, you want to eliminate the personal information, not merely obstruct access to it by reducing the amount of visual information.
Imagine a 100 × 100 image. Suppose I just averaged all the pixels and replaced each of them with the average value (that is, I turned the picture into a single-pixel “mosaic”). I have just created a function that hashes 256 ^ (10000) possible images down to 256 values. Obviously, from the resulting 8 bits you cannot restore the original image. But if you know that there are only 10 possible original images, those 8 bits are easily enough to determine which one was used.
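The single-pixel case is small enough to try directly. In this sketch the ten candidate images are invented for illustration (image k has its first 10·k rows white and the rest black), and all names are hypothetical:

```python
def single_pixel_mosaic(img):
    """Collapse a grayscale image to one 8-bit value: its mean brightness."""
    pixels = [p for row in img for p in row]
    return sum(pixels) // len(pixels)

# Ten made-up 100 x 100 candidates: image k has its first 10*k rows white.
candidates = {
    f"image_{k}": [[255] * 100 if r < 10 * k else [0] * 100 for r in range(100)]
    for k in range(10)
}

published = single_pixel_mosaic(candidates["image_7"])
matches = [name for name, img in candidates.items()
           if single_pixel_mosaic(img) == published]
print(matches)  # ['image_7']
```

The 8-bit value cannot reconstruct the picture, but it is more than enough to pick one candidate out of ten.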
Dictionary Attack Analogy
Most UNIX/Linux system administrators know that passwords in /etc/passwd or /etc/shadow are stored after passing through a one-way function, such as the salted crypt scheme or MD5. This is reasonably safe, since no one can recover the password by looking at its hash. Authentication works by applying the same one-way function to the password the user enters at login and comparing the result with the stored hash. If they match, the user has passed the check.
It is well known that a one-way scheme breaks easily when a user chooses a dictionary word as a password. All the attacker needs to do is hash the entire English dictionary, compare the hash of each word with the hash stored in /etc/passwd, and pick the word that matches as the password. This is why users are generally advised to choose more complex passwords that are not words. A dictionary attack can be illustrated as follows:
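A toy version of this attack, using unsalted SHA-256 from the standard library as a stand-in for the system's real password hashing (which is salted and works differently) and a four-word list as a stand-in for a dictionary:

```python
import hashlib

def one_way(password):
    """Stand-in one-way function; real systems use salted password hashes."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def dictionary_attack(stored_hash, wordlist):
    """Hash every word and return the one whose hash matches, if any."""
    for word in wordlist:
        if one_way(word) == stored_hash:
            return word
    return None

wordlist = ["apple", "monkey", "dragon", "sunshine"]  # illustrative stand-in
stored = one_way("dragon")
print(dictionary_attack(stored, wordlist))  # dragon
```

Nothing is ever decrypted: the attacker only runs the function forward on every guess and compares the outputs.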
Similarly, blurring an image is a one-way function. You convert the image you have into another image intended for publication. But since account numbers usually do not go beyond the millions, we can compile a “dictionary” of possible numbers, for example, all numbers from 0000001 to 9999999. Then we run an automated process that renders each of these numbers onto a photo of a blank background and blurs each image. All that remains is to compare the blurred pixels and see which candidates match the original most closely.
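The whole pipeline can be tried end to end on a toy scale. Everything here is an assumption made for illustration: a hypothetical 3x5 bitmap font stands in for the printed digits, the numbers are three digits long, and the mosaic uses 2x2 tiles with no offset.

```python
import math

# Hypothetical 3x5 bitmap font; '#' marks ink.
FONT = {
    "0": ["###", "# #", "# #", "# #", "###"],
    "1": ["  #", "  #", "  #", "  #", "  #"],
    "2": ["###", "  #", "###", "#  ", "###"],
    "3": ["###", "  #", "###", "  #", "###"],
    "4": ["# #", "# #", "###", "  #", "  #"],
    "5": ["###", "#  ", "###", "  #", "###"],
    "6": ["###", "#  ", "###", "# #", "###"],
    "7": ["###", "  #", "  #", "  #", "  #"],
    "8": ["###", "# #", "###", "# #", "###"],
    "9": ["###", "# #", "###", "  #", "###"],
}

def render(number):
    """Draw dark digits (0) on a light background (255), one blank column apart."""
    rows = []
    for r in range(5):
        line = " ".join(FONT[d][r] for d in number)
        rows.append([0 if ch == "#" else 255 for ch in line])
    return rows

def tile_means(img, tile=2):
    """Mosaic brightness vector: the mean of each tile, row by row."""
    h, w = len(img), len(img[0])
    vec = []
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            block = [img[y][x]
                     for y in range(ty, min(ty + tile, h))
                     for x in range(tx, min(tx + tile, w))]
            vec.append(sum(block) / len(block))
    return vec

def normalized(v):
    n = math.hypot(*v)
    return [c / n for c in v]

def recover(published_vec, digits=3):
    """Return every candidate whose blurred rendering is closest to the target."""
    z = normalized(published_vec)
    scored = []
    for n in range(10 ** digits):
        x = f"{n:0{digits}d}"
        scored.append((math.dist(normalized(tile_means(render(x))), z), x))
    best = min(scored)[0]
    return [x for d, x in scored if d == best]

secret = "427"
published = tile_means(render(secret))  # all that the attacker gets to see
print(recover(published))
```

With a font this coarse, more than one candidate may tie for the smallest distance; the true number is always among them, and a real attack would narrow the list further with the checks described above.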
Solution
The solution is simple: do not blur sensitive information! Instead, simply paint over it with a solid color:
Remember that you want to remove the information completely, not merely reduce its amount, as blurring does.