AlexKonshin April 4, 2011 at 23:55

Simple data scrambling algorithms

From the sandbox

Sometimes you need to encrypt something, but to attract serious encryption algorithms seems to be out of place - it will be like a gun on sparrows. For example, you need simple protection of traffic from users / trojans with sniffers , but the data itself is not worth it so that they spend a lot of time on encryption-decryption, well, on the implementation itself, too. Or you need to somehow ensure that some stored data is kept private from ordinary users. It is clear that such algorithms will not stand up against targeted hacking attempts by professionals, but we will try to complicate the work for them, although such a task is usually not set. This is usually called scrambling .

Under the cut, I will present ideas for such algorithms and promise that they will be more complicated than ordinary XOR with a fixed key. Just in case, I draw attention to the fact that these algorithms do not pretend to be cryptographic, but I’m sure that you can find application for them.

Background

It is assumed that the potential cracker either does not have access to the code that performs the encryption, or does not have sufficient qualifications for reverse engineering, or the data is not so valuable as to spend time on more difficult cracking methods.

Probably everyone knows that in such cases, most often they use simple cyclic overlay of a fixed-length key using the XOR operation. And everyone is also well aware that such protection does not withstand even a novice "hacker" or an advanced user. I would like something more complicated, but simple to implement.

And if you generate a key?

The first thing that comes to mind is to generate a key long enough to at least complicate finding the key length. For example, use a certain pseudo-random number generator with input data known to both the sender and the recipient. One of these commonly used generators is a linear congruent PRNG (PRNG is a pseudo random number generator) Of course, we guess that this is bad, but what exactly is bad in this approach? The problem is that it’s rather difficult to generate parameters for the generator itself. It is rather difficult to select good parameters programmatically for a linear congruent PRNG so that the sequence is long and non-degenerate. On this occasion, you can read in 3.2.1 in the book of D. Knut "The Art of Programming." Therefore, these parameters are often injected into the code as constants and, as a result, a potential cracker receives many messages encrypted with one key, which greatly simplifies its work.

But what if you use the data itself to generate this pseudo-random sequence?

This idea dawned on me about 20 years ago when I “helped” a student of my acquaintance write a diploma. At first glance, it seems that this is impossible, because we need a generator that would give the same sequence for both encryption and decryption. Oddly enough, it is this “killer" thesis that gives us the path to creating such a generator. Yes, we need an algorithm that changes the values of its internal variables in the same way, if we give it the original byte (or whatever we have as the atomic unit of encoding) and the encrypted byte. How to achieve this? Everything ingenious is simple - you can use commutative operations to calculate the next key valuefor pairs of source-encrypted bytes. Since the result of the operation does not depend on the order of the operands in a pair, it is obvious that such an algorithm will change its variables during decryption in the same way as during encryption, but the key sequence for other input data will be different.

An example of an input-dependent key generator

To make it clearer, consider a simple example of such an algorithm.
Let x _n be the next code in the source data, k _n be the current key, k _{n + 1} be the next key value, y _n be the encrypted code x _n .
Q (a, b) is a certain commutative function, i.e. such that q (a, b) == q (b, a).
F (a, b, c) is a certain integer function.
Then iteration by (de) coding can be described as follows:
y _n : = x _nxor k _n ;
k _{n + 1} : = F (k _n , Q (x _n , y _n ), n);
If it is clear for the function F () that its implementation is generally limited only by our imagination and common sense, then about Q (), you probably want to see the details, namely, what conditions it must meet in order to be commutative. The easiest way to achieve this is to use arguments only in pairs in commutative operations, for example xor, addition, multiplication. Examples:
~~Q (a, b) = a xor b~~ . ( Fixed: perhaps I got excited here, because when you overlay the source and encrypted code, a key is obtained, which is undesirable. I personally use slightly more complex functions).
Q (a, b) = ((a xor b) or 1) * ((a + b) xor 1).
As you can see, coming up with your super-duper function Q () is not at all difficult. Another thing, is it necessary to make it complicated? I think that there is no special meaning in its re-complication.

Well, now what's bad?

Yes, without knowing the code of the encoding function, reading something will be very difficult. But what clues still remain? If the input parameters for the scrambler are the same, then

If two messages start with the same data, then the beginning of the encrypted data will be exactly the same;
The key for the first code is the same;
The length of the encrypted message exactly matches the length of the original message.

How to deal with this, but not put your young life? Of course, there can be many ways to fight, but I want cheap and cheerful, are there any? I have a few ideas on this.
To overcome the first two points, there is a very simple, but effective technique. When encrypting, insert random data before each message. Just one byte (code) is enough. Since the next key depends on the data, even for identical messages we get different key sequences. The recipient just needs to discard this prefix after decrypting the message.
To combat the third paragraph, you can add random data before and / or after the message.

Do you have any other ideas?

And then! I always have ideas!
Suppose you are transmitting data in a compressed form. Or the data is already partially encrypted. Or each message / block is quite long and consists of binary data with an approximately uniform distribution of codes. In this case, any interference with the order of codes in the message can significantly complicate the life of a potential cracker. I am sure that you yourself can come up with some primitive byte mixer in the data block, but I promised interesting and beautiful ideas.

Data shuffler

Typically, to solve this problem, some GSPCh is used to obtain pairs of code indices in a data block that are swapped. The unpleasantness of this method is that it is difficult to guarantee that some part of the data will not remain in the same place. It is also not entirely clear how many permutations need to be done in order to achieve an acceptable result, although for reliability you can simply go through all the message codes and exchange each with a random one. But there is one more nuisance - if the generator has a bad square distribution (and a linear congruent one is sick of this disease and is hopeless), then at certain block sizes, you can run into a loop of the output values. I went for a long time to the idea of a fast pseudo-random data shuffler and consider

A bit of theory. In paragraph 3.2.1.2 of D. Knut's book “The Art of Programming”, one can find criteria for choosing a factor for a linear congruent generator, so that the length of the generated sequence is equal to the module. What does this mean? This means that for a generator with module m, each value from 0 to m -1 will be issued exactly once. Why do we need this? Recall that it would be desirable for our shuffler to ensure that all bytes (codes) of the message have changed their place. That is, if the length of a given data block is equal to this m, then it will be enough for us to simply write the next input byte (code) of the message into the output buffer at the index equal to the next value from the generator. The simplicity of this algorithm is so seductive that I could not pass by.

But, as always happens with something seductive, there were some problems. Firstly, not all m are equally ~~useful~~ good. From the same chapter of the same book we see that if mis a product of prime numbers in the first degree, then we cannot achieve a complete enumeration of elements in principle (apart from enumerating in a row, which, of course, is not interesting to us). It turns out that it is impossible to get the generator we need with a given sequence length, and therefore, if we have messages of arbitrary length, then we can not always find such a generator. Dead end? Do we really need generators of arbitrary length? Recall that for a potential cracker, knowing the length of a message is very useful. Then we already know the method of struggle, which we have successfully applied to the generator, depending on the input data. That's right, you need to throw random garbage, and best of all in front of useful data. True, there is a problem in the fact that in each block you need to somehow indicate the amount of useful information in it.m - choose the first value that is convenient for you, which is greater than the length of the input block and satisfies the criterion from Theorem A of 3.2.1.3 from the book.

Now about the criteria for the generator parameters x _{n + 1} = (a * x _n + c) mod m :

c and m are coprime;
a - 1 is a multiple of p for all prime divisors p of m;
a - 1 must be a multiple of 4 if m is a multiple of 4.

How to achieve this with little blood? I suggest this option:
m = 2 ⁿ , where n> 3;
c = p, p is a prime number & p> 2;
a = 4 * k + 1.
It is easy to notice that all three conditions are fulfilled and such values are quite easy to select.

Any more ideas?

The idea of combining a shuffler and a data-dependent generator is pretty obvious. To do this, we first feed the generator the right amount of garbage to fit the message length to the shuffler block size, and then we run the data of the message itself. We write all output codes according to the indices that we get from shuffler.

I think that's enough for today.

Corrections: Corrected a spotted bug and crossed out an unsuccessful example.

Tags: