# Introduction

Suppose we have some set that consists of N elements. We assume that the elements are numbered from zero to N-1 . A set of k -element subsets of a given set (combinations) can be represented either as an array of indices of length k . Or in the form of a sequence of N bits in which exactly k of them are set. Donald Knuth in his TAoCP provides an algorithm for generating combinations in lexicographical order, when combinations are given as an array of indices. We will try to transfer this algorithm to the case of bit masks.

# Algorithm

The algorithm for generating the next lexicographic combination is quite simple and consists of two steps. At the first step, we need to find the smallest index m of the element, the next after which the element is not included in the combination. Or, which is the same thing that we can increase by one. We replace this element with the following. In the second step, replace all elements that are smaller than our selected mth element with the smallest possible. For example, we have a combination of {8, 5, 4, 3, 2} . The smallest element that can be increased by one is 5 . Replace it with the six: {8, 6, 4, 3, 2} . Now delete three elements that are less than six: {8, 6}. And add the least three elements. Received {8, 6, 2, 1, 0} - the following combination in lexicographical order.

Now we translate this algorithm into the language of bits. The first step is to search for such the least significant bit, immediately before which zero is located. The second step is to exchange the received unit and zero places. Third step: all bits that are younger than the found one are shifted to the zero position. Consider our example? 100111100 → 10 0 1 11100 → 10 1 0 11100 → 1010 111 00 → 1010 00 111 → 101000111.

# X & -x bit trick

I love different bit tricks. But many programmers are not very familiar with them, and they drive them into a stupor. For example, not everyone knows that the expression x & -x will set all bits of a number to zero, with the exception of the smallest unit set. How does he work?

By definition, -x = ~ (x-1) . The simplest illustration: -1 = ~ (1-1) = ~ 0 . Suppose now that the number x has the form nnnn1000 , where n are the bits that can be set to zero or to one. Our goal is to show that (nnnn1000 & -nnnn1000) = 00001000 . We get the following chain: nnnn1000 & -nnnn1000 = nnnn1000 & ~ (nnnn1000 - 1) = nnnn1000 & ~ (nnnn0111) = nnnn1000 & ñññññ1000 = 00001000where ñ is the corresponding inverted bit n .

# Getting the bit mask of the next lexicographic combination

Now we illustrate how thought should work in order to obtain a bit expression for the next lexicographic combination. If we add to our number a number in which only one least significant bit is left, then as a result of the transfer, the least significant bit before which is zero will move to the place of this zero. All other low bits will be reset to zero:

``````    int a = x & -x;
int b = x + a;
``````

As a result, if x = 100111100 , then a = 000000100 , and b = 101000000 . Half the job done. It remains only to select the least significant bits and move them to the right. In order to set unused bits to zero, the AND operation is most often used. Given the trick x & -x, the option immediately begs:

``````    int c = b & -b; // 001000000
int c = c - 1;  // 000111111
int d = c & x;  // 000111100
``````

As a result of which we get a sequence of bits that can be shifted to the right. True, the number of bits will be one more than the required number, which can be easily fixed by shifting one more bit to the right.

However, to reset the matching bits, you can also use the XOR operation. Let's try it:

``````    int c = x ^ b;
``````

In general, we have x can be represented as x = nn ... n011 ... 100 ... , with b = nn..n100 ... 000 ... . Then the operation x ^ b kill coinciding beginning nnn , leaving only 00 ... 100 ... 0111 ... . For our example, c = 001111100 . Unlike the previous case, this sequence of bits is two longer than required. Variant c XOR requires fewer operations. Let's leave it:

``````    int a = x & -x;
int b = x + a;
int c = x ^ b;
``````

Now the sequence of bits, which is stored in s , must be shifted to the right to the least significant bit, and another two "extra" bits. You can do this “forehead”:

``````    c /= a;
c <<= 2;
``````

The division operation is quite expensive, and if the processor supports obtaining the low bit index, then it will probably be faster to use it. For example, for the case of GCC, the corresponding built-in function is called __builtin_ctz , as a result we get:

``````    int d = __builtin_ctz(x) + 2;
c <<= d;
``````

If there is no such instruction, then you can consider the option of obtaining the index through the de Bruyne sequence , then our code will look something like this:

``````    int d = magic_table[(magic_factor * a) >> magic_shift];
c <<= d;
``````

As a result, division and shift were replaced by multiplication, two shifts, and taking values ​​from the table.

# Summarizing

So, as a result, we got the following code:

``````static inline uint64_t next_combination_mask(uint64_t x)
{
uint64_t a = x & -x;
uint64_t b = x + a;
uint64_t c = b ^ x;
a <<= 2;
c /= a;
return b | c;
}
``````

consisting of six elementary operations with integers, and one division. No cycles and conditions. Which should run fast enough. How can it be used in a finished project? For example, we want to list all the possible hand combinations that can be obtained in Omaha. They will be C (52,4) = 270725. You can do this with the following cycle:

``````    uint64_t current = 0x0F; // первая битовая маска;
uint64_t last = 1 << 52; // 52-й бит появится только когда мы рассмотрим все комбинации из 52 карт, и перейдем к 53-й
do {