# High Bit Search Algorithms

Here I want to discuss several algorithms for finding the highest set bit of a number.

Just in case, I'll explain: the highest bit of a number is the set bit that corresponds to the largest power of two; in other words, its value is the largest power of two not exceeding the number. To avoid edge cases, we will assume we are dealing with a natural number in the range from 1 to 2^31 - 1 inclusive. In addition, so as not to go too deep into probability theory, we assume that the number whose highest bit we need to find is equally likely to be any number in that range.

To start, consider the simplest algorithm that comes to mind: go through the powers of two and pick the largest one that does not exceed the number. Here we can obviously exploit the monotonicity of this property: if some power of two exceeds the number, then every larger power exceeds it as well. So the method can be written very simply (I use Java here, but I think the code will be clear to everyone thanks to its C-like syntax):

```java
int bit1(int x) {
    int t = 1 << 30;
    while (x < t) t >>= 1;
    return t;
}
```

Let's see how long it can run. If we assume the number is equally likely to be anything in the stated range, then with probability 1/2, namely when x is at least 2^30, the loop body does not execute at all. With probability 1/4, namely when x is between 2^29 and 2^30 - 1, the loop body executes once. And so on. It is easy to see that this means the loop executes about once on average. That is quite good on average, but bad in the worst case: for x = 1 the loop runs all thirty times.
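
As a sanity check (this harness is my own, not part of the original post), we can estimate the average number of loop iterations empirically by drawing uniform random inputs from 1 to 2^31 - 1:

```java
import java.util.Random;

public class AverageIterations {
    // The same loop as in bit1, but counting how many times the body runs.
    static int iterations(int x) {
        int t = 1 << 30, count = 0;
        while (x < t) { t >>= 1; count++; }
        return count;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        long total = 0;
        int n = 1_000_000;
        for (int i = 0; i < n; i++) {
            int x = 1 + rnd.nextInt(Integer.MAX_VALUE); // uniform on [1, 2^31 - 1]
            total += iterations(x);
        }
        System.out.println("average iterations: " + (double) total / n);
    }
}
```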

The next algorithm is free of this drawback; more precisely, it has no variability in running time at all. I will show the code first and then explain how it works.

```java
int bit2(int x) {
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x - (x >> 1);
}
```

So, suppose we are given the number x = 000001bbbbbb (I have not kept the bit width consistent; b denotes an arbitrary bit). Then

```
x          == 000001bbbbbb
x >> 1     == 0000001bbbbb
x | x >> 1 == 0000011bbbbb
```

Thus, the first operation sets the bit just below the highest set bit, wherever it is. The second, as you can see, sets the two bits below those two; the third sets four more (if there is room), and so on. As a result, every bit below the most significant one ends up set. It is then clear that x - (x >> 1) gives the correct answer.
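
To see the smearing in action (the demo below is my own sketch, not part of the original), we can trace a small example and compare the result with the standard library's Integer.highestOneBit:

```java
public class SmearDemo {
    static int bit2(int x) {
        x |= x >> 1;   // 100101 -> 110111
        x |= x >> 2;   // 110111 -> 111111
        x |= x >> 4;   // no change: everything below the top bit is already set
        x |= x >> 8;
        x |= x >> 16;
        return x - (x >> 1);  // 111111 - 011111 = 100000
    }

    public static void main(String[] args) {
        int x = 0b100101;  // 37: the highest bit is 2^5 = 32
        System.out.println(bit2(x));                   // prints 32
        System.out.println(Integer.highestOneBit(x));  // prints 32 as well
    }
}
```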

The third algorithm is not entirely free of variability in running time, but in the worst case it works much faster than the first. It is based on binary search. Take the middle bit, the 16th, and form a condition for whether the most significant bit lies among the lower 16 bits: clearly that condition is x < 1 << 16. So we need to record the result of this test:

```java
int t = 1;
if (x >= 1 << 16) t <<= 16;
```

Then, of course, we similarly check whether t can be shifted by another 8 bits, then by 4, by 2, and finally by 1:

```java
int bit3(int x) {
    int t = 1;
    if (x >= t << 16) t <<= 16;
    if (x >= t << 8) t <<= 8;
    if (x >= t << 4) t <<= 4;
    if (x >= t << 2) t <<= 2;
    if (x >= t << 1) t <<= 1;
    return t;
}
```

So, the second and third algorithms run longer than the first on average (a direct experiment confirms this, and also that the third is a little faster than the second), but in the worst case they are faster. One more thing is worth noting. The first method uses the magic constant 1 << 30, which relies on our knowing that the number is equally likely to be anything from 1 to 2^31 - 1. If the numbers are smaller, the constant can be reduced: for example, if they range from 1 to 100000, we can start with int t = 1 << 17. But what if we don't know what numbers the method will be applied to? Suppose they can theoretically be as large as 2^31 - 1, but in practice never exceed 1000. Then we have to keep int t = 1 << 30, and this method (as experiments again confirm) runs significantly longer than the other two. That is, the first method can not only be slow occasionally; it can even turn out to be slower on average. Its running time is simply unpredictable, while none of these troubles affect the other two methods.
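
To make the last point concrete (this harness is my own, not from the original; the 1000 cutoff is taken from the example above): when the inputs never exceed 1000 but t must start at 1 << 30, every call pays at least 21 loop iterations:

```java
public class Bit1Cost {
    // The same loop as in bit1, but counting how many times the body runs.
    static int iterations(int x) {
        int t = 1 << 30, count = 0;
        while (x < t) { t >>= 1; count++; }
        return count;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int x = 1; x <= 1000; x++) total += iterations(x);
        // Every x <= 1000 < 2^10 needs at least 30 - 9 = 21 shifts down from 2^30.
        System.out.println("average iterations: " + (double) total / 1000);
    }
}
```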

Finally, one more nuance. All three algorithms above return the power of two itself, a number of the form 2^k. But what if we need to find k itself? The first and third methods are easily adapted to return the exponent rather than the power: the first becomes slightly faster, the third slightly slower. The second algorithm, however, cannot be adapted this way at all. So the second algorithm has its own specific drawback.
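
For illustration, here is one way the first and third methods might be adapted to return the exponent k (the names log1 and log3 are mine, not from the original):

```java
public class HighestBitIndex {
    // First method: count down from the largest possible exponent.
    static int log1(int x) {
        int k = 30;
        while (x < (1 << k)) k--;
        return k;
    }

    // Third method: shift x down instead of shifting t up,
    // accumulating the exponent along the way.
    static int log3(int x) {
        int k = 0;
        if (x >= 1 << 16) { x >>>= 16; k += 16; }
        if (x >= 1 << 8)  { x >>>= 8;  k += 8; }
        if (x >= 1 << 4)  { x >>>= 4;  k += 4; }
        if (x >= 1 << 2)  { x >>>= 2;  k += 2; }
        if (x >= 1 << 1)  { k += 1; }
        return k;
    }

    public static void main(String[] args) {
        System.out.println(log1(1000) + " " + log3(1000)); // prints 9 9
    }
}
```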

P.S. The first algorithm is too obvious for any question of authorship to arise; the second was suggested to me by a relatively well-known competitive programmer; and the third I came up with myself, though I am sure it is far from unique and certainly not the first of its kind.