Alpha-blending in one multiplication per pixel on Windows Mobile

Those involved in graphics on Windows Mobile have probably heard of the GapiDraw graphics library . If you look at their Feature List , then in the High Performance section you can find the following words: "drawing surfaces with opacity will only require one multiplication per pixel". That is, they argue that drawing a translucent image requires only one multiplication per pixel.

In this article I will try to explain how this is possible.

Firstly, most Windows Mobile devices have a screen with 65 thousand colors. That is, for each pixel on the screen, 2 bytes are allocated in memory. Secondly, such devices have a 32-bit processor. On such a processor, the speed of performing mathematical operations with 2-byte numbers is slightly different from the speed of working with 4-byte numbers. This is where one of the optimization calculations lies. The graphics library processes 2 pixels at a time.

Let's see how the 16-bit pixel works. The upper 5 bits are reserved for red, then 6 bits for green and the remaining 5 bits for blue.
P = [RRRR RGGG GGGB BBBB]
     15      8 7       0

At the same time, the colors themselves can be obtained using shifts and masks according to the formulas:
int R = (P >> 11) & 0x1f;
int G = (P >> 5) & 0x3f;
int B = P & 0x1f;

A pixel is obtained by the inverse transformation:
unsigned short P = (R << 11) + (G << 5) + B;

When we try to draw one pixel on top of another with transparency, color mixing occurs. Both pixels are divided into 3 components: red, blue, green. Then the components mix, red mixes with red, blue mixes with blue, green mixes with green. And from the obtained values ​​a new pixel is compiled.

In general, color mixing looks like this:
C = C1*α + C2*(1 - α)

C1 is the value of red, green or blue from the first pixel.
C2 - value of the same color in the second pixel.
C is the resulting color.
α is the transparency coefficient, takes values ​​from 0 to 1.

It is quite expensive to store the real value of α and operate with it, especially on handheld computers, therefore it is limited to a certain integer range. Typically, values ​​0-255 are used, but 5-bit α is sufficient for 5-bit color.

First, pay attention to how color mixing is done with 50% transparency:
C = C1/2 + C2/2

Division by 2 can be replaced by a shift:
C = (C1 >> 1) + (C2 >> 1)

Now let's see what happens when you mix two pixels one to one:
P = (R1 >> 1 + R2 >> 1) << 11 + (G1 >> 1 + G2 >> 1) << 5 + (B1 >> 1 + B2 >> 1)

Expanding the brackets we get:
P = (P1 & 0xf7de) >> 1 + (P2 & 0xf7de) >> 1

A mask of 0xf7de appears to ensure a 1-shift in each color component is correct. Compare:
[RRRR RGGG GGGB BBBB] = P
[0RRR RRGG GGGG BBBB] = P >> 1 // Младший бит одних компонент становится старшим битом других
[RRRR RGGG GGGB BBBB] = P
[1111 0111 1101 1110] = 0xf7de
[RRRR 0GGG GG0B BBB0] = P & 0xf7de // Исправляем этот недостаток перед сдвигом
[0RRR R0GG GGG0 BBBB] = P & 0xf7de >> 1

Similarly to this method, you can do the same operations for mixing pixels 1 to 3, 1 to 7 ... But in these respects this method is completely ineffective, mainly due to the fact that division occurs before multiplication. Therefore, you should first make room for multiplication. Note that multiplying the m-bit unsigned number by n-bit, we get a number occupying no more than m + n bits. That is, you need to free 5 bits in front of each color, and for this you can divide the pixel colors into even and odd ones and operate on them already. So we come to the following blending procedure using 2 multiplications per pixel:
#define Shift(p)   ((p)>>5)
#define OddCol(p)  ((p) & 0x07E0F81F)
#define EvenCol(p) ((p) & 0xF81F07E0)
// Забираем из каждого буффера по 2 пикселя
register unsigned int p1 = *dst;
register unsigned int p2 = *(src++);
// Здесь 4 умножения, но мы обрабатываем 2 пикселя одновременно
*(dst++) = OddCol( Shift(OddCol(p1)*a + OddCol(p2)*(32 - a)) ) |
           EvenCol( Shift(EvenCol(p1))*a + Shift(EvenCol(p2))*(32 - a) );

How to get rid of one more multiplication? Let's try to open the brackets and rearrange the terms:
C = (C1*α + C2*(32 - α)) >> 5 = (C1 - C2)*α >> 5 + C2

But a negative number may appear in parentheses, and arithmetic operations with negative numbers are based on overflow. Thus, it is impossible to apply the formula in this form for parallelizing computations. Therefore, get rid of negative values ​​by adding the maximum possible:
C = (C1 - C2 + 32 - 32)*α >> 5 + C2 = (C1 - C2  + 32)*α >> 5 + C2 - α

Using this formula and the trick with even and odd colors, you can get a pixel mixing procedure that uses only one multiplication per pixel:
#define OddCol(p)  ((p) & 0x07E0F81F)
#define EvenCol(p) ((p) & 0xF81F07E0)
register unsigned int p1 = *dst;
register unsigned int p2 = *(src++);
register unsigned int oddA = (a<<22) | (a<<11) | (a);
register unsigned int evenA = (a<<27) | (a<<16) | (a<<6);
register unsigned int oddP2 = OddCol(p2);
register unsigned int evenP2 = EvenCol(p2);
// 2 умножения при обработке 2 пикселей
oddCol = (((OddCol(p1) - oddP2 + 0x08010020) * a) >> 5) + oddP2 - oddA;
evenCol = ((( (EvenCol(p1)>>5) - (evenP2>>5) + 0x08010040) * a) + evenP2 - evenA );
*(dst++) = OddCol(oddCol) | EvenCol(evenCol);

Also popular now: