
Karatsuba algorithm for multiplying two numbers
Somehow I had to implement the multiplication of long numbers, through the halves of their representation in C ++. 128-bit numbers were presented as a pair of 64-bit numbers. It turned out that to multiply two 128-bit numbers and get all 256 bits of the result in complexity is comparable to 4 products of 64-bit halves. How does it work ...

For multiplication, the method of the domestic mathematician Anatoly Alekseevich Karatsuba was used . First, I’ll give you my comment on the function. He clarifies a lot about the algorithm.
In principle, the commentary makes it clear how the result is calculated.
The essence of the method can be understood if the following formula is derived:

T symbolizes the indentation, respectively T ^ 2 - double indentation.
Here is the code that performs these operations:
Type T is a type of half, type x0, x1, y0, y1.
Type N2 is the type of half the result, 2 times that of type T.
Example of using the function:
The code with the result of execution can be found here:codepad.org/1OTeGqhA
Code with an algorithm for different orders of bytes (LE + BE) here:codepad.org/f5Pxtiq1
UPDATE1:
User mark_ablov noticed that the code lacks #undef.
Corrected Code:codepad.org/13U4fuTp
Corrected full code (LE + BE):codepad.org/kBazqo8f
UPDATE2: The
user ttim noticed that the above algorithm is not exactly the Karatsuba method and pointed to the possibility of using 3 multiplications instead of four.
Clarity formula:

Thus, only 3 products need to be calculated:
1. a * x
2. b * y
3. (Ta + b) * (Tx + y)
Unfortunately, I will not be able to use this formula, because . I do not have the ability to multiply 128-bit numbers. Actually, my task was to implement the multiplication of 128-bit numbers. The fact is that the numbers of (Ta + b) and (Tx + y) are 128 bits ...
UPDATE3: The
user susl continued the discussion of the algorithm and showedthat you don’t need to multiply 128-bit numbers at all.
Another formula:

The function can be rewritten as follows:
Corrected example: http://codepad.org/w0INBD77
Corrected example for LE and BE: http://codepad.org/nB9HqWt1

For multiplication, the method of the domestic mathematician Anatoly Alekseevich Karatsuba was used . First, I’ll give you my comment on the function. He clarifies a lot about the algorithm.
//
// Karatsuba multiplication algorithm
//
// +------+------+
// | x1 | x0 |
// +------+------+
// *
// +------+------+
// | y1 | y0 |
// +------+------+
// =
// +-------------+-------------+
// + | x1*y1 | x0*y0 |
// +----+-+------+------+------+
// . . .
// . . .
// +-+------+------+
// + | x0*y1 + x1*y0 |
// +-+------+------+
//
In principle, the commentary makes it clear how the result is calculated.
The essence of the method can be understood if the following formula is derived:

T symbolizes the indentation, respectively T ^ 2 - double indentation.
Here is the code that performs these operations:
//
// Params:
// T - is type of x0, x1, y0 and y1 halves
// T2 - is type of x, y and half of res
//
template
inline void Karatsuba_multiply(T * const x, T * const y, T2 * res)
{
// Define vars (depending from byte order)
#define ptrRes ((T*)res)
T2 & lowWord = (T2&)(ptrRes[0]);
T2 & midWord = (T2&)(ptrRes[1]);
T2 & highWord = (T2&)(ptrRes[2]);
T & highByte = (T &)(ptrRes[3]);
#undef ptrRes
const T & x0 = x[0];
const T & x1 = x[1];
const T & y0 = y[0];
const T & y1 = y[1];
// Multiply action
lowWord = x0 * y0;
highWord = x1 * y1;
T2 m1 = x0 * y1;
T2 m2 = x1 * y0;
midWord += m1;
if (midWord < m1) highByte++;
midWord += m2;
if (midWord < m2) highByte++;
}
Type T is a type of half, type x0, x1, y0, y1.
Type N2 is the type of half the result, 2 times that of type T.
Example of using the function:
int main()
{
typedef unsigned char u8;
typedef unsigned short u16;
typedef unsigned int u32;
u16 a = 1000;
u16 b = 2000;
u32 r = 0;
u8 * a_ptr = (u8*)&a;
u8 * b_ptr = (u8*)&b;
u16 * r_ptr = (u16*)(void*)&r;
Karatsuba_multiply(a_ptr, b_ptr, r_ptr);
cout << r;
}
The code with the result of execution can be found here:
Code with an algorithm for different orders of bytes (LE + BE) here:
UPDATE1:
User mark_ablov noticed that the code lacks #undef.
Corrected Code:
Corrected full code (LE + BE):
UPDATE2: The
user ttim noticed that the above algorithm is not exactly the Karatsuba method and pointed to the possibility of using 3 multiplications instead of four.
Clarity formula:

Thus, only 3 products need to be calculated:
1. a * x
2. b * y
3. (Ta + b) * (Tx + y)
Unfortunately, I will not be able to use this formula, because . I do not have the ability to multiply 128-bit numbers. Actually, my task was to implement the multiplication of 128-bit numbers. The fact is that the numbers of (Ta + b) and (Tx + y) are 128 bits ...
UPDATE3: The
user susl continued the discussion of the algorithm and showedthat you don’t need to multiply 128-bit numbers at all.
Another formula:

The function can be rewritten as follows:
//
// Params:
// T - is type of x0, x1, y0 and y1 halves
// T2 - is type of x, y and half of res
//
template
inline void Karatsuba_multiply(T * const x, T * const y, T2 * res)
{
// Define vars (depending from byte order)
#define ptrRes ((T*)res)
T2 & lowWord = (T2&)(ptrRes[0]);
T2 & midWord = (T2&)(ptrRes[1]);
T2 & highWord = (T2&)(ptrRes[2]);
T & highByte = (T &)(ptrRes[3]);
#undef ptrRes
const T & x0 = x[0];
const T & x1 = x[1];
const T & y0 = y[0];
const T & y1 = y[1];
// Multiply action
lowWord = x0 * y0;
highWord = x1 * y1;
//T2 m1 = x0 * y1;
//T2 m2 = x1 * y0;
T2 m = (x0+x1)*(y0+y1) - (lowWord + highWord);
//midWord += m1;
//if (midWord < m1) highByte++;
//midWord += m2;
//if (midWord < m2) highByte++;
midWord += m;
if (midWord < m) highByte++;
}
Corrected example: http://codepad.org/w0INBD77
Corrected example for LE and BE: http://codepad.org/nB9HqWt1