
memset: the dark side

After reading the article "The most dangerous function in the C/C++ world", I decided to dig into the evil lurking in memset's dark cellar and write an addendum that covers the problem more broadly.
In C, memset() is widely used and full of traps. An excerpt from the C++ Reference:
void * memset (void * ptr, int value, size_t num);
Fill block of memory
Sets the first num bytes of the block of memory pointed by ptr to the specified value (interpreted as an unsigned char).
Parameters
ptr - Pointer to the block of memory to fill.
value - Value to be set. The value is passed as an int, but the function fills the block of memory using the unsigned char conversion of this value.
num - Number of bytes to be set to the value. size_t is an unsigned integral type.
Return Value
ptr is returned.
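To make the "unsigned char conversion" in the excerpt concrete, here is a minimal sketch. The helper name fill_byte_stored is hypothetical and exists only for this illustration, and the sketch assumes 8-bit bytes.

```c
#include <string.h>

/* Hypothetical helper: fills a small buffer with memset() and returns
   the byte that actually ended up in memory. Assumes 8-bit bytes. */
unsigned char fill_byte_stored(int value)
{
    unsigned char buf[4];

    /* value is passed as an int but converted to unsigned char,
       so only its low 8 bits are written to every byte of buf. */
    memset(buf, value, sizeof(buf));
    return buf[0];
}
```

Under these assumptions fill_byte_stored(0x1234) returns 0x34 and fill_byte_stored(-1) returns 0xFF: anything above the low byte of the value is silently discarded.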
As has been noted many times, there are plenty of pitfalls here that even experienced developers fall into. Here is a brief summary of the typical errors described in Andrey2008's article:
No. 1. When calculating the size of an array or structure, do not apply sizeof() to a pointer to it: you will get the size of the pointer (4 or 8 bytes) instead of the size of the array or structure.
No. 2. The third argument of memset() is the number of bytes, not the number of elements, regardless of the data type. I will add that the size of a type such as int depends on the platform, so do not hard-code it; use sizeof(int).
No. 3. Do not mix up the arguments. The correct order is: pointer, value, length in bytes.
No. 4. Do not use memset on class objects.
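Errors No. 1 to No. 3 can be sketched in a few lines of C. The names clear_ints, clear_ints_demo and COUNT are hypothetical and used only for this illustration; it is one possible way to avoid the mistakes, not the only one.

```c
#include <stddef.h>
#include <string.h>

#define COUNT 8

/* Inside this function values is only a pointer, so sizeof(values)
   would give the pointer size (error No. 1). The element count is
   passed explicitly and converted to bytes (error No. 2). */
void clear_ints(int *values, size_t count)
{
    /* Argument order: pointer, value, length in bytes (error No. 3). */
    memset(values, 0, count * sizeof(values[0]));
}

/* Returns 1 if every element is zero after clearing, 0 otherwise. */
int clear_ints_demo(void)
{
    int data[COUNT];
    size_t i;

    for (i = 0; i < COUNT; i++)
        data[i] = (int)i + 1;       /* make every element non-zero */

    clear_ints(data, COUNT);

    for (i = 0; i < COUNT; i++)
        if (data[i] != 0)
            return 0;
    return 1;
}
```

In the scope where data is declared, sizeof(data) would also give the full size in bytes; it is only after the array has decayed to a pointer that the count has to travel separately.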
But this is just the tip of the iceberg.
An alternative to memset
memset is a low-level function that requires the developer to take the details of the machine architecture into account, so its use should be justified. Let's start with the alternative, = {0}. It is said that, unlike memset (or ZeroMemory), which initializes data at run time, this form lets the compiler initialize an array or string at compile time, which should make the program faster. I decided to check.
void doInitialize()
{
    char p0[25] = {0};          // sets all 25 characters to 0
    char p1[25] = "";           // sets all 25 characters to 0
    wchar_t p2[25] = {0};       // sets all 25 characters to 0
    wchar_t p3[25] = L"";       // sets all 25 characters to 0
    short p4[62] = {0};         // sets all 62 values to 0
    int p5[37] = {-1};          // sets the first element to -1 and the rest to 0
    unsigned int p6[10] = {89}; // sets the first element to 89 and the rest to 0
}
C99 [§6.7.8/21]
If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.
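The rule quoted above can be checked directly. The following is a small sketch; the function name count_nonzero_after_partial_init is hypothetical and serves only this illustration.

```c
/* Counts the non-zero elements of an array that has a single
   explicit initializer. Per the rule quoted above, the remaining
   36 elements are implicitly initialized to 0. */
int count_nonzero_after_partial_init(void)
{
    int p5[37] = { -1 };    /* only p5[0] is given explicitly */
    int nonzero = 0;
    int i;

    for (i = 0; i < 37; i++)
        if (p5[i] != 0)
            nonzero++;

    return nonzero;         /* only p5[0] is non-zero */
}
```

So = {-1} does not fill the array with -1: it sets the first element and zeroes the rest.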
This kind of initialization also removes problems No. 1, No. 2 and No. 3: there are no arguments to mix up and no buffer size to pass. Let's see how compilers translate such code. I cannot check every compiler, so I used what was at hand: the gcc shipped with android-ndk-r10c and the gcc from Ubuntu 14.04.
gcc -v
1) gcc version 4.9 20140827 (prerelease) (GCC)
2) gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
Let's see how the compiler behaves on such a piece of code:
void empty_string(){
int i;
char p1[25] = {0};
printf("\np1: ");
for (i = 0; i < 25; i++)
printf("%x,",p1[i]);
}
So, without optimization (-O0), the array initialization compiles into the following assembly (we inspect the binaries with objdump):
gcc -O0, ELF 32-bit, ARM, EABI5
83d8: e3a03000 mov r3, #0
83dc: e50b3024 str r3, [fp, #-36] ; 0x24
83e0: e24b3020 sub r3, fp, #32
83e4: e3a02000 mov r2, #0
83e8: e5832000 str r2, [r3]
83ec: e2833004 add r3, r3, #4
83f0: e3a02000 mov r2, #0
83f4: e5832000 str r2, [r3]
83f8: e2833004 add r3, r3, #4
83fc: e3a02000 mov r2, #0
8400: e5832000 str r2, [r3]
8404: e2833004 add r3, r3, #4
8408: e3a02000 mov r2, #0
840c: e5832000 str r2, [r3]
8410: e2833004 add r3, r3, #4
8414: e3a02000 mov r2, #0
8418: e5832000 str r2, [r3]
841c: e2833004 add r3, r3, #4
8420: e3a02000 mov r2, #0
8424: e5c32000 strb r2, [r3]
8428: e2833001 add r3, r3, #1
gcc -O0, ELF 64-bit, x86-64
400700: 48 c7 45 d0 00 00 00 00 movq $0x0,-0x30(%rbp)
400708: 48 c7 45 d8 00 00 00 00 movq $0x0,-0x28(%rbp)
400710: 48 c7 45 e0 00 00 00 00 movq $0x0,-0x20(%rbp)
400718: c6 45 e8 00 movb $0x0,-0x18(%rbp)
As expected, without optimization we get run-time code that takes O(n) processor time, where n is the buffer length. What the compiler can do with optimization (-O3) is shown below.
gcc -O3, 32-bit, ARM
000083ac :
83ac: e59f002c ldr r0, [pc, #44] ; 83e0
83b0: e92d4038 push {r3, r4, r5, lr}
83b4: e08f0000 add r0, pc, r0
83b8: ebffffb2 bl 8288
83bc: e59f5020 ldr r5, [pc, #32] ; 83e4
83c0: e3a04019 mov r4, #25
83c4: e08f5005 add r5, pc, r5
83c8: e1a00005 mov r0, r5
83cc: e3a01000 mov r1, #0
83d0: ebffffac bl 8288
83d4: e2544001 subs r4, r4, #1
83d8: 1afffffa bne 83c8
83dc: e8bd8038 pop {r3, r4, r5, pc}
gcc -O3, 64-bit, x86-64
00000000004006d0 :
4006d0: 53 push %rbx
4006d1: be a4 08 40 00 mov $0x4008a4,%esi
4006d6: bf 01 00 00 00 mov $0x1,%edi
4006db: 31 c0 xor %eax,%eax
4006dd: bb 32 00 00 00 mov $0x32,%ebx
4006e2: e8 d9 fd ff ff callq 4004c0 <__printf_chk@plt>
4006e7: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
4006ee: 00 00
4006f0: 31 d2 xor %edx,%edx
4006f2: 31 c0 xor %eax,%eax
4006f4: be aa 08 40 00 mov $0x4008aa,%esi
4006f9: bf 01 00 00 00 mov $0x1,%edi
4006fe: e8 bd fd ff ff callq 4004c0 <__printf_chk@plt>
400703: 83 eb 01 sub $0x1,%ebx
400706: 75 e8 jne 4006f0
400708: 5b pop %rbx
400709: c3 retq
We see that the run-time zeroing code has simply disappeared, and we get the promised O(1) initialization. So where does printf get its values from? This is the fragment we are interested in:
83bc: ldr r5, [pc, #32]
83c0: mov r4, #25 ;// r4 holds the number of for-loop iterations; it is our loop counter
83c4: add r5, pc, r5 ;// r5 gets the address of the text "%x,", a constant stored in memory as 002c7825
83c8: mov r0, r5 ;// r5 is copied unchanged into r0 on every iteration; this is the first argument of printf()
83cc: mov r1, #0 ;// the constant 0 (instead of the actual p1[i]) is passed as the second argument of printf()
83d0: bl 8288
83d4: subs r4, r4, #1 ;// decrement the loop counter
83d8: bne 83c8 ;// if it has not reached 0, jump back to the start of the loop at 83c8
That is, the compiler simply threw the array away and uses 0 in place of its values, as a constant fixed at compile time. Fine, but what happens if we use memset? Let's look at a few pieces of objdump output, starting with ARM:
Without optimization, -O0 :
83d8: e24b3024 sub r3, fp, #36 ; 0x24
83dc: e1a00003 mov r0, r3
83e0: e3a01000 mov r1, #0
83e4: e3a02019 mov r2, #25
83e8: ebffffa3 bl 827c
With optimization -O3 :
83c0: e58d3004 str r3, [sp, #4]
83c4: e58d3008 str r3, [sp, #8]
83c8: e58d300c str r3, [sp, #12]
83cc: e58d3010 str r3, [sp, #16]
83d0: e58d3014 str r3, [sp, #20]
83d4: e58d3018 str r3, [sp, #24]
83d8: e5cd301c strb r3, [sp, #28]
x86-64
Without optimization, -O0:
400816: ba 19 00 00 00 mov $0x19,%edx
40081b: be 00 00 00 00 mov $0x0,%esi
400820: 48 89 c7 mov %rax,%rdi
400823: e8 a8 fc ff ff callq 4004d0
With optimization -O3:
4007f4: 48 c7 04 24 00 00 00 00 movq $0x0,(%rsp)
4007fc: 48 c7 44 24 08 00 00 00 movq $0x0,0x8(%rsp)
400805: 48 c7 44 24 10 00 00 00 movq $0x0,0x10(%rsp)
40080e: c6 44 24 18 00 movb $0x0,0x18(%rsp)
That is, optimization simply removes the memset call and inlines the stores. In such cases memset still works in O(n) time, whereas the optimized = {0} initialization takes constant time: in our case it costs no processor cycles at all, because the compiler brazenly discards the array and replaces all of its elements with zeros. But is this always the case? What happens if we write a non-zero value after the initialization? The test function looks like this:
void empty_string(){
int i;
char p1[25] = {0};
p1[0] = 65;
printf("\np1: ");
for (i = 0; i < 25; i++)
printf("%x,",p1[i]);
}
After compilation, we get an already familiar code block:
8404: e3a02041 mov r2, #65 ; 0x41
8408: e08f0000 add r0, pc, r0
840c: e58d3004 str r3, [sp, #4]
8410: e58d3008 str r3, [sp, #8]
8414: e58d300c str r3, [sp, #12]
8418: e58d3010 str r3, [sp, #16]
841c: e58d3014 str r3, [sp, #20]
8420: e58d3018 str r3, [sp, #24]
8424: e5cd301c strb r3, [sp, #28]
8428: e5cd2004 strb r2, [sp, #4]
x86-64
4006f8: 48 c7 04 24 00 00 00 movq $0x0,(%rsp)
4006ff: 00
400700: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp)
400707: 00 00
400709: 48 c7 44 24 10 00 00 movq $0x0,0x10(%rsp)
400710: 00 00
400712: c6 44 24 18 00 movb $0x0,0x18(%rsp)
400717: c6 04 24 41 movb $0x41,(%rsp)
It looks as if the compiler has inserted an optimized version of memset for us. And what happens if the array grows significantly? Say, not 25 bytes but 25 kilobytes (25,000 bytes in this test)!
83fc: e24ddc61 sub sp, sp, #24832 ; 0x6100
8400: e24dd0a8 sub sp, sp, #168 ; 0xa8
8404: e3a01000 mov r1, #0
8408: e59f2054 ldr r2, [pc, #84] ; 8464
840c: e1a0000d mov r0, sp
8410: ebffff99 bl 827c
x86-64
400720: 55 push %rbp
400721: ba a8 61 00 00 mov $0x61a8,%edx
400726: 31 f6 xor %esi,%esi
400728: 53 push %rbx
400729: 48 81 ec b8 61 00 00 sub $0x61b8,%rsp
400730: 48 89 e7 mov %rsp,%rdi
400733: 48 8d ac 24 a8 61 00 lea 0x61a8(%rsp),%rbp
40073a: 00
40073b: 48 89 e3 mov %rsp,%rbx
40073e: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
400745: 00 00
400747: 48 89 84 24 a8 61 00 mov %rax,0x61a8(%rsp)
40074e: 00
40074f: 31 c0 xor %eax,%eax
400751: e8 8a fd ff ff callq 4004e0
400756: be 54 09 40 00 mov $0x400954,%esi
40075b: bf 01 00 00 00 mov $0x1,%edi
400760: 31 c0 xor %eax,%eax
400762: c6 04 24 41 movb $0x41,(%rsp)
400766: e8 a5 fd ff ff callq 400510 <__printf_chk@plt>
40076b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
400770: 0f be 13 movsbl (%rbx),%edx
400773: 31 c0 xor %eax,%eax
400775: be 5a 09 40 00 mov $0x40095a,%esi
40077a: bf 01 00 00 00 mov $0x1,%edi
40077f: 48 83 c3 01 add $0x1,%rbx
400783: e8 88 fd ff ff callq 400510 <__printf_chk@plt>
400788: 48 39 eb cmp %rbp,%rbx
40078b: 75 e3 jne 400770
40078d: 48 8b 84 24 a8 61 00 mov 0x61a8(%rsp),%rax
400794: 00
400795: 64 48 33 04 25 28 00 xor %fs:0x28,%rax
40079c: 00 00
40079e: 75 0a jne 4007aa
4007a0: 48 81 c4 b8 61 00 00 add $0x61b8,%rsp
4007a7: 5b pop %rbx
4007a8: 5d pop %rbp
4007a9: c3 retq
Wow!
The = {0} initializer goes over to the dark side, and memset rejoices!
Still, the effort was not wasted: we set out to solve the problem with the parameters, and with = {0} there are no arguments left to mix up.
String initialization
It is also worth considering array initialization with = "". C uses null-terminated strings, that is, the first byte with the value 0x00 marks the end of the string. So to initialize an empty string there is no point in zeroing every element; it is enough to zero the first one. Here are some ways to initialize an empty string:
void doInitializeCString()
{
    char p0[25] = {0};              // sets all characters to 0
    char p1[25] = "";               // sets all characters to 0
    char p2[25];
    p2[0] = 0;                      // sets only the first character to 0
    char p3[25];
    memset(p3, 0, sizeof(p3));      // sets all 25 characters to 0
    char p4[25];
    strcpy(p4, "");                 // sets only the first character to 0
    char *p5 = (char *) calloc(25, sizeof(char)); // sets all characters to 0 (heap memory; release it with free())
}
The most reliable way to find out how = "" initialization works is to look at the objdump output again. Without optimization there is nothing special to see, everything is the same as for = {0}, so let's go straight to -O3. We compile the following function for ARM:
void empty_string(){
int i;
char p1[25] = "";
printf("\np1: ");
for (i = 0; i < 25; i++)
printf("%x,",p1[i]);
}
And, unexpectedly, we get zeroing of every element of the array.
83c0: e58d3004 str r3, [sp, #4]
83c4: e58d3008 str r3, [sp, #8]
83c8: e58d300c str r3, [sp, #12]
83cc: e58d3010 str r3, [sp, #16]
83d0: e58d3014 str r3, [sp, #20]
83d4: e58d3018 str r3, [sp, #24]
83d8: e5cd301c strb r3, [sp, #28]
x86-64
400768: 48 c7 04 24 00 00 00 00 movq $0x0,(%rsp)
400770: 48 c7 44 24 08 00 00 00 movq $0x0,0x8(%rsp)
400779: 48 c7 44 24 10 00 00 00 movq $0x0,0x10(%rsp)
400782: c6 44 24 18 00 movb $0x0,0x18(%rsp)
Oh well! Why zero all the unused characters of a null-terminated string?! Zeroing a single byte would be enough. Hmm, and what will it do with 25 thousand bytes? Here is what:
8474: e24ddc61 sub sp, sp, #24832 ; 0x6100
8478: e24dd0a8 sub sp, sp, #168 ; 0xa8
847c: e3a0c000 mov ip, #0
8480: e28d3f6a add r3, sp, #424 ; 0x1a8
8484: e1a0100c mov r1, ip
8488: e59f204c ldr r2, [pc, #76] ; 84dc
848c: e28d0004 add r0, sp, #4
8490: e503c1a8 str ip, [r3, #-424] ; 0x1a8
8494: ebffff78 bl 827c
x86-64
00000000004007b0 :
4007b0: 55 push %rbp
4007b1: ba a0 61 00 00 mov $0x61a0,%edx
4007b6: 31 f6 xor %esi,%esi
4007b8: 53 push %rbx
4007b9: 48 81 ec b8 61 00 00 sub $0x61b8,%rsp
4007c0: 48 8d 7c 24 08 lea 0x8(%rsp),%rdi
4007c5: 48 8d ac 24 a8 61 00 lea 0x61a8(%rsp),%rbp
4007cc: 00
4007cd: 48 c7 04 24 00 00 00 movq $0x0,(%rsp)
4007d4: 00
4007d5: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
4007dc: 00 00
4007de: 48 89 84 24 a8 61 00 mov %rax,0x61a8(%rsp)
4007e5: 00
4007e6: 31 c0 xor %eax,%eax
4007e8: 48 89 e3 mov %rsp,%rbx
4007eb: e8 f0 fc ff ff callq 4004e0
4007f0: be 54 09 40 00 mov $0x400954,%esi
4007f5: bf 01 00 00 00 mov $0x1,%edi
4007fa: 31 c0 xor %eax,%eax
4007fc: e8 0f fd ff ff callq 400510 <__printf_chk@plt>
400801: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
400808: 0f be 13 movsbl (%rbx),%edx
40080b: 31 c0 xor %eax,%eax
40080d: be 5a 09 40 00 mov $0x40095a,%esi
400812: bf 01 00 00 00 mov $0x1,%edi
400817: 48 83 c3 01 add $0x1,%rbx
40081b: e8 f0 fc ff ff callq 400510 <__printf_chk@plt>
400820: 48 39 eb cmp %rbp,%rbx
400823: 75 e3 jne 400808
400825: 48 8b 84 24 a8 61 00 mov 0x61a8(%rsp),%rax
40082c: 00
40082d: 64 48 33 04 25 28 00 xor %fs:0x28,%rax
400834: 00 00
400836: 75 0a jne 400842
400838: 48 81 c4 b8 61 00 00 add $0x61b8,%rsp
40083f: 5b pop %rbx
400840: 5d pop %rbp
400841: c3 retq
It looks like the dark memset is chasing us. If you still want to fight the darkness, it is worth mentioning the other traps that await you.

memset may initialize numbers with incorrect values
If you want to fill an array of integers with a non-zero value, remember that memset fills memory byte by byte.
void doInitializeToMistakenValues()
{
    char pChar[25];
    unsigned char pUChar[25];
    short pShort[25];
    unsigned short pUShort[25];
    int pInt[25];
    unsigned int pUInt[25];
    // The 2-byte and 4-byte elements will not be equal to 1
    memset(pChar, 1, sizeof(pChar));     // 1
    memset(pUChar, 1, sizeof(pUChar));   // 1
    memset(pShort, 1, sizeof(pShort));   // 257
    memset(pUShort, 1, sizeof(pUShort)); // 257
    memset(pInt, 1, sizeof(pInt));       // 16843009
    memset(pUInt, 1, sizeof(pUInt));     // 16843009
    // Every byte becomes 0xFF, so the unsigned arrays hold their maximum values
    memset(pChar, -1, sizeof(pChar));     // -1
    memset(pUChar, -1, sizeof(pUChar));   // 255
    memset(pShort, -1, sizeof(pShort));   // -1
    memset(pUShort, -1, sizeof(pUShort)); // 65535
    memset(pInt, -1, sizeof(pInt));       // -1
    memset(pUInt, -1, sizeof(pUInt));     // 4294967295
}
Let's see how this comes about. Say we have an int array and pass 1 as the second argument. What happens?
Here is what: every byte is filled with 1, giving 0x01010101 in hexadecimal, and the intended value 0x00000001 cannot be set with memset at all. This is actually not a bug but a feature.
It is just that not knowing about this feature leads to unpredictable errors.
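When every element really has to hold a non-zero value, a plain loop does what memset cannot. The following is a minimal sketch; fill_ints and memset_int_with_one are hypothetical helpers written only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Assigns value to each of the count elements, element by element,
   so the full int value is stored, not a repeated byte. */
void fill_ints(int *dst, size_t count, int value)
{
    size_t i;

    for (i = 0; i < count; i++)
        dst[i] = value;
}

/* For comparison: returns what an int holds after memset(..., 1, ...).
   With a 4-byte int this is 0x01010101 (16843009), not 1. */
int memset_int_with_one(void)
{
    int x;

    memset(&x, 1, sizeof(x));   /* writes the byte 0x01 into every byte of x */
    return x;
}
```

For example, after fill_ints(pInt, 25, 1) every element of the array equals 1, while memset_int_with_one() shows the byte-wise result for a single int.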
memset may set an invalid value
If we fill the bytes of double elements with -1, we get Not-a-Number (NaN), and every subsequent operation on a NaN value returns NaN, which breaks the whole chain of calculations.
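A short sketch of this effect, assuming the platform uses the IEEE 754 format for double (true of mainstream x86-64 and ARM toolchains, but an assumption nonetheless) and that the code is built without options such as -ffast-math. The helper names are hypothetical. The check relies on the fact that a NaN compares unequal to itself.

```c
#include <string.h>

/* Returns 1 if a double whose bytes are all 0xFF is a NaN.
   Assumes IEEE 754 doubles: all-ones exponent with a non-zero
   mantissa encodes NaN. */
int memset_double_is_nan(void)
{
    double d;

    memset(&d, -1, sizeof(d));  /* every byte of d becomes 0xFF */
    return d != d;              /* only a NaN is unequal to itself */
}

/* Returns 1 if arithmetic on that value still yields a NaN. */
int nan_propagates(void)
{
    double d;
    double sum;

    memset(&d, -1, sizeof(d));
    sum = d * 2.0 + 1.0;        /* any operation with NaN gives NaN */
    return sum != sum;
}
```

Under these assumptions both functions return 1.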
Likewise, storing -1 in a bool is incorrect: formally the result is neither true nor false, although in most cases it will behave as true. In most cases...
Finally, memset is designed to work only with simple data structures. Never use it on managed data structures; the function is intended only for low-level operations.

This article draws on material from "memset is evil".
Also read about printf function vulnerabilities.