“Hello, World!” in C with an array: int main[]

I would like to talk about how I wrote an implementation of “Hello, World!” in C. As a warm-up, here is the code right away. If you are curious how I got there, read on.

#include <stdio.h>
const void *ptrprintf = printf;
#pragma section(".exre", execute, read)
__declspec(allocate(".exre")) int main[] =
{
    0x646C6890, 0x20680021, 0x68726F57,
    0x2C6F6C6C, 0x48000068, 0x24448D65,
    0x15FF5002, &ptrprintf, 0xC314C483
};


Foreword


It all started when I came across this article. Inspired by it, I began to think about how to do the same thing on Windows.

In that article, screen output was done with a syscall, but on Windows the only way I found was to go through the printf function. Maybe I'm wrong, but I haven't found anything else.

Gathering my courage and firing up Visual Studio, I began to experiment. I don't know why I spent so long trying to override the entry point in the build settings, because, as it turned out later, the Visual Studio compiler does not even emit a warning when main is an array rather than a function.

The main problems I had to face:

1) The array lives in a data section and cannot be executed
2) Windows has no syscall like the one used in the article, so the output has to go through printf

Let me explain why a function call is a problem here. Normally the call address is filled in by the compiler from the symbol table, if I'm not mistaken. But we have an ordinary array, and we have to write the address into it ourselves.

The solution to the problem of "executable data"


The first problem, as expected, was that a plain array is stored in a data section and cannot be executed as code. After a bit of digging on Stack Overflow and MSDN, I found a way out. The Visual Studio compiler supports the section pragma, and a variable can be declared so that it is placed in a section with execute permission.

I checked that this really works: the main “array function” quietly executes the ret opcode and does not cause an “Access violation” error.

#pragma section(".exre", execute, read)
__declspec(allocate(".exre")) char main[] = { 0xC3 };

A bit of assembler


Now that I could execute the array, I needed to write the code that it would contain.

I decided to store the message "Hello, World!" inside the assembler code itself. I must say right away that my grasp of assembler is rather poor, so please go easy on me, although criticism is welcome. This answer on Stack Overflow helped me understand which assembler code to insert without calling unnecessary functions.
I took Notepad++ and used Plugins -> Converter -> "ASCII -> HEX" to get the character codes.

Hello, World!

48656C6C6F2C20576F726C6421

Next, we need to split it into 4-byte pieces and push them onto the stack in reverse order, not forgetting to convert each piece to little-endian.

Split, flip.
Add a terminating zero to the end.

48656C6C6F2C20576F726C642100

Split from the end into 4-byte hex numbers.

00004865 6C6C6F2C 20576F72 6C642100

Convert each number to little-endian and reverse the order.

0x0021646C 0x726F5720 0x2C6F6C6C 0x65480000


I went slightly astray when I tried to call printf directly and then store that address in the array. It only worked once I had saved a pointer to printf in a variable. Later it will become clear why.

#include <stdio.h>
const void *ptrprintf = printf;
void main() {
    __asm {
        push 0x0021646C ; "ld!\0"
        push 0x726F5720 ; " Wor"
        push 0x2C6F6C6C ; "llo,"
        push 0x65480000 ; "\0\0He"
        lea  eax, [esp+2] ; eax -> "Hello, World!"
        push eax ; push the pointer to the start of the string
        call ptrprintf ; call printf
        add  esp, 20 ; clean up the stack
    }
}

Compile it and look at the disassembly.

00A8B001 68 6C 64 21 00       push        21646Ch  
00A8B006 68 20 57 6F 72       push        726F5720h  
00A8B00B 68 6C 6C 6F 2C       push        2C6F6C6Ch  
00A8B010 68 00 00 48 65       push        65480000h  
00A8B015 8D 44 24 02          lea         eax,[esp+2]  
00A8B019 50                   push        eax  
00A8B01A FF 15 00 90 A8 00    call        dword ptr [ptrprintf (0A89000h)]  
00A8B020 83 C4 14             add         esp,14h  
00A8B023 C3                   ret  

From this listing we need to take the code bytes.

To avoid stripping the assembler text by hand, you can use regular expressions in Notepad++.
A regular expression that matches everything after the code bytes:

 {2} *.*

The start of each line can be removed with the TextFX plugin for Notepad++:

TextFX -> "TextFX Tools" -> "Delete Line Numbers or First Word", with all the lines selected.

After that we have an almost ready-made code sequence for the array.

68 6C 64 21 00
68 20 57 6F 72
68 6C 6C 6F 2C
68 00 00 48 65
8D 44 24 02
50
FF 15 00 90 A8 00 ; the 4 bytes after FF 15 are the address from which the call target is read (here, the address of ptrprintf)
83 C4 14
C3


Calling a function with a “pre-known” address


I thought for a long time about how to get the address from the function table into the finished sequence when only the compiler knows it. After asking a few programmer friends and experimenting, I realized that the address used by the call can be obtained by taking the address of the pointer variable that points to the function. Which is what I did.

#include <stdio.h>
const void *ptrprintf = printf;
void main() {
    void *funccall = &ptrprintf;
    __asm {
        call ptrprintf
    }
}



As you can see, the pointer contains exactly the address used by the call. Exactly what is needed.

Putting it all together


So, we have the byte sequence of the assembler code, and somewhere in it we need to leave an expression that the compiler will translate into the address needed to call printf. The address is 4 bytes long (because we are writing code for a 32-bit platform), which means the array must hold 4-byte values, arranged so that the element right after the bytes FF 15 is the one where we put our address.

Using simple substitutions, we obtain the desired sequence.
We take the byte sequence of our assembler code obtained earlier. Since the 4 bytes after FF 15 must form a single value, we lay out the rows around them. The missing byte is filled with a nop instruction, opcode 0x90.

90 68 6C 64
21 00 68 20
57 6F 72 68
6C 6C 6F 2C
68 00 00 48
65 8D 44 24 
02 50 FF 15
00 90 A8 00 ; address used for the printf call
83 C4 14 C3

Once again we assemble 4-byte values in little-endian order. For moving columns around, multi-line selection in Notepad++ with Alt+Shift is very handy:

646C6890
20680021
68726F57
2C6F6C6C
48000068
24448D65
15FF5002
00000000 ; address used for the printf call, will be replaced by an expression later
C314C483


Now we have a sequence of 4-byte numbers and the address for calling the printf function, and we can finally fill in our main array.

#include <stdio.h>
const void *ptrprintf = printf;
#pragma section(".exre", execute, read)
__declspec(allocate(".exre")) int main[] =
{
    0x646C6890, 0x20680021, 0x68726F57,
    0x2C6F6C6C, 0x48000068, 0x24448D65,
    0x15FF5002, &ptrprintf, 0xC314C483
};

To trigger a breakpoint in the Visual Studio debugger, replace the first element of the array with 0x646C68CC. This swaps the leading nop (0x90) for the int 3 opcode (0xCC).
Run it and look.



Done!

Conclusion


I apologize if the article seemed written “for complete beginners”. I tried to describe the process in as much detail as possible while leaving out the obvious. I wanted to share my own experience from this small piece of research. I would be glad if the article turns out to be interesting, and perhaps useful, to someone.

I’ll leave all the links here:

Article “main usually a function”
Description section on msdn
Some explanation of the assembler code on stackoverflow

And just in case, I’ll leave a link to the 7z archive with the Visual Studio 2013 project.

I also do not rule out that the printf call could be shortened further, or that a different form of the call instruction could be used, but I did not manage to investigate this.

I would be happy for your feedback and comments.
