MS-DOS virus world

This post is a text version of a talk that I gave at the 35th Chaos Communication Congress in late 2018.
I have to admit that MS-DOS itself irritated me a little, even though MS-DOS malware has always fascinated me to some extent. But first we must ask: “What is DOS?”
- DOS is modelled on CP/M, another very old operating system.
- The DOS family spans a wide range of vendors; just because something is DOS does not mean that it will run on an 8086 CPU or better.
- Some of these DOS vendors are API compatible with each other, which means that some of them share malware!

Video of the talk:

But in fact, most of our memories of the DOS era are about the aesthetics of what computers of that time looked like:

This is the era of “beige computing” and the Model M keyboard, which is either famous or infamous depending on whether you like noisy keyboards or not.

Some of us may have memories of using DOS, and some may still use DOS!

For example, George R. R. Martin, who wrote Game of Thrones, is rumored to use WordStar on DOS to write his books!

We also cannot leave out QBASIC; for many it was their first introduction to programming!

But sometimes life with DOS was not so good. Sometimes you were using DOS and suddenly things like this happened. In this example, a small tune is played during printing, which could be very awkward in an office environment.

Some of them are more “cute”. In this one, for example, an ambulance drawn in ASCII characters drives across the screen, and then the program that you wanted to open is launched; at worst it is a minor inconvenience.
Thanks to a group of malware archivists operating under the name VX Heavens, we have a good historical archive of DOS malware, or at least we did until the Ukrainian police raided the site:
On Friday, March 23, the server was seized by the police in connection with a criminal investigation (Article 361-1 of the Criminal Code of Ukraine - the creation of malicious programs for the purpose of their sale or distribution) based on someone's tip-off about “free access to malicious software designed for unauthorized hacking of computers, automated systems, computer networks”.
Fortunately, popular torrent sites still have copies of the site database, which can provide us with a wonderful set of data:
$ tar -tvf viruses-20070914.tar | wc -l
66714
$ ls -alh viruses-20070914.tar
6.6G viruses-20070914.tar
However, before we start exploring these samples, we first need to understand how they were typically distributed, given that these programs operated in the pre-Internet era:

After you get an infected file onto your system and run it, the malware will either actively search for files, or install system call interceptors to catch the programs that you run. It often does this in a subtle and invisible way to avoid detection. Subtlety matters because, for the malware to spread, someone must either carry it to another system on removable media (floppy disks), or upload it to another distribution point, such as a BBS.

At runtime, the malware has two options: it can either stay hidden and infect new files, or trigger its payload.
Some payloads are pretty beautiful! The example below uses unusual features, such as 256 colors:

Or this one, which plays with your screen buffer:


However, for the most part the malware will stay silent and try to find files to infect. Infecting most files is very simple. For example, think of a COM file as one long tape of machine code:

Then “all you have to do” is insert a JMP at the beginning of the program and append your code to the end of the program. It will look something like this:

Some malware was smarter: it found “empty space” inside the binary and wrote itself there, which avoided increasing the size of the file, something that antivirus software could presumably treat as a red flag.
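From the analyst's side, the jump-at-the-start layout is easy to spot. The following is a minimal Python sketch, not the tooling used in the talk, that checks whether a COM image begins with a near JMP (opcode 0xE9) and works out which file offset it lands on. The function name and the toy image are made up for illustration, and only this one JMP encoding is handled.

```python
import struct

COM_LOAD_OFFSET = 0x100  # DOS loads a COM image at offset 0x100 of its segment


def entry_jump_target(image: bytes):
    """Return the file offset a leading near JMP lands on, or None.

    Only the 3-byte form (E9 rel16) is recognised; anything else is ignored.
    """
    if len(image) < 3 or image[0] != 0xE9:
        return None
    (rel16,) = struct.unpack_from("<h", image, 1)  # signed 16-bit displacement
    # The displacement is relative to the instruction after the 3-byte JMP,
    # and offsets wrap around inside the 64 KB segment.
    memory_offset = (COM_LOAD_OFFSET + 3 + rel16) & 0xFFFF
    file_offset = memory_offset - COM_LOAD_OFFSET
    return file_offset if 0 <= file_offset < len(image) else None


# Toy image: a JMP that skips 16 bytes of "original program" and lands on
# two bytes appended at the end of the file.
toy = b"\xE9" + struct.pack("<h", 16) + b"\x90" * 16 + b"\xCC\xCC"
print(entry_jump_target(toy))
```

A leading jump into the last part of a file is only a hint, not proof of infection, since plenty of legitimate COM programs start with a jump over their data.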

However, earlier I also mentioned intercepting system calls. Although the MS-DOS runtime is very simple and practically unprotected (you can trivially load Linux from a COM file), it still provides a full API so that applications do not need to carry their own file system implementation. Here are some of the system calls:

They work by triggering a software interrupt, in which the program asks the processor to jump to another section of system memory to handle something:
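On the 8086 in real mode, the destination of a software interrupt comes from the interrupt vector table at the bottom of memory: the entry for interrupt n sits at physical address n * 4 and holds a 16-bit offset followed by a 16-bit segment. Here is a small Python sketch of that lookup. The memory buffer is fake and the handler address is invented for the example.

```python
import struct


def read_vector(memory: bytes, int_no: int):
    """Return (segment, offset) stored in the IVT entry for int_no."""
    # Each entry is 4 bytes: little-endian offset first, then segment.
    offset, segment = struct.unpack_from("<HH", memory, int_no * 4)
    return segment, offset


# Fake 1 KB table (256 entries); the address below is made up.
ivt = bytearray(1024)
struct.pack_into("<HH", ivt, 0x21 * 4, 0x40F8, 0x0019)  # offset, then segment

segment, offset = read_vector(bytes(ivt), 0x21)
print(f"int 21h -> {segment:04X}:{offset:04X}")
```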

However, MS-DOS also lets you add or change these calls (using yet another call), so that the system can be extended and new drivers can be loaded at runtime. This is also an ideal place for malware to install its hooks:

It was a well-used trick, since you could intercept the “Open File” call and then use it to discover new executable files on the system ... and infect them.
As a quick example of how these calls are used, let's take a look at a simple “Hello World” program:

As we can see, there are two int calls here. We use 21h (h = hex) as the main system call number, and we specify which action we want MS-DOS to perform through the value of AH.

In this case, the program makes a call to print the string, and then exits with a return code of 0 (unspecified).
As mentioned earlier, when you call int 21h the CPU looks in the IVT to find where to go. Inside this handler there is often a router-like section that dispatches the various calls; in the case of int 21h, it routes to different functions based on the value of AH. Once we reach the right place, the actual call handler deals with the task and then executes iret to return to the main program, often leaving the results of the call in registers:

So if we want to see all the system calls that a program makes, we can set a breakpoint at the start of the interrupt handler and check the value of AH:

We do this because the interrupt handler is always at a fixed location in MS-DOS (this long predates ASLR and kernel ASLR), while the location of the program is not.

As soon as we launch it, we can see the calls made by this sample. On screen it only printed a notice about a goat file (a goat file is a file that exists to be infected, like a sacrificial goat), but we can also see that this program does more than just print a string. It checks the DOS version (probably to check compatibility), and then opens, reads and writes data!
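To make a trace like this readable, the logged AH values can be mapped back to function names. The sketch below covers only a handful of well-known INT 21h functions, and the trace in it is invented to resemble the behaviour described above; it is not output from the real tracer.

```python
# AH values for a few well-known INT 21h functions.
INT21_FUNCTIONS = {
    0x09: "print string",
    0x2A: "get system date",
    0x2C: "get system time",
    0x30: "get DOS version",
    0x3D: "open file",
    0x3E: "close file",
    0x3F: "read from file or device",
    0x40: "write to file or device",
    0x4C: "terminate with return code",
}


def describe(ah: int) -> str:
    """Translate an AH value seen at the handler breakpoint into a name."""
    return INT21_FUNCTIONS.get(ah, f"unknown function {ah:02X}h")


# Invented trace: version check, message, then file activity and exit.
trace = [0x30, 0x09, 0x3D, 0x3F, 0x40, 0x3E, 0x4C]
for ah in trace:
    print(f"AH={ah:02X}h  {describe(ah)}")
```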

That is interesting! But we would like to know more about the system calls marked in red, since they must take input for things such as file names and the data to write or print to the screen.
To do this, we need to look at the other registers at the time of the syscall:

Using the “Print String” as a simple example, we can see what the usage looks like:

What is DS:DX? Why are there two registers, and how do we get data from them?
To answer that, we need to understand a little more about the 8086 processor. The 8086 is a 16-bit CPU, but with 20-bit memory addressing. This means that a register can only hold an address within a 64 KB range, which is a problem when there can be up to 1 MB of memory.
To get around this, we need to understand the segment registers:

The 8086 processor has four segment registers that we need to care about:
- CS - code segment
- DS - data segment
- SS - stack segment
- ES - extra segment (in case you need another one to work around various situations)
There are also a number of general purpose registers, which save you from going to memory too often and let you pass parameters to other functions.
The segment registers work by moving the window into RAM:

This allows the 16-bit CPU to reach the whole 20-bit address space: each increment of DS shifts the window by 16 bytes.
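The address arithmetic behind this is short enough to write down. In real mode the physical address is the segment value multiplied by 16 plus the offset, truncated to 20 bits on the 8086. A minimal Python sketch (the function name is just for illustration):

```python
def linear_address(segment: int, offset: int) -> int:
    """Physical address for segment:offset on an 8086 (20 address lines)."""
    return ((segment << 4) + offset) & 0xFFFFF


# Moving the segment up by one shifts the window by 16 bytes, so two
# different segment:offset pairs can name the same byte.
print(hex(linear_address(0x1234, 0x0010)))
print(hex(linear_address(0x1235, 0x0000)))
```

Both calls name the same physical byte, which is why tools that compare addresses usually normalise segment:offset pairs to a linear address first.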

In this case, DX is used as a pointer inside the 64 KB window selected by DS, pointing at where the string begins. The string printer then scans until it finds the $ character and stops. This is similar to other systems that use a zero byte instead of $.
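A tracer that wants to log what is being printed has to do the same scan itself. Here is a rough Python sketch of reading a $-terminated string at DS:DX from a dump of guest memory. The 1 MB buffer, the register values and the function name are all assumptions made up for the example.

```python
def read_dollar_string(memory: bytes, ds: int, dx: int) -> bytes:
    """Read bytes starting at DS:DX up to, but not including, the '$'."""
    out = bytearray()
    for i in range(0x10000):  # never scan more than one 64 KB segment
        offset = (dx + i) & 0xFFFF  # offsets wrap inside the segment
        address = ((ds << 4) + offset) & 0xFFFFF
        byte = memory[address]
        if byte == ord("$"):
            return bytes(out)
        out.append(byte)
    raise ValueError("no '$' terminator found in the segment")


# Fake 1 MB of guest memory with a string placed at 2000:0100.
memory = bytearray(0x100000)
text = b"Hello, World!$"
memory[0x20100:0x20100 + len(text)] = text

print(read_dollar_string(bytes(memory), 0x2000, 0x0100))
```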

As the x86 ISA has aged, little has changed: as the bit width of the processor grew, the same registers simply became wider.
So, with this knowledge, we can create a list of “tasks” to track these programs:

With this setup, we can throw several large computers at the problem for a few hours and collect the results!

And we get ...

Not a lot.
It is disappointing.
We did get at least a couple of activations, though!

If we look at some of the samples, we find a smoking gun: a decent chunk of them check the date or time.
If we look at the documentation for these calls, we see that the system call returns its values to the program in registers:

So we can brute-force them! All we need to do is something like this:
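One way to picture the brute force: for every day of a year, build the register values that the "get system date" call (INT 21h, AH=2Ah) would hand back, then run the sample once per candidate. The Python sketch below only generates the candidates. How they get injected into the emulator is not shown, and the function names are made up for illustration.

```python
from datetime import date, timedelta


def dos_date_registers(d: date) -> dict:
    """Registers returned by INT 21h AH=2Ah for a given calendar date."""
    return {
        "AL": (d.weekday() + 1) % 7,  # DOS counts day of week from 0 = Sunday
        "CX": d.year,
        "DH": d.month,
        "DL": d.day,
    }


def candidate_dates(year: int):
    """Yield one register set per day of the given year."""
    d = date(year, 1, 1)
    while d.year == year:
        yield dos_date_registers(d)
        d += timedelta(days=1)


candidates = list(candidate_dates(1995))
print(len(candidates), candidates[0])
```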

But there is one problem with this method.

Each test run takes about 15 seconds, because it uses full QEMU emulation and it can take up to 15 seconds to fully launch a program in the virtual machine. DOS also has no power saving features, which means that an idle DOS is sitting in a busy loop.
So we can approach the problem differently, by looking at what code is executed after the date / time request.
Since our tracer sits in the interrupt handler, we do not know out of the box where the program is located:

For this we need to look at the stack, where the CS and IP registers are waiting for us!

Once we pull these two values off the stack, we can use them to fetch the code at the return address, so our checklist now looks like this:
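When the 8086 takes an interrupt it pushes FLAGS, then CS, then IP, so at the first instruction of the handler SS:SP points at the saved IP, with CS and FLAGS in the next two words. Below is a Python sketch of recovering the caller's code from a memory dump on that basis. The register values, addresses and code bytes are all invented, and stack wrap-around is ignored to keep it short.

```python
import struct


def linear(segment: int, offset: int) -> int:
    return ((segment << 4) + offset) & 0xFFFFF


def return_address(memory: bytes, ss: int, sp: int):
    """Return (CS, IP) of the caller as saved on the stack by INT."""
    ip, cs, _flags = struct.unpack_from("<HHH", memory, linear(ss, sp))
    return cs, ip


def code_at_return(memory: bytes, ss: int, sp: int, length: int = 8) -> bytes:
    """Fetch the bytes the program will execute once the handler returns."""
    cs, ip = return_address(memory, ss, sp)
    start = linear(cs, ip)
    return memory[start:start + length]


# Fake guest state: some code at 2000:0105 and an interrupt frame at 3000:FFF0.
memory = bytearray(0x100000)
code = bytes.fromhex("80fa1e7505")
memory[linear(0x2000, 0x0105):linear(0x2000, 0x0105) + len(code)] = code
struct.pack_into("<HHH", memory, linear(0x3000, 0xFFF0), 0x0105, 0x2000, 0x0202)

print(code_at_return(bytes(memory), 0x3000, 0xFFF0, 5).hex())
```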

After doing this and re-running the test over the data set, we can see what the code at the return address looks like!

Here is one sample. We can see that DL is compared against 0x1e.

If we look at our documentation, we see that DL is the day of the month, which means we can read the top three opcodes as follows:
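Since 0x1e is 30 in decimal, this sample is checking for the 30th day of the month. A check like this can be recognised mechanically. The Python sketch below matches only one encoding, CMP DL, imm8 (bytes 80 FA followed by the immediate), and does not look at the jump that follows; real samples will use other instructions and registers too.

```python
def day_compared_against(code: bytes):
    """Return the immediate if code starts with CMP DL, imm8, else None.

    80 /7 ib is CMP r/m8, imm8; ModRM byte FA selects DL with /7.
    """
    if len(code) >= 3 and code[0] == 0x80 and code[1] == 0xFA:
        return code[2]
    return None


day = day_compared_against(bytes.fromhex("80fa1e"))
print(day)
```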

We could go through all of these by hand, but there are a lot of samples that check the time, around 4,700:

So instead, we need to do something else. We need to write something ... We need to write ...

The world's worst x86 emulator, called BenX86: an emulator designed specifically for our needs, and nothing more:

But it does have the advantage of speed.


Using brute force with BenX86, we added 10 thousand different test runs based on the paths we found. So I'll finish with some of my favorite finds that are activated by time:

This sample activates on New Year's Day and hangs your system after displaying a greeting. That can be good if you are stuck in the office over the new year, or bad if you really need to get something done on New Year's Day.

This sample surprised me a lot. It activates in early 1995, tells the user about every file that it has infected, then removes the virus (by removing the jump at the beginning), and after that does nothing more. For some reason it also tells you to buy McAfee; evidently that message is not out of date.

This one, to be honest, really confuses me: on November 8 of any year, it turns every 0 on the system into a tiny “hate” glyph. If you know why anyone would want that, let me know ...

This is probably my nightmare: after you run any program, a message appears saying that it failed to eat your main disk. That would be incredibly disturbing to see out of the blue.

Finally, we have what looks like the DOS malware version of the Navy Seal copypasta. Not sure why this author dislikes Aladdin, but to each their own.
If you are interested in the code behind this article, I have released my toolkit on GitHub, without any guarantees. If you want to run this code yourself, you will need to do some work to make it function with your MS-DOS installation (fixing the handler breakpoint).
However, if you just want to see what I saw while working on this project, I have archived the web interface here: dosv.benjojo.co.uk
See you soon!