
EA again, NFS again, bugs again. Fixing them
Hello, Habr! The NFS speedrunning community is back. And once again we are repairing an old toy - NFS Most Wanted. I have already written about fixing bugs in my previous articles, but today I want to take you a little deeper into the jungle of disassembly. If that sounds interesting, read on.

Background
Once upon a time, when EA still published good NFS games, one of the most famous racing games - Most Wanted - was released. Alas, it was not written as well as it sold, and it crashed from time to time. An ordinary player pays little attention to this - so it crashed once during a playthrough, no big deal. For us, though, it is a huge problem: who knows how many potential records were killed by random crashes with no distinct symptoms. It all ended with KuruHS personally asking me to sort things out. I could not refuse.


What we have

IDA - for disassembly
Cheat Engine - for editing memory and instructions
Visual Studio - for debugging (tracepoints turned out to be very convenient)
We have a pile of crash dumps. A decent pile: 10 gigabytes. We start with them and work out which instructions the game crashes on. The crashes look quite random, although some patterns can be traced. Along the way we found several potentially dangerous places that occasionally crash the game. For example:

In the string hash calculation function. Apparently the developers did not expect a null pointer here, so they added no check for it, and in rare cases the game crashed. The fix is pretty banal: jump to the first empty stretch of the executable and do test edi, edi there, then jz to the function's return, otherwise jmp back to where we came from.
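In C++ terms, the patched function behaves roughly like the sketch below. The hash itself is a hypothetical stand-in (the article does not show the game's algorithm), and returning 0 for a null input is likewise an assumption; only the added guard corresponds to the fix described above.

```cpp
#include <cstdint>

// Hypothetical stand-in for the game's string hash; the real algorithm
// and the value returned on the early-exit path are not given in the article.
std::uint32_t hash_string(const char* s) {
    if (s == nullptr) {        // the added guard: test edi, edi / jz return
        return 0;              // assumed return value for a null input
    }
    std::uint32_t h = 5381u;
    for (; *s != '\0'; ++s) {
        h = h * 33u + static_cast<unsigned char>(*s);
    }
    return h;
}
```

The point is only that the null case now leaves the function early instead of dereferencing the pointer.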

Another similar case was found in the procedure at this address:
0x0057D105 mov edx, [ecx] ; I never did figure out what exactly it does
Once again the developers did not expect a null pointer there, so the game crashed. The fix is identical to the previous one.
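Both fixes rely on redirecting execution into an empty stretch of the executable and back again with near jumps. As a rough illustration of what such a patch writes, here is a minimal sketch that encodes a 5-byte x86 `jmp rel32`. The helper name and any cave address you pass in are hypothetical; only the encoding rule (opcode E9 followed by the distance measured from the end of the jump instruction) is standard x86.

```cpp
#include <array>
#include <cstdint>

// Encodes "jmp to" as it would be placed at address "from" in a 32-bit image.
// rel32 is relative to the address of the next instruction (from + 5);
// unsigned wrap-around gives the right two's-complement value for backward jumps.
std::array<std::uint8_t, 5> encode_near_jmp(std::uint32_t from, std::uint32_t to) {
    const std::uint32_t rel = to - (from + 5u);
    return {
        0xE9,
        static_cast<std::uint8_t>(rel),
        static_cast<std::uint8_t>(rel >> 8),
        static_cast<std::uint8_t>(rel >> 16),
        static_cast<std::uint8_t>(rel >> 24),
    };
}
```

One jump goes from the patched instruction to the cave, and a second one at the end of the cave returns to the instruction after the patch site.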

The most common cause of crashes was the AllocateMemory function. Attempts to disassemble it terrified everyone who had worked on the game's crashes. It had already been noted that the game has at least 5 different memory management subsystems. What have I gotten myself into...

Well, there is no time to whine; it has to be reversed. Several evenings of picking through this mess paid off: the code, while still not readable, became more understandable. Apparently the subsystem follows the standard scheme: grab a large chunk of memory at once, split it into blocks, and keep them in a doubly linked list; on request, hand out free areas, and if there are none, try to get more from the system. Ah, 2005, when memory operations were too expensive to throw around carelessly...

Some parts of this function give me a headache, because my brain flatly refuses to even try to process them. But one thing is clear: somewhere among all these linked lists of linked lists lies a bad pointer, and that is what brings everything down. The only solution that occurred to me was to disable the “use_best_fit” check, so that the subsystem returns the first available free block instead of searching for the one it considers most suitable.
Of course, this did not solve the problem completely, but the game did become noticeably more stable: during a week of testing it crashed in this particular place only a few times (bearing in mind that KuruHS spends 10 hours a day in the game), which I think is a pretty good result.
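To make the difference between the two strategies concrete, here is a minimal sketch of a free-block search over a doubly linked list. The `Block` layout and the function name are hypothetical and much simpler than the game's allocator; only the `use_best_fit` switch mirrors the flag mentioned above.

```cpp
#include <cstddef>
#include <list>

// Hypothetical, simplified block descriptor; the game's real layout is unknown.
struct Block {
    std::size_t size;
    bool free;
};

// Returns the chosen free block, or nullptr if nothing is large enough.
// With use_best_fit == false the first block that fits wins, so the search
// touches fewer list nodes; with true, the smallest fitting block is chosen.
const Block* find_free_block(const std::list<Block>& blocks,
                             std::size_t wanted,
                             bool use_best_fit) {
    const Block* best = nullptr;
    for (const Block& b : blocks) {
        if (!b.free || b.size < wanted) {
            continue;
        }
        if (!use_best_fit) {
            return &b;                       // first fit
        }
        if (best == nullptr || b.size < best->size) {
            best = &b;                       // best fit so far
        }
    }
    return best;
}
```

First fit walks less of the list, which may be why disabling the flag made the game run into the bad pointer less often, although that explanation is a guess rather than something the dumps confirm.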
Pure virtual function call.
This is the error shown in the article's header. People familiar with C++ will immediately understand what the problem is. Without source code, however, things get much harder. Matters are made worse by the CRT, which, like a partisan, stubbornly refuses to produce a dump when it catches this kind of error.
Purecall means that the code tried to call a “pure virtual function” (a virtual member function that has no implementation). Naturally, the call cannot succeed, so all the runtime does is inform the user and exit with code 0. As a result, on paper the process ended normally, while in fact everything is bad.
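The article does not say how the game ends up making such a call. One common route, shown here purely as an assumption, is a virtual call on an object whose vtable pointer still (or again) refers to a base class, for example during construction or after destruction. The hypothetical classes below use a non-pure function so that the example stays well defined; if `Base::name` were pure and reached indirectly at that moment, the runtime's purecall handler would be invoked instead.

```cpp
#include <string>

// Hypothetical classes for illustration only.
struct Base {
    std::string seen_in_ctor;

    // While Base's constructor runs, the object's dynamic type is Base,
    // so this virtual call resolves to Base::name, not Derived::name.
    Base() { seen_in_ctor = name(); }
    virtual ~Base() = default;

    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};
```

Once construction has finished, the same call on a `Derived` object dispatches to `Derived::name` as usual.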
Thanks to Microsoft for a great feature - _set_purecall_handler, which lets you replace the purecall handler. We look for references to it in the file and find the function itself. Now all that remains is to write our own handler and not forget to install it. For that we need a large enough piece of unused code in the executable that we can overwrite with our own. A short search showed that the _CxxThrowException function would do (no references to it were found). We mercilessly overwrite its whole body with nops and start building on top of it:

The pseudocode of the new procedures looks like this:
new_handler:
xor eax, eax ; return *(0);
mov eax, [eax] ; crashes the game instantly
ret
set_handler:
push new_handler
call _set_purecall_handler ; _set_purecall_handler(new_handler);
add esp, 4 ; cdecl, restore the stack
ret
We assemble it (in my case, by typing it into Cheat Engine by hand) and paste it into the code.

Now we need a suitable place to call this procedure from. I did not find an ideal one, but I did find a wonderful empty function called right in the game's main loop, so we replace its call with a call to the function we wrote. We make the patch, and it can be tested.
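For readers more at home in C++ than in assembly, the two hand-written procedures correspond roughly to the sketch below. `_set_purecall_handler` is the real MSVC CRT function; the names `crash_on_purecall`, `install_purecall_handler` and `kHasPurecallHook` are made up for this example, and the non-MSVC branch exists only so the sketch compiles elsewhere.

```cpp
#include <cstdlib>

#if defined(_MSC_VER)
constexpr bool kHasPurecallHook = true;

// Counterpart of new_handler: fault immediately so that a crash dump is
// produced instead of the CRT's quiet exit.
void crash_on_purecall() {
    volatile int* null_address = nullptr;
    int value = *null_address;          // deliberate read from address 0
    (void)value;
}
#else
constexpr bool kHasPurecallHook = false; // other runtimes: nothing to install
#endif

// Counterpart of set_handler: call it once, early, for example from the
// function hooked into the main loop. Returns true if a handler was installed.
bool install_purecall_handler() {
#if defined(_MSC_VER)
    _set_purecall_handler(crash_on_purecall);
    return true;
#else
    return false;
#endif
}
```

Calling the installer on every iteration of the main loop, as the patch effectively does, is harmless because it simply sets the same handler again.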

The only problem is that this error is quite rare, and nobody wants to play aimlessly for hours. Nevertheless, I decided to test a bit myself and was pleasantly surprised: the game crashed after literally 10 minutes of gameplay, and it crashed in the handler I had just written. We go a little higher up the call stack:
0043E005 call dword ptr [edx+80h]
I can say nothing except: "yes, this is a virtual function call." The first thought: what if we just do without it? We nop it out and test - it seems to live. The game works as it should, with no side effects. We build the patch and send it off for testing. A day later a dump arrives in which the same procedure crashes a few bytes further down. We cut that out too, and the game starts crashing. Everything suggests that a more serious solution is needed. But nothing comes to mind, so it is postponed indefinitely.
After sleeping on it, I came to a conclusion. You say C++ cannot determine an object's type at runtime? I say it can, and very simply: by the address of its virtual table in memory. After examining the dumps, I concluded that every so often the wrong class (vtbl @ 0x00890970) arrives in the procedure, which means we can catch this situation:
cmp edx, 00890970h
jnz good_class
xor eax, eax
jmp return
good_class:
call dword ptr[edx+80h]
jmp continue
But there is a catch: this takes up a lot of space, and it has to be built into the procedure itself. There is nowhere with enough room; all we have is a handful of empty gaps, a few bytes each, in front of the function. At least there are plenty of them and they are close together. So we write spaghetti, jumping from one gap to the next after almost every instruction.
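The vtable check can also be expressed in C++. The sketch below assumes that the compiler stores the vtable pointer in the first pointer-sized word of the object, which holds for simple single-inheritance classes on mainstream compilers but is not guaranteed by the C++ standard. The classes and function names are hypothetical; in the game, the value compared against is the constant 0x00890970 found in the dumps.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical classes standing in for the game's objects.
struct Prop {
    virtual ~Prop() = default;
    virtual int place() { return 1; }
};
struct BrokenProp : Prop {               // plays the role of the "wrong class"
    int place() override { return -1; }
};

// Reads the vtable pointer, assuming it lives at offset 0 of the object.
std::uintptr_t vtable_address(const void* object) {
    std::uintptr_t vptr = 0;
    std::memcpy(&vptr, object, sizeof vptr);
    return vptr;
}

// Equivalent of the patched call site: skip the virtual call when the
// object carries the known-bad vtable, otherwise call as before.
int place_if_expected(Prop* object, std::uintptr_t bad_vtable) {
    if (vtable_address(object) == bad_vtable) {
        return 0;                        // xor eax, eax / jmp return
    }
    return object->place();              // call dword ptr [edx+80h]
}
```

All objects of the same class within one binary share a vtable, so a single address comparison is enough to recognise the unwanted type.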

A lyrical digression
Perhaps I got a little carried away, and it would have been better to put this into the former _CxxThrowException function, since I had already cleared it out. But alas, what's done is done. I will try to redo this fix one of these days.
Patch and run. And we hit the same problem: this crash is so rare that in almost 4 hours of testing this piece of code ran only a couple of times, and every time it received the correct class.
I could have left it at that, but I needed confirmation that this really works. So we go on reversing and try to trigger the exceptional situation by hand.
A quick inspection showed that the game can crash if one of the arguments is non-zero. The procedure is called from just two places, and in one of them that argument is always 0. So we look at the other function.

We strip out the "extra" checks as far as possible and try to force this function to be called. We start testing and finally get the wrong class as input. We wait while the Visual Studio debugger finishes printing its output, the game freezes and... carries on working. Hurrah!

The screenshot is blurry because it was taken from a stream recording
Conclusion
A solution was found: the game no longer crashes, even when it is fed something wrong. You can see this in the screenshot above - part of the fence is missing, because the game tried to put something wrong there. What exactly remains a mystery, but I am sure that sooner or later we will find out.
Overall the situation improved noticeably: KuruHS was able to spend about 20 hours in the game without a single crash, which would have been simply impossible before.
I decided to package the whole fix as an asi script, along the lines of the Widescreen patches from ThirteenAG. You can read the sources and download the scripts on GitHub.
Thanks for your attention!