EA again, NFS again, bugs again. Fixing them

    Hello, Habr! The NFS speedrunning community is with you once more, and once more we are repairing an old toy: NFS Most Wanted. I have already talked about fixing bugs in my previous articles, but today I want to take you a little deeper into the jungle of disassembly. If you are interested, read on below the cut.



    Background


    Once upon a time, back when EA still published good NFS games, one of the most famous racing games was released: Most Wanted. Alas, it was not written as well as it sold, and it crashed from time to time. An ordinary player pays little attention to this: so it crashed once during a playthrough, no big deal. For us, though, it is a huge problem: who knows how many potential records were killed by random crashes with no clear symptoms. It all ended with KuruHS personally asking me to sort things out. I could not refuse.

    What we have




    IDA - for disassembly
    Cheat Engine - for editing memory and instructions
    Visual Studio - for debugging (tracepoints turned out to be a very convenient thing)

    We have a pile of dumps. A decent pile, 10 gigabytes. We start with them and analyze which instructions the game crashes on. It crashes fairly randomly, although some patterns can be traced. While working on the problem, we found several potentially dangerous places that sometimes bring the game down. For example:



    The first is in a string hash calculation function. Apparently, the developers did not expect to receive a null pointer here, so they did not add a check for it. Because of this, in rare cases, the game crashed. The fix is pretty mundane: jump to the first unused chunk of the executable and do test edi, edi there, then jz to the function's return and jmp back to where we jumped from.
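    For readers more comfortable with C++ than with assembly, here is a minimal sketch of what that patch amounts to at source level. The hash algorithm itself (a djb2-style loop) and the value returned for a null pointer are assumptions made purely for illustration; the article does not show the game's real routine. Only the guard at the top corresponds to the added test edi, edi / jz pair.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the game's hash routine; the real algorithm is
// not shown in the article. Only the null guard mirrors the patch.
std::uint32_t string_hash(const char* s) {
    if (s == nullptr) {   // the added `test edi, edi` / `jz return`
        return 0;         // assumption: bail out with 0 instead of crashing
    }
    std::uint32_t h = 5381;  // djb2 seed, chosen for illustration only
    for (; *s != '\0'; ++s) {
        h = h * 33u + static_cast<unsigned char>(*s);
    }
    return h;
}
```

    Without the guard, the first read of *s is exactly the kind of access violation that showed up in the dumps.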



    Another similar case was found in the procedure at this address:
    0x0057D105 mov edx, [ecx] ; I never managed to figure out what exactly it does

    Again, the developers did not expect a null pointer there, so the game crashed. The fix is exactly the same as the previous one.



    The most common cause of crashes was in the AllocateMemory function. Attempts to disassemble it terrified everyone who worked on the crash problem. It is alarming in itself that the game has at least 5 different memory management subsystems. What have I gotten myself into ...



    Well, there is no time to whine, the thing has to be reversed. Several evenings spent picking this garbage apart paid off: the code, although still not readable, became more understandable. Apparently, this subsystem works according to the standard scheme: it grabs a certain amount of memory at once, splits it into blocks and stores them in a doubly linked list; on request it hands out free regions, and if there are none, it tries to take more from the system. Ah, 2005, when memory operations were too expensive to be thrown around carelessly ...



    Some parts of this function give me headaches, because my brain flatly refuses to even try to process them. But one thing is clear to me: somewhere among all these linked lists made of linked lists lies a bad pointer, and that is what brings everything down. The only solution that occurred to me was to disable the "use_best_fit" check, so that the subsystem returns the first available free block rather than searching for the one it considers the most suitable.

    Of course, this did not solve the problem completely, but the game did become noticeably more stable: during a week of testing it crashed in this particular place only a few times (bearing in mind that KuruHS spends 10 hours a day in the game), which I think is a pretty good result.
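    To make the difference between the two strategies concrete, here is a small sketch. The Block layout, the FreeList alias and the find_block function are hypothetical names invented for this illustration, not the game's actual structures; only the use_best_fit flag name comes from the disassembly.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical free-block record; the game's real layout is unknown.
struct Block {
    std::size_t offset;  // position inside the pre-allocated pool
    std::size_t size;    // bytes available in this block
};

// std::list is a doubly linked list, like the structure described above.
using FreeList = std::list<Block>;

// Returns the block to use for a request, or end() if nothing fits.
FreeList::const_iterator find_block(const FreeList& free_blocks,
                                    std::size_t request,
                                    bool use_best_fit) {
    auto chosen = free_blocks.end();
    for (auto it = free_blocks.begin(); it != free_blocks.end(); ++it) {
        if (it->size < request) continue;  // too small, keep looking
        if (!use_best_fit) return it;      // first fit: stop at the first match
        if (chosen == free_blocks.end() || it->size < chosen->size) {
            chosen = it;                   // best fit: remember the tightest match
        }
    }
    return chosen;
}
```

    With the flag off, the walk stops at the first block that is large enough, so fewer list nodes are visited on each allocation. That may be why the change reduced the crash rate without removing the underlying bad pointer, though this is a guess rather than something the dumps prove.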

    Pure virtual function call.


    This is the very error shown in the header image. People familiar with C++ will immediately understand what the problem is. Without source code, however, things get much more complicated. Matters are made worse by the CRT, which, like a captured partisan, stubbornly refuses to generate dumps when it catches this type of error.

    Purecall means that the code tried to call a "pure virtual function" (a virtual member function that has no implementation). Naturally, the call cannot succeed, so all the runtime does is inform the user and exit with code 0. As a result, judging by the exit code everything seems fine, while in fact everything is bad.

    Thanks to Microsoft for a great feature, _set_purecall_handler, which lets you replace the purecall handler. We search the file for references to it and find the function itself. Now all that remains is to write our own handler and not forget to install it. To do this, we need to find a large enough piece of unused code in the file itself that we can overwrite with our own. A short search showed that the _CxxThrowException function would do (no references to it were found). We mercilessly overwrite its whole body with nops and start building on top of it:



    Here is what the pseudocode of the new procedures looks like:

    new_handler:
    	xor	eax, eax		; return *(0);
    	mov	eax, [eax]		; crashes the game immediately
    	ret
    set_handler:
    	push 	new_handler
    	call	_set_purecall_handler	; _set_purecall_handler(new_handler);
    	add	esp, 4			; cdecl: restore the stack
    	ret	

    We assemble it (in my case, by typing it into Cheat Engine by hand) and paste it into the code:



    Now we need to find a suitable place to call this procedure from. I did not find an ideal one, but I did find a wonderful empty function right in the game's main loop, so we replace its call with a call to the function we wrote. We make the patch and can start testing.



    The only problem is that this error is quite rare, and nobody wants to play aimlessly for hours. Nevertheless, I decided to test a little myself and was pleasantly surprised: the game crashed after literally 10 minutes of gameplay, and it crashed in the handler I had just written. We move a little higher up the call stack:

    0043E005  call        dword ptr [edx+80h] 

    I can't say anything except: "yes, this is a call to a virtual function." The first thought is: what if we just do without it? We nop it out and test: we seem to be alive. The game works as it should, with no side effects. We build the patch and send it off for testing. A day later a dump arrives in which the same procedure crashes a few bytes further down. We cut that out too, and the game starts crashing. Everything suggests that a more serious solution is needed. But nothing comes to mind, so it is postponed indefinitely.

    Overnight I had time to think it over and came to a conclusion. You say C++ cannot determine the type of an object at runtime? I say it can. And quite simply: by the address of the object's virtual table in memory. After examining the dumps, I concluded that an object of the wrong class periodically arrives in the procedure (vtbl @ 0x00890970), which means we can catch this situation:

    
    	cmp	edx, 00890970h
    	jnz	good_class
    	xor	eax, eax
    	jmp	return
    good_class:
    	call	dword ptr[edx+80h]
    	jmp	continue
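
    The same idea can be sketched in C++. The class names and functions below are hypothetical, made up for this illustration. The sketch also assumes that the vtable pointer is stored in the first pointer-sized word of the object, which holds for common compilers with single inheritance (including MSVC) but is not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical classes standing in for the game's objects.
struct Base { virtual ~Base() = default; virtual int draw() { return 1; } };
struct Fence : Base { int draw() override { return 2; } };
struct Intruder : Base { int draw() override { return 3; } };

// Reads the first pointer-sized word of the object.
// Assumption: the ABI stores the vtable pointer there.
std::uintptr_t vtable_of(const Base* obj) {
    std::uintptr_t vptr = 0;
    std::memcpy(&vptr, static_cast<const void*>(obj), sizeof vptr);
    return vptr;
}

// Mirrors the patched call site: skip the virtual call when the object
// carries the known-bad vtable address.
int guarded_draw(Base* obj, std::uintptr_t bad_vtable) {
    if (vtable_of(obj) == bad_vtable) {
        return 0;        // xor eax, eax / jmp return
    }
    return obj->draw();  // call dword ptr [edx+80h]
}
```

    In the real patch the bad address is the constant 00890970h taken from the dumps; here it is passed in as a parameter, because the address of a vtable in a freshly compiled program is not known in advance.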
    

    But there is one catch: this takes up a lot of space, and it has to be built into the procedure. There is nowhere with enough room; all we have is a handful of empty gaps, a few bytes each, just before the function. At least there are plenty of them and they are close by. So we write spaghetti, jumping from one place to another after almost every instruction:



    A lyrical digression
    Maybe I got a little carried away and should have put all this into the former _CxxThrowException function, since I had already cleared it out. But alas, I did what I did. I will try to redo this fix in the next few days.

    Patch and run. And we hit the same problem: this crash is so rare that in almost 4 hours of testing this piece of code ran only a couple of times, and every time it received the correct class.

    It could have been left at that, but I needed confirmation that it really works. So we reverse further and try to trigger the exceptional situation by hand.

    A quick inspection showed that the game can crash if one of the arguments is non-zero. The procedure itself is called in only two places, and in one of them it is called with that very argument set to 0. So we look at the other caller.



    We strip out as many of the "extra" checks as possible and try to force this function to be called. We start testing and finally get the wrong class on input. We wait while the Visual Studio debugger finishes writing out its text, the game freezes for a moment and ... keeps running. Hurrah!


    The screenshot is blurry because it was taken from a stream recording

    Conclusion


    A solution was found: the game no longer crashes, even when something wrong is passed in. This is noticeable in the screenshot above: part of the fence is missing, because the game tried to put something wrong there. What exactly remains a mystery, but I am sure that sooner or later we will find out.

    Overall, the situation really has improved noticeably: KuruHS was able to spend about 20 hours in the game without a single crash, which would simply have been impossible before.

    I decided to package the whole fix as an asi script, following the approach of ThirteenAG's widescreen patches. You can read the sources and download the scripts on GitHub.

    Thanks for your attention!
