Zombies that eat your memory
Whatever you may think, zombies exist. And they really do eat brains. Not human brains, true, but computer ones. I'm talking about zombie processes and the resources they consume. This is a heartbreaking story about 32 GB of RAM, lost and then found again. Perhaps only a few of you will run into exactly this problem, but if you do, you will at least have a chance of understanding what is going on.
To begin with, computers running Windows tend to lose memory over time. At least mine do, given how I use them. After a couple of weeks without rebooting (or after just one weekend in which I built Chrome 300 times), I noticed that Task Manager was showing very little free RAM, even though no running process was actively using that memory. After those 300 Chrome builds, Task Manager reported 49.8 GB in use plus 4.4 GB of compressed memory, yet only a few processes were running, and together they did not come close to using that much.
My computer has 96 GB of RAM (yes, I'm lucky), and when almost nothing is running I would like to see at least half of that memory free. I really count on it. But sometimes that does not happen and I have to reboot. The Windows kernel is well written and reliable (no kidding), so memory should not vanish without a trace. And yet it does.
My first guess came from remembering that one of my colleagues had once complained about zombie processes: processes that are no longer active but have not been completely removed by the kernel. He had even written a utility that lists such processes by name and count. In his tests it found up to several hundred zombie processes on an ordinary Windows machine. I found his tool, ran it on my computer and got ... 506,000 zombie processes. Yes, 506 thousand!
I remembered that one reason a process can end up as a zombie is that some other process is still holding a handle to it. In my case the sheer number of zombies played into my hands: it made them harder to hide. I opened Task Manager, added a column showing the number of open handles per process to the Details tab, and sorted the list by that column in descending order. I immediately found the hero of this story: the CcmExec.exe process (part of Microsoft System Management Server) had 508,000 open handles. That was, first, a lot, and second, suspiciously close to the 506,000 zombie processes found above.
I killed the CcmExec.exe process.
Everything turned out exactly as I expected. As I said above without irony, the Windows kernel is well written, and when a process is destroyed, all the resources it holds are freed. Closing CcmExec.exe released 508,000 handles, which allowed the 506,000 zombie processes to be cleaned up for good. The amount of free RAM instantly increased by 32 GB. Mystery solved!
What is a zombie process?
So far we have not worked out what left all these processes hanging in limbo instead of being deleted. It looks like a trivial bug in the application (not in the OS kernel). The general rule is that when you create a process, you get back a handle to the process and a handle to its main thread. You MUST close these handles. If all you wanted was to start the process, you can close them immediately (this does not kill the new process; it only severs your process's connection to it). If you need something from the new process (for example, you are waiting for it to finish or you need its exit code), call the appropriate functions (for example, WaitForSingleObject(hProcess, INFINITE) to wait for it to exit, or GetExitCodeProcess(hProcess, &exitCode) to get the exit code) and then close the handles once you have everything you wanted from the child process. The same applies to process handles that you obtain with the OpenProcess() function.
If the process that forgets to do this is a system process, then even logging out and back in may not help; only a full reboot will.
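The rule above can be illustrated with a toy model. This is only a sketch in Python, not the Win32 API: the `ToyKernel` class and `run_launcher` function are hypothetical names invented for illustration, and the model assumes nothing more than what the text states, namely that a process object goes away only once the process has exited and every handle to it has been closed.

```python
class ToyKernel:
    """Toy model (not real Windows code): a process object is freed only
    when the process has exited AND no open handles refer to it."""

    def __init__(self):
        self._next_pid = 1
        self.processes = {}  # pid -> {"running": bool, "handles": int}

    def create_process(self):
        pid = self._next_pid
        self._next_pid += 1
        # The creator receives one process handle. (The real API also
        # returns a thread handle; the model ignores it for brevity.)
        self.processes[pid] = {"running": True, "handles": 1}
        return pid

    def exit_process(self, pid):
        self.processes[pid]["running"] = False
        self._reap(pid)

    def close_handle(self, pid):
        self.processes[pid]["handles"] -= 1
        self._reap(pid)

    def _reap(self, pid):
        proc = self.processes[pid]
        if not proc["running"] and proc["handles"] == 0:
            del self.processes[pid]

    def zombies(self):
        # Exited processes that are still tracked because a handle is open.
        return [pid for pid, p in self.processes.items() if not p["running"]]


def run_launcher(kernel, count, close_handles):
    """Start `count` short-lived children, optionally closing each handle."""
    for _ in range(count):
        pid = kernel.create_process()
        kernel.exit_process(pid)      # the child finishes its work
        if close_handles:
            kernel.close_handle(pid)  # the step a leaky launcher forgets


if __name__ == "__main__":
    leaky, careful = ToyKernel(), ToyKernel()
    run_launcher(leaky, 1000, close_handles=False)
    run_launcher(careful, 1000, close_handles=True)
    print(len(leaky.zombies()), len(careful.zombies()))
```

In this model, closing the handle while the child is still running leaves the child alone, and the leaky launcher accumulates one zombie per child it starts.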
Where does memory go?
Another tool I used in my investigation was the RamMap utility. It shows how every page of memory is used. On the Process Memory tab we see hundreds of thousands of processes, each occupying 32 KB of RAM: obviously, these are our zombies. But ~500,000 processes at 32 KB each comes to only about 16 GB. Where did the rest of the memory go? Comparing the state of memory before and after the zombie processes were cleaned up answers the question.
About 16 GB went to Process Private memory, and another 16 GB went to Page Table memory. Evidently each zombie process takes 32 KB of page tables and another 32 KB of private memory. I don't know why a zombie process needs that much memory, but probably no one ever expected the number of such processes to run into the hundreds of thousands.
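As a quick sanity check of the arithmetic, here is a short Python calculation using only the figures quoted above (506,000 zombies, 32 KB of private memory and 32 KB of page tables each). The variable names are mine, and treating "KB" and "GB" as binary units is an assumption.

```python
KIB = 1024
GIB = 1024 ** 3

zombie_count = 506_000            # zombie processes reported by the tool
private_per_zombie = 32 * KIB     # bytes of process private memory each
page_table_per_zombie = 32 * KIB  # bytes of page table memory each

private_gib = zombie_count * private_per_zombie / GIB
page_table_gib = zombie_count * page_table_per_zombie / GIB
total_gib = private_gib + page_table_gib

print(f"private: {private_gib:.1f} GiB")
print(f"page tables: {page_table_gib:.1f} GiB")
print(f"total: {total_gib:.1f} GiB")
```

Each category comes to a little over 15 GiB and the total to roughly 31 GiB, which is consistent with the approximately 16 GB per category and 32 GB overall reported in the text.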
Some categories of memory use grew after the CcmExec.exe process was closed, mainly Mapped File and Metafile. I don't know exactly why. One guess is that the OS decided there was now plenty of free memory and cached something for itself. That is generally fine. I don't begrudge the OS memory for its own needs; I just don't want it to vanish to no purpose.
Important note: RamMap also opens handles to all processes, so close it if you want the zombie processes to be cleaned up.
I tweeted about my finding, and the investigation was continued by another programmer, who was able to reproduce the bug and pass the details to a developer at Microsoft, who said that this is "a known problem that sometimes happens when a lot of processes start and close very quickly."
I hope this problem will be fixed soon.
Why am I having such strange problems on my computer?
I work on the Windows version of Chrome, and one of my tasks is to optimize its build on this OS, which means running the build over and over. Each Chrome build starts a huge number of processes: from 28,000 to 37,000, depending on the settings. With our distributed build system (goma), these processes are created and destroyed very quickly. My best Chrome build time is 200 seconds. But such aggressive process creation exposes problems in the Windows kernel and its components:
- Rapid process destruction causes user input to freeze
- The touchpad driver allocates memory each time a process is created and never frees it
- App Verifier creates O(n^2) log files (and a separate post should be written about it!)
- There is a bug in the Windows kernel related to file buffering, and it reproduces on every Windows version from Server 2008 R2 to Windows 10
- Windows Defender delays the start of each goma process by 250 ms
What's next?
If you are not working on a computer managed by corporate policies, then the CcmExec.exe process is not running on your machine and you will not hit this particular bug. Even then, it will only affect you if you build Chrome or do something similar that creates and destroys tens of thousands of processes in a short period of time.
But!
CcmExec is not the only buggy program in the world. I have found many others with exactly the same kind of bug that creates zombie processes. And there are many more that I have not found.
As all experienced programmers know, any mistake that is not explicitly prevented or warned about is certain to be made. Simply writing “Please close this handle” in the documentation is not enough. So here is my contribution to making this kind of bug easier to find and more practical to fix. FindZombieHandles is a tool based on NtApiDotNet and code from @tiraniddo that lists zombie processes along with information about who is keeping them alive. Here is sample output from running it on my computer:
274 total zombie processes.
249 zombies held by IntelCpHeciSvc.exe(9428)
    249 zombies of Video.UI.exe
14 zombies held by RuntimeBroker.exe(10784)
    11 zombies of MicrosoftEdgeCP.exe
    3 zombies of MicrosoftEdge.exe
8 zombies held by svchost.exe(8012)
    4 zombies of ServiceHub.IdentityHost.exe
    2 zombies of cmd.exe
    2 zombies of vs_installerservice.exe
3 zombies held by explorer.exe(7908)
    3 zombies of MicrosoftEdge.exe
1 zombie held by devenv.exe(24284)
    1 zombie of MSBuild.exe
1 zombie held by SynTPEnh.exe(10220)
    1 zombie of SynTPEnh.exe
1 zombie held by tphkload.exe(5068)
    1 zombie of tpnumlkd.exe
1 zombie held by svchost.exe(1872)
    1 zombie of userinit.exe
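If you want to process such a report automatically, for example to rank the worst offenders across several machines, a small parser is enough. The following Python sketch assumes the output has exactly the line shapes shown in the sample above; the function name `parse_report` and the regular expressions are mine and are not part of FindZombieHandles.

```python
import re

# Sample report, copied from the output shown above.
SAMPLE = """\
274 total zombie processes.
249 zombies held by IntelCpHeciSvc.exe(9428)
249 zombies of Video.UI.exe
14 zombies held by RuntimeBroker.exe(10784)
11 zombies of MicrosoftEdgeCP.exe
3 zombies of MicrosoftEdge.exe
8 zombies held by svchost.exe(8012)
4 zombies of ServiceHub.IdentityHost.exe
2 zombies of cmd.exe
2 zombies of vs_installerservice.exe
3 zombies held by explorer.exe(7908)
3 zombies of MicrosoftEdge.exe
1 zombie held by devenv.exe(24284)
1 zombie of MSBuild.exe
1 zombie held by SynTPEnh.exe(10220)
1 zombie of SynTPEnh.exe
1 zombie held by tphkload.exe(5068)
1 zombie of tpnumlkd.exe
1 zombie held by svchost.exe(1872)
1 zombie of userinit.exe
"""

TOTAL_RE = re.compile(r"^(\d+) total zombie processes\.$")
HOLDER_RE = re.compile(r"^(\d+) zombies? held by (.+)\((\d+)\)$")
ZOMBIE_RE = re.compile(r"^(\d+) zombies? of (.+)$")


def parse_report(text):
    """Return (total, holders); each holder is a dict with the holder's
    name, pid, zombie count and a {zombie_name: count} mapping."""
    total = None
    holders = []
    for raw in text.splitlines():
        line = raw.strip()  # tolerate indented or unindented lines
        if not line:
            continue
        if (m := TOTAL_RE.match(line)):
            total = int(m.group(1))
        elif (m := HOLDER_RE.match(line)):
            holders.append({"name": m.group(2), "pid": int(m.group(3)),
                            "count": int(m.group(1)), "zombies": {}})
        elif (m := ZOMBIE_RE.match(line)) and holders:
            # A "zombies of" line belongs to the most recent holder.
            holders[-1]["zombies"][m.group(2)] = int(m.group(1))
    return total, holders


if __name__ == "__main__":
    total, holders = parse_report(SAMPLE)
    print(f"{total} zombies in total")
    for h in sorted(holders, key=lambda h: h["count"], reverse=True):
        print(f'{h["count"]:4d}  {h["name"]} (pid {h["pid"]})')
```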
274 zombies is not so bad. But even this points to problems that can be found and fixed. IntelCpHeciSvc.exe is the worst offender on this list: it appears to open (and forget to close) a process handle every time I open a video in Windows Explorer.
Visual Studio forgets to close handles to at least two processes, and one of these leaks reproduces every time. Just start building a project and wait ~15 minutes for the MSBuild.exe process to exit. Alternatively, run “set MSBUILDDISABLENODEREUSE=1” first; MSBuild.exe will then exit as soon as the build finishes, and the leaked handle will show up immediately. Unfortunately, some scoundrel at Microsoft has fixed this problem and the fix should ship in the VS 15.6 update, so hurry up and reproduce it while you still can (I hope it goes without saying that this is a joke and nobody is actually a scoundrel).
You can also use Process Explorer to see leaked processes by configuring its lower pane to display handles (note that in this case leaked handles will be shown for both processes and threads).
Here are some of the bugs found (some, but not all, have been reported to the developers):
- Leak in CcmExec.exe (the case with 500,000 zombies described above): developers are working on a fix
- Leak in Program Compatibility Assistant Service: the issue is being investigated
- Leak in devenv.exe + MSBuild.exe (the problem has already been fixed)
- Leak in devenv.exe + ServiceHub.Host.Node.x86.exe (bug report sent)
- Leak in IntelCpHeciSvc.exe + Video.UI.exe for each opened video file (Intel accepted the bug report and forwarded it to Lenovo)
- Leak in RuntimeBroker.exe + MicrosoftEdge and Video.UI.exe (possibly related to other bugs in RuntimeBroker.exe)
- Leak in AudioSrv + Video.UI.exe
- Leak in an internal Google tool caused by an old version of psutil
- Leaks in Lenovo utilities: tphkload.exe leaks one handle, SUService.exe leaks three
- Leak in Synaptics' SynTPEnh.exe
Process handles are not the only kind of resource that can leak this way. For example, the “Intel® Online Connect Access service” (IntelTechnologyAccessService.exe) uses only 4 MB of RAM, but after running for 30 days it had accumulated 27,504 handles. This problem can be spotted in Task Manager, and I sent the developers a bug report about it.
Using Process Explorer, I noticed that NVDisplay.Container.exe had ~5,000 open handles to \BaseNamedObjects\NvXDSyncStop-61F8EBFF-D414-46A7-90AE-98DD58E4BC99 and was creating a new one every two minutes. I suppose they want to be super sure that they can stop NvXDSync? Bug report sent to Nvidia.
Corsair Link Service leaks ~15 handles per second and never frees them. Bug report sent.
Adobe's Creative Cloud leaks thousands of handles (about 6,500 per day, by my estimate). Bug report sent.
The Razer Chroma SDK Service leaks a VERY large number of handles (150,000 per hour?). Bug report sent.
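The leak rates above are quoted in different units, so it helps to put them on one scale. The Python sketch below converts each figure from the text to handles per day. It assumes every leak is steady, treats the Intel service's 27,504 handles as having accumulated evenly over its 30 days, and takes the Razer figure at face value even though the text marks it with a question mark. The dictionary and function names are mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the text, converted to handles per day.
leak_rates_per_day = {
    # 27,504 handles after 30 days (assumes steady growth from zero)
    "IntelTechnologyAccessService.exe": 27_504 / 30,
    # one new handle every two minutes
    "NVDisplay.Container.exe": SECONDS_PER_DAY / (2 * 60),
    # about 6,500 per day (the author's estimate)
    "Adobe Creative Cloud": 6_500,
    # ~15 handles per second
    "Corsair Link Service": 15 * SECONDS_PER_DAY,
    # 150,000 per hour (flagged as uncertain in the text)
    "Razer Chroma SDK Service": 150_000 * 24,
}


def ranked(rates):
    """Return (name, handles_per_day) pairs, worst leaker first."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, per_day in ranked(leak_rates_per_day):
        print(f"{name:34s} {per_day:>12,.0f} handles/day")
```

On these assumptions the quoted rates span more than three orders of magnitude, from several hundred handles per day to several million.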
Surprisingly, nobody seems to have paid much attention to such bugs before. Hey, Microsoft, maybe it's worth collecting statistics on these cases and doing something about them? Hey, Intel and Nvidia, clean up your code a bit. Remember, I am watching you.
Now you can grab the FindZombieHandles utility, run it on your machine and report what you find. You can also use Task Manager and Process Explorer in your experiments.