podkolzzzin January 3, 2017 at 17:23

Search for GDI Object Leaks: How to Drive a Mastodon


Strictly speaking, this is the original text of the article; the blog carries a translation. It is published here somewhat later, and it is marked with the "translation" tag only for that reason.

In 2016, when most programs run in sandboxes from which even the most incompetent developer cannot harm the system, it is strange to encounter the problem discussed below. To be honest, I hoped it had receded into the distant past together with the Win32 API, but I recently ran into it myself. Before that, I had only heard horror stories about it from ~~older~~ more experienced developers.

Problem

Leaking or using too many GDI objects.

Symptoms

In Task Manager, on the Details tab, the GDI objects column shows an ominous 10,000. (If this column is missing, add it by right-clicking the table header and choosing Select columns.)
When developing in C# or another CLR language, you get an exception that says nothing specific:
Message: A generic error occurred in GDI+.
Source: System.Drawing
TargetSite: IntPtr GetHbitmap(System.Drawing.Color)
Type: System.Runtime.InteropServices.ExternalException

Depending on settings or the Windows version, there may be no exception at all, but your application will be unable to draw a single object.
When developing in C/C++, all GDI functions of the form Create%SOME_GDI_OBJECT% start returning NULL.

Why?

On Windows, no more than 65535 GDI objects can exist at a time. That number is enormous and should never be reached in any normal scenario. Each process is limited to 10000. This limit can be changed (the registry value HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows\GDIProcessHandleQuota accepts values from 256 to 65535), but Microsoft strongly advises against raising it. If you do, a single process can bring the system to a state where it cannot even draw an error message, and only a reboot will bring it back to life.

How to fix it?

If you live in the tidy, CLR-managed world, then 9 times out of 10 you have an ordinary memory leak in your application. The problem is unpleasant but quite mundane, and there are at least a dozen excellent tools for finding it, so I will not dwell on it. Just use any memory profiler to check whether the number of wrapper objects over GDI resources keeps growing: Brush, Bitmap, Pen, Region, Graphics. If it does, you are lucky and can close the tab with this article.
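The profiler check boils down to repeating the suspicious scenario several times and watching whether any wrapper type keeps growing. A minimal sketch, assuming you have exported per-type instance counts (from a profiler, or from GDIView) as one dictionary per snapshot; the snapshot format and the helper name `growing_types` are hypothetical:

```python
from typing import Dict, List, Sequence

# The wrapper types named in the article; extend the tuple as needed.
GDI_WRAPPERS = ("Brush", "Bitmap", "Pen", "Region", "Graphics")

def growing_types(snapshots: List[Dict[str, int]],
                  names: Sequence[str] = GDI_WRAPPERS) -> List[str]:
    """Return the types whose instance count strictly increases from
    every snapshot to the next, the typical signature of a leak."""
    if len(snapshots) < 2:
        return []
    result = []
    for name in names:
        counts = [snap.get(name, 0) for snap in snapshots]
        if all(later > earlier for earlier, later in zip(counts, counts[1:])):
            result.append(name)
    return result
```

The strict-increase rule is deliberately conservative: a cache that grows once and then plateaus is not flagged, while a count that rises on every repetition is.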

If there is no leak of wrapper objects, then your code calls GDI functions directly, and there is a scenario in which the objects they create are not deleted.

What do others advise?

The official Microsoft guide, or other articles on the subject that you will find online, will advise something like this:

Find every Create%SOME_GDI_OBJECT% call and check that it has a matching DeleteObject (or DeleteDC/ReleaseDC for HDC objects); if it does, look for a scenario in which that call might not happen.
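Before reading code by hand, the files worth reading first can be roughed out with a text scan. This is only a heuristic sketch: it ignores macros, wrappers and control flow, pairs calls by count rather than by handle, and an object created in one file may legitimately be deleted in another. The function list in the regex is a non-exhaustive assumption, and `gdi_balance` / `suspicious_files` are hypothetical helpers:

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# A non-exhaustive list of Win32 calls that hand out a GDI object or a DC
# (an optional A/W suffix covers the ANSI/Unicode variants).
CREATE_RE = re.compile(
    r"\b(?:CreatePen|CreateSolidBrush|CreateHatchBrush|CreatePatternBrush"
    r"|CreateFont|CreateFontIndirect|CreateBitmap|CreateCompatibleBitmap"
    r"|CreateDIBSection|CreateRectRgn|CreateEllipticRgn|CreatePalette"
    r"|CreateCompatibleDC|CreateDC|GetDC|GetWindowDC)[AW]?\s*\("
)
# The calls that give them back.
RELEASE_RE = re.compile(r"\b(?:DeleteObject|DeleteDC|ReleaseDC)\s*\(")

def gdi_balance(source: str) -> Tuple[int, int]:
    """Return (allocating calls, releasing calls) found in the text."""
    return len(CREATE_RE.findall(source)), len(RELEASE_RE.findall(source))

def suspicious_files(root: str, pattern: str = "*.cpp") -> Iterator[Tuple[Path, int, int]]:
    """Yield files under root that allocate more often than they release."""
    for path in Path(root).rglob(pattern):
        created, released = gdi_balance(path.read_text(errors="ignore"))
        if created > released:
            yield path, created, released
```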

There is a slightly improved version of this method with one extra first step:

Download the GDIView utility. It shows the number of GDI objects of each type; the only alarming thing is that the per-type counts do not add up to the value in the last column. You can try to ignore that if it helps narrow down the search area at all.

The project I am working on has a code base of over 9 million lines, plus about as many again in third-party libraries, with hundreds of calls to GDI functions spread over dozens of files. I spent a lot of energy and coffee before I realized that analyzing all of this by hand without missing anything is simply impossible.

What do I suggest?

If this method seems too long and to require too much fuss, then you have not yet gone through all the stages of despair with the previous one. You can try the previous steps a few more times, but if that does not help, do not rule this option out.

While hunting for the leak, I asked myself: “Where do the leaking objects come from?” Setting breakpoints at every place where API functions were called was out of the question. Besides, I could not be fully sure that the leak did not happen inside the .NET Framework or one of the third-party libraries we use. A few minutes of googling led me to Api Monitor, a utility that can log and debug calls to any system function. I easily found a list of all the functions that create GDI objects, dutifully located and selected them in Api Monitor, and set breakpoints on them.

After that, I started the process under the debugger in Visual Studio and selected it in Api Monitor's process tree. The first breakpoint was hit instantly.

There were far too many calls. I quickly realized I would drown in this stream and had to come up with something else. I removed the breakpoints from the functions and decided to look at the log instead. It contained thousands upon thousands of calls, and it became obvious that they could not be analyzed by hand.

Task: find the GDI calls that have no matching deletion. The log contains everything needed: the function calls in chronological order, with their parameters and return values. So I need to take the return value of each Create%SOME_GDI_OBJECT% call and find the DeleteObject call that receives this value as its argument. I selected all the entries in Api Monitor, pasted them into a text file, and got something like a CSV file with TAB as the delimiter. I started VS, intending to write a program to parse it, but before it had even loaded I came up with a better idea: import the data into a database and write a query that pulls out what I need. It was the right choice, because it let me ask questions and get answers very quickly.
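The matching rule itself is small enough to replay in memory. A minimal sketch, assuming the log has already been parsed into (API name, return value, handle argument) tuples in chronological order; that tuple layout and the names `find_unreleased` / `is_null` are hypothetical. Replaying in order has one advantage over a plain join on the handle: when Windows reuses a freed handle value for a new object, the new object replaces the old entry instead of borrowing its deletion:

```python
from typing import Dict, List, Optional, Tuple

# One parsed log entry: (API name, return value, handle passed to a
# release function or None). This layout is an assumption.
Call = Tuple[str, str, Optional[str]]

OTHER_CREATORS = {"GetDC", "GetWindowDC"}      # DCs released via ReleaseDC
RELEASERS = {"DeleteObject", "DeleteDC", "ReleaseDC"}

def is_null(value: str) -> bool:
    """Treat 'NULL' and any zero handle such as '0x0000000000000000' as null."""
    try:
        return int(value, 0) == 0
    except ValueError:
        return value.strip().upper() == "NULL"

def find_unreleased(calls: List[Call]) -> Dict[str, str]:
    """Replay the calls in log order and return {handle: creating API}
    for every object that was created and never released.
    Assumes only GDI functions were logged, so 'Create*' means GDI."""
    live: Dict[str, str] = {}
    for api, ret, arg in calls:
        if api.startswith("Create") or api in OTHER_CREATORS:
            if not is_null(ret):
                live[ret] = api        # a reused handle value overwrites the entry
        elif api in RELEASERS and arg is not None:
            live.pop(arg, None)        # released: no longer a leak candidate
    return live
```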

There are many tools for importing CSV data into a database (mysql, mssql, sqlite), so I will not dwell on that.

I got the following table:

-- mysql code
CREATE TABLE apicalls (
  id int(11) DEFAULT NULL,
  `Time of Day` datetime DEFAULT NULL,
  Thread int(11) DEFAULT NULL,
  Module varchar(50) DEFAULT NULL,
  API varchar(200) DEFAULT NULL,
  `Return Value` varchar(50) DEFAULT NULL,
  Error varchar(100) DEFAULT NULL,
  Duration varchar(50) DEFAULT NULL
)
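For readers without MySQL at hand, the import can also be done with Python's built-in sqlite3 module. This sketch assumes the text copied from Api Monitor is tab-separated, has no header row, and lists the columns in the order of the table above; those are assumptions about the export, so check them against your own file. `load_log` is a hypothetical helper, and times and durations are simply kept as text:

```python
import csv
import io
import sqlite3

# Column names and order copied from the MySQL table above.
COLUMNS = [("id", "INTEGER"), ("Time of Day", "TEXT"), ("Thread", "INTEGER"),
           ("Module", "TEXT"), ("API", "TEXT"), ("Return Value", "TEXT"),
           ("Error", "TEXT"), ("Duration", "TEXT")]

def load_log(conn: sqlite3.Connection, tsv_text: str) -> int:
    """Create the apicalls table and fill it from tab-separated log text.
    Returns the number of rows inserted."""
    col_defs = ", ".join(f'"{name}" {kind}' for name, kind in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS apicalls ({col_defs})")
    # QUOTE_NONE: call text may contain quote characters (e.g. font names).
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    width = len(COLUMNS)
    # Pad short rows and trim long ones so every row has exactly `width` fields.
    rows = [(r + [""] * width)[:width] for r in reader if r]
    marks = ", ".join("?" * width)
    conn.executemany(f"INSERT INTO apicalls VALUES ({marks})", rows)
    conn.commit()
    return len(rows)
```

Typical use is `load_log(sqlite3.connect("apicalls.db"), text)`, where `text` is the saved log read with whatever encoding it was saved in.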

I wrote a MySQL function that extracts the handle of the released object from the text of the API call:

DELIMITER //
CREATE FUNCTION getHandle(api varchar(1000))
  RETURNS varchar(100) CHARSET utf8
  DETERMINISTIC
BEGIN
DECLARE start int(11);
DECLARE result varchar(100);
-- ReleaseDC takes the HDC as its second parameter, so take the text after the first comma,
-- e.g. 'ReleaseDC ( 0x0000000000010010, 0xffffffffd0010edf )'
SET start := INSTR(api, ',');
IF start = 0 THEN
  -- DeleteObject / DeleteDC: the handle is the only argument
  SET start := INSTR(api, '(');
END IF;
-- cut everything from the closing parenthesis onwards
SET result := SUBSTRING_INDEX(SUBSTR(api, start + 1), ')', 1);
RETURN TRIM(result);
END //
DELIMITER ;
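The same parsing rule is easy to test outside the database. Below is a Python port of `getHandle` (the name `get_handle` is ours): take the text after the first comma if there is one, otherwise after the opening parenthesis, and cut it at the closing parenthesis. Like the MySQL version, it assumes the handle is the second argument of a two-argument call such as ReleaseDC, or the only argument otherwise:

```python
def get_handle(api: str) -> str:
    """Extract the released handle from an Api Monitor call string,
    mirroring the MySQL getHandle function above."""
    start = api.find(",")          # ReleaseDC(hWnd, hDC): the handle is 2nd
    if start == -1:
        start = api.find("(")      # DeleteObject(hObject): the only argument
    # Like SUBSTRING_INDEX(..., ')', 1): keep the text up to the first ')'.
    return api[start + 1:].split(")", 1)[0].strip()
```

The comma rule would break for a release function whose handle is not its last argument, but for DeleteObject, DeleteDC and ReleaseDC it holds.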

Finally, a query that lists every created object together with the call that deleted it:

SELECT creates.id, creates.handle chandle, creates.API, dels.API deletedApi
FROM (SELECT a.id, a.`Return Value` handle, a.API FROM apicalls a WHERE a.API LIKE 'Create%') creates
  LEFT JOIN (SELECT
      d.id,
      d.API,
      getHandle(d.API) handle
    FROM apicalls d
    WHERE API LIKE 'DeleteObject%'
    OR API LIKE 'DeleteDC%'
    OR API LIKE 'ReleaseDC%') dels
    ON dels.handle = creates.handle
WHERE creates.API LIKE 'Create%';

(Strictly speaking, it simply finds the matching Delete call for every Create call; rows where deletedApi is NULL are the objects that were never deleted.)

In the figure, you can immediately see the calls for which no Delete was found.
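If you only want the leak candidates, a LEFT JOIN with an IS NULL filter leaves just the unmatched Create calls. The sketch below runs such a query on SQLite from Python and registers a Python port of getHandle as a SQL function through `sqlite3.Connection.create_function`; `find_leaks` is a hypothetical helper and the demo rows are made up. Like the MySQL query, it pairs calls by handle value alone, so a handle that Windows reuses after a deletion can make a leaked object look deleted:

```python
import sqlite3

def get_handle(api: str) -> str:
    # Same rule as the MySQL getHandle: text after the first comma,
    # else after '(', up to the closing ')'.
    start = api.find(",")
    if start == -1:
        start = api.find("(")
    return api[start + 1:].split(")", 1)[0].strip()

LEAK_QUERY = """
SELECT c.id, c."Return Value" AS handle, c.API
FROM apicalls c
LEFT JOIN (
    SELECT getHandle(API) AS handle
    FROM apicalls
    WHERE API LIKE 'DeleteObject%' OR API LIKE 'DeleteDC%'
       OR API LIKE 'ReleaseDC%'
) d ON d.handle = c."Return Value"
WHERE c.API LIKE 'Create%' AND d.handle IS NULL   -- no matching release
ORDER BY c.id
"""

def find_leaks(conn: sqlite3.Connection):
    """Return (id, handle, API) for every Create call with no release."""
    conn.create_function("getHandle", 1, get_handle)
    return conn.execute(LEAK_QUERY).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE apicalls (id INTEGER, API TEXT, "Return Value" TEXT)')
    conn.executemany("INSERT INTO apicalls VALUES (?, ?, ?)", [
        (1, "CreatePen ( PS_SOLID, 1, 0x00000000 )", "0x0a"),
        (2, "CreateSolidBrush ( 0x00ff0000 )", "0x0b"),
        (3, "DeleteObject ( 0x0a )", "TRUE"),
    ])
    print(find_leaks(conn))  # [(2, '0x0b', 'CreateSolidBrush ( 0x00ff0000 )')]
```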

One last question remains: how do I find where these functions are called from in my own code? Here a small trick helped me:

Run the application under the debugger in VS.
Find it in Api Monitor and select it.
Select the API function of interest and set a breakpoint on it.
Patiently click “Next” until it is called with the parameters you are interested in. (How I missed Visual Studio's conditional breakpoints!)
When you reach the call you need, switch to VS and click Break All.
The VS debugger stops at the place where the leaking object is created, and all that remains is to find out why it is not deleted.
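Without conditional breakpoints, one workaround (a suggestion, not part of the procedure above) is to count in the log how many calls to the same function precede the leaking one, so you know how many breakpoint hits to skip. It only helps if the application makes the same calls in the same order on every run and Api Monitor is attached at the same point, which you should verify first. `api_name` and `skips_before` are hypothetical helpers working on (id, API text) rows from the apicalls table:

```python
from typing import Iterable, Tuple

def api_name(call_text: str) -> str:
    """'CreatePen ( PS_SOLID, 1, 0 )' -> 'CreatePen'"""
    return call_text.split("(", 1)[0].strip()

def skips_before(rows: Iterable[Tuple[int, str]], leak_id: int) -> int:
    """Count earlier calls to the same function as the leaking call, i.e.
    how many breakpoint hits to let pass before breaking in VS.
    Assumes ids increase in log order."""
    rows = list(rows)
    target = next(api_name(text) for cid, text in rows if cid == leak_id)
    return sum(1 for cid, text in rows
               if cid < leak_id and api_name(text) == target)
```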

(The code is written solely as an example)

Summary:

The procedure is long and complicated and uses a lot of tools, but it got me a result much faster than blindly hunting for the bug in a huge code base.

Here is the short version, for those who were too lazy to read or forgot how it all began by the time they got here:

Look for memory leaks of GDI wrapper objects.
If there are any, fix them and repeat the first step.
If there are none, look for direct calls to the API.
If there are only a few, look for a scenario in which an object may not be deleted.
If there are many of them, or they cannot be traced, download Api Monitor and configure it to log calls to GDI functions.
Run the application under the debugger in VS.
Reproduce the leak once (this warms the program up so that cached objects do not clutter the log).
Attach Api Monitor.
Reproduce the leak.
Copy the log to a text file and import it into whatever database is at hand (the scripts in this article are for MySQL but are easy to adapt to any RDBMS).
Match the Create and Delete calls (the SQL script is earlier in this article) and find the objects for which Delete is never called.
In Api Monitor, set a breakpoint on the function of interest.
Press Continue until it is called with the parameters you need. Weep over the lack of conditional breakpoints.
When it is called with those parameters, click Break All in VS.
Find out why this object is not deleted.

I really hope that this article will save someone a lot of time and be useful.
