
Optimizing the mTricks Looting Crown Android Game for the Intel Atom Platform
- Translation
The most popular category of mobile applications is games. In the old days, the capabilities of processors and graphics accelerators of portable devices were very limited, which affected their performance. As a result, most games had to be made quite simple. Today, the computing power of smartphones and tablets has grown significantly, which means that it has become possible to create high-quality, resource-intensive games. However, mobile CPUs and GPUs are still inferior to those installed in personal computers.
The growth of the mobile app market has led many PC game makers to create games for mobile platforms. However, traditional game design approaches do not work well in a mobile environment, and the graphics resources of PC games are too “heavy” for mobile hardware. In this article, you will learn how to analyze and improve the performance of a mobile game and how to optimize graphics resources for mobile platforms, using the game mTricks Looting Crown as an example. The IA (Intel Architecture) version of the game has already been published.

The mTricks Looting Crown game.


1. Preliminary information
mTricks has extensive experience developing PC games with various commercial engines. While planning its next project, the company's experts noted the growing performance of mobile devices and predicted that the mobile market was ready for complex MMORPGs. As a result, the company targeted its new project at mobile platforms rather than PCs.
To begin with, mTricks ported its PC work to Android. However, the performance of the result left much to be desired. One of the hardware platforms targeted during development was devices based on Intel Atom processors (Bay Trail).
mTricks ran into two problems that PC developers typically face when they move to mobile applications:
- Traditional PC graphics resources and design approaches are not suitable for mobile applications, because mobile processors and GPUs still lag behind their PC counterparts.
- Mobile devices are built from a wide range of components with different characteristics: CPU and GPU compute capability, amount of RAM, and screen size. As a result, the resources available to the game vary across target platforms, which affects the appearance and performance of the application.
2. Overview
Looting Crown is a so-called SNRPG (Social Network + RPG), that is, a role-playing game with social network features. It has 3D graphics and several multiplayer modes (PvP, PvE, Clan vs Clan). While developing and optimizing the game, mTricks used a reference device built on the Bay Trail platform. Its technical specifications are listed below.
Device Specifications and Test Results
Indicator | Device characteristic (10-inch screen) |
CPU | Intel Atom, quad-core, 1.46 GHz |
RAM | 2 GB |
Screen resolution | 2560 x 1440 |
3DMark Ice Storm Unlimited score | 15094 |
Graphics score | 13928 |
Physics score | 21348 |
When developing the game, mTricks used Intel Graphics Performance Analyzers (Intel GPA) to find CPU and GPU bottlenecks. The analysis results were used to fix problems with graphics resources and performance.
The baseline from which optimization and performance analysis began was 23 frames per second (FPS). The chart below shows GPU utilization (GPU Busy) and the CPU load generated by the application (Target App CPU Load), collected over 2 minutes of running the application. The average GPU load was 91%, and the CPU load was about 27%.

Baseline CPU and GPU utilization. Data collected with Intel GPA System Analyzer.
3. Which is to blame: the CPU or the GPU?
There are two ways to find out whether the CPU or the GPU is the system's bottleneck. One is to use override modes. The other is to change the CPU clock speed.
One of the overrides provided by Intel GPA System Analyzer is Disable Draw Calls mode. It helps determine whether the CPU or the GPU is the bottleneck. After testing in this mode, compare the results with those obtained in the normal state of the system. The following table helps interpret the results.
How to analyze games using the Disable Draw Calls override
Performance change in Disable Draw Calls mode | Interpretation |
FPS changes only slightly | Game speed is CPU-bound. Use Intel GPA Platform Analyzer or Intel VTune Amplifier to find out which functions load the system the most. |
FPS improves markedly | Game speed is GPU-bound. Use Intel GPA Frame Analyzer to find out which draw calls take the most time. |
Intel GPA System Analyzer lets you examine application performance under different CPU settings, which is very useful for finding bottlenecks. To determine whether the game's performance is CPU-bound, do the following:
- Make sure the application's frame rate is not limited by vertical synchronization (Vsync). To check, look at the Intel GPA System Analyzer notification bar. If this mode is enabled, the word Vsync is highlighted in gray.
- Try setting different CPU frequencies with the sliders in the Platform Settings panel of the Intel GPA System Analyzer window. If the FPS changes with the CPU frequency, the application's performance is very likely CPU-bound.

Changing the CPU frequency in the Platform Settings panel.
The following table shows the results of the experiments with Looting Crown. In Disable Draw Calls mode, the frame rate does not change, which suggests that the game is CPU-bound. However, at the maximum CPU frequency (Highest CPU Frequency mode), the FPS does not change either, which suggests that the game is GPU-bound. To resolve this contradiction, we return to the baseline CPU and GPU load data presented above: on the Bay Trail reference device, the GPU is loaded at 91% and the CPU at 27%. The CPU cannot reach its full potential while the GPU is running at its limit, so in our case the GPU is the system's bottleneck. We therefore first focus on optimizing GPU usage and then repeat the tests.
FPS measurement results in various modes
Mode | FPS |
Normal | 23 |
Disable Draw Calls | 23 |
Highest CPU Frequency | 23 |
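The decision procedure described in this section can be sketched in a few lines of Python. This is only an illustration of the reasoning, not part of Intel GPA: the `Measurement` class, the 5% tolerance, and the 90% utilization threshold are assumptions chosen for the example. The first data set is the baseline from the table above; the second uses the figures reported later in the article for the state after the first optimization.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """FPS in the three modes plus average loads (all names are illustrative)."""
    fps_normal: float
    fps_no_draw_calls: float   # Disable Draw Calls override
    fps_max_cpu_freq: float    # Highest CPU Frequency mode
    gpu_busy_pct: float
    cpu_load_pct: float


def classify_bottleneck(m: Measurement, tolerance: float = 0.05) -> str:
    """Return "gpu", "cpu", or "unclear" following the article's reasoning."""
    def changed(fps: float) -> bool:
        # Assumed rule: a change above 5% of the normal FPS counts as real.
        return abs(fps - m.fps_normal) > tolerance * m.fps_normal

    draw_calls_matter = changed(m.fps_no_draw_calls)
    cpu_freq_matters = changed(m.fps_max_cpu_freq)

    if draw_calls_matter and not cpu_freq_matters:
        return "gpu"
    if cpu_freq_matters and not draw_calls_matter:
        return "cpu"
    # Contradictory or inconclusive: fall back to the utilization figures.
    # The 90% threshold is an assumption for this sketch.
    if m.gpu_busy_pct >= 90 and m.gpu_busy_pct > m.cpu_load_pct:
        return "gpu"
    if m.cpu_load_pct >= 90 and m.cpu_load_pct > m.gpu_busy_pct:
        return "cpu"
    return "unclear"


baseline = Measurement(23, 23, 23, gpu_busy_pct=91, cpu_load_pct=27)
after_first_pass = Measurement(40, 60, 40, gpu_busy_pct=99, cpu_load_pct=13)
print(classify_bottleneck(baseline))
print(classify_bottleneck(after_first_pass))
```

For the baseline, neither override changes the FPS, so the sketch falls back to the utilization figures, which is the same step the article takes to resolve the contradiction.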
4. Finding GPU bottlenecks
As noted above, the game's bottleneck is the GPU. Let's analyze the situation with Intel GPA Frame Analyzer. Here is the frame information for the baseline.

Viewing data in Intel GPA Frame Analyzer.
4.1. Reducing the number of draw calls
We reduced the number of draw calls by combining hundreds of static meshes into one and applying larger textures.
Consider the metrics before and after optimization. Measurements are reported in units called ergs. An erg is a command that performs some work on the GPU while a frame is rendered: for example, draw calls, clears, and other graphics API calls.
Baseline metrics
Indicator | Value |
Total ergs | 1726 |
Total primitives | 122204 |
GPU duration, ms | 23 |
Time to render the frame, ms | 48 |
Estimated rendering cost at the baseline
Type of operation | Erg numbers | Time, ms | % |
Clear | 0 | 0.2 | 0.5 |
Ocean | 1 | 6 | 13.7 |
Terrain | 2 ~ 977 | 20 | 41.9 |
Grass | 19 ~ 977 | 18 | 39.0 |
Characters, buildings, effects | 978 ~ 1676 | 19 | 40.6 |
User interface | 1677 ~ 1725 | 1 | 3.4 |
The total rendering time for the “Terrain” is 20 ms, of which the “Grass” that covers it takes 18 ms. That is about 90% of the time needed to render the terrain. We therefore continue the analysis to understand why rendering the grass is so expensive.

The process of building a "terrain".

Grass texture.
Looting Crown covers most of the terrain with small square patches of grass. As a result, the number of draw calls for it is 960 (see the table below). Drawing one such patch takes very little time, but drawing all of them overloads the system: a relatively simple operation consumes an unreasonable amount of resources. We therefore reduced the number of draw calls by combining several hundred static meshes into one, and we also switched to a larger texture. Here is the result of this optimization:
Rendering cost with small and large textures
Indicator | Value |
Small texture, ms | 18 |
Number of ergs | 960 |
Large texture, ms | 6 |
Number of ergs | 1 |

Modified terrain.
Rendering terrain made of small textures requires a large number of draw calls. By reducing the number of such calls, we saved 12 ms on rendering the grass.
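The idea of combining static meshes can be illustrated with a small, engine-independent Python sketch. It is not the code mTricks used. The 40 x 24 grid of quads is an assumption that happens to give 960 patches, and `make_grass_quad` and `merge_static_meshes` are hypothetical helpers. Merging concatenates the vertex buffers and re-bases the triangle indices, so the same geometry can be submitted in one draw call instead of 960.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Mesh = Tuple[List[Vertex], List[int]]  # (vertices, triangle indices)


def make_grass_quad(x: float, z: float, size: float = 1.0) -> Mesh:
    """One square grass patch lying in the y = 0 plane (two triangles)."""
    vertices = [(x, 0.0, z), (x + size, 0.0, z),
                (x + size, 0.0, z + size), (x, 0.0, z + size)]
    indices = [0, 1, 2, 0, 2, 3]
    return vertices, indices


def merge_static_meshes(meshes: List[Mesh]) -> Mesh:
    """Concatenate meshes into one vertex/index buffer, re-basing the indices."""
    merged_vertices: List[Vertex] = []
    merged_indices: List[int] = []
    for vertices, indices in meshes:
        base = len(merged_vertices)  # index offset of this mesh in the merged buffer
        merged_vertices.extend(vertices)
        merged_indices.extend(base + i for i in indices)
    return merged_vertices, merged_indices


# Assumed layout: 40 x 24 = 960 patches, one draw call each before merging.
patches = [make_grass_quad(float(col), float(row))
           for row in range(24) for col in range(40)]
merged_vertices, merged_indices = merge_static_meshes(patches)
print(len(patches), "draw calls before merging, 1 after")
print(len(merged_vertices), "vertices,", len(merged_indices) // 3, "triangles")
```

Note that merging leaves the triangle count unchanged. The saving comes from the per-call overhead, which matches the article's observation that each patch is cheap but 960 calls are not.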
4.2. Graphics Resource Optimization
Here are the results obtained after applying the large texture to the grass.
Performance data after the first optimization
Indicator | Value |
Total ergs | 179 |
Total primitives | 27537 |
GPU duration, ms | 24 |
Time to render the frame, ms | 27 |
Estimated rendering cost after the first optimization
Type of operation | Erg numbers | Time, ms | % |
Clear | 0 | 2 | 10.4 |
Ocean | 18 | 6 | 23.6 |
Terrain | 1 ~ 17, 19, 23 ~ 96 | 14 | 53.4 |
Grass | 19 | 6 | 23.2 |
Characters, buildings, effects | 20 ~ 22, 97 ~ 131 | 1 | 5.9 |
User interface | 132 ~ 178 | 1 | 5.7 |
After the first optimization pass, we again checked whether the game is GPU-bound. We repeated the same measurements as before: in Disable Draw Calls mode and in Highest CPU Frequency mode.
FPS measurement results after the first optimization
Mode | FPS |
Normal | 40 |
Disable Draw Calls | 60 |
Highest CPU Frequency | 40 |
The table shows that FPS rises when draw calls are disabled and does not change at the maximum CPU frequency. This suggests that Looting Crown is still GPU-bound. We also rechecked the load the game places on the CPU and GPU.

CPU and GPU load after the first optimization. Data collected with Intel GPA System Analyzer.
On the Bay Trail reference system, the CPU load is about 13%, while the GPU is loaded at 99%. Optimizing CPU usage therefore will not improve the game's performance until the GPU overload is dealt with.
Looting Crown was originally created for the PC. Its graphics resources are not suited to mobile devices, whose CPUs and GPUs are less powerful than those of desktops. We therefore applied a series of optimizations to the graphics resources.
1. Minimized draw calls.
Reduced the number of materials per object from 10 to 2.
Reduced the number of particle layers.
2. Reduced the number of polygons.
Used the Simplygon tool to lower the level of detail (LOD) of the characters.

Character at different stages of level-of-detail reduction.
Reduced the number of polygons used to render the terrain. First, we reduced the detail of distant mountains, which do not need high detail. Then we reduced the number of polygons for flat ground, which a pair of triangles is enough to represent.
3. Used optimized lightmaps.
Disabled dynamic lighting for "Time of day."
Minimized the lightmap size for each mesh, in particular for the background.
4. Minimized render state changes.
Reduced the number of materials, which reduced the number of render state changes and texture switches.
5. Separated animated parts from static meshes.
The Havok engine used to build the game does not support updating only the animated part of an object. Even if only a small part of an object moves, the whole object is updated. To work around this, we separated the moving parts (the smoke, highlighted in red in the following figure) from the static part of the objects. The result is two separate models per game object.

Separating animated smoke from a static mesh.
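Item 4 in the list above, fewer render state changes, can be illustrated with a toy Python model. It is a sketch under simplifying assumptions rather than the engine's actual behavior. It treats a "state change" as any draw whose material differs from the previous one, and the `DrawItem` type and the scene contents are invented for the example.

```python
from typing import List, NamedTuple


class DrawItem(NamedTuple):
    name: str
    material: str


def count_state_changes(items: List[DrawItem]) -> int:
    """Count how often the bound material differs from the previous draw."""
    changes = 0
    current = None
    for item in items:
        if item.material != current:
            changes += 1
            current = item.material
    return changes


def sort_by_material(items: List[DrawItem]) -> List[DrawItem]:
    """Group draws that share a material so they run back to back."""
    return sorted(items, key=lambda item: item.material)


# Hypothetical scene: six objects using three materials, in arbitrary order.
scene = [
    DrawItem("house_wall", "brick"),
    DrawItem("tree_trunk", "wood"),
    DrawItem("house_roof", "tiles"),
    DrawItem("fence", "wood"),
    DrawItem("chimney", "brick"),
    DrawItem("barrel", "wood"),
]
print(count_state_changes(scene))                    # 6
print(count_state_changes(sort_by_material(scene)))  # 3
```

With fewer distinct materials, as in the reduction from 10 to 2 per object, the lower bound on state changes drops as well, because a sorted draw list needs only one change per material.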
4.3. Using Z-culling effectively
When a GPU renders objects, a three-dimensional scene (whose points have x, y, z coordinates) is projected onto a plane (x, y coordinates). The depth of each pixel (its z coordinate) is stored in the Z-buffer, or depth buffer. If two objects must be drawn at the same point on the plane, the GPU compares their depths and overwrites the current pixel only if the new object is closer to the viewer than the previous one. The Z-buffer thus reproduces depth correctly. Z-culling takes advantage of this by drawing closer objects first: they occlude the distant objects, which then do not have to be drawn only to be painted over later.
In Looting Crown, part of the terrain is covered by ocean and part by grass. Most of the ocean lies behind the grass, so those areas are hidden. However, the ocean is drawn before the grass, which prevents effective Z-culling. The GPU timing data for the ocean and grass are shown below. The ocean is drawn in erg 18 and the grass in erg 19. If the grass were drawn before the ocean, then, since it is closer to the viewer, most of the ocean's pixels would not need to be drawn at all, which would reduce the GPU time spent drawing. After the second optimization, as the corresponding figure shows, the GPU time for drawing the ocean dropped from 6 ms to 0.3 ms.

The cost of drawing the ocean after the first optimization.

The cost of drawing grass after the first optimization.

The cost of drawing the ocean after the second optimization.
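The effect of draw order on the depth test can be reproduced with a tiny software depth buffer in Python. This is a simplified model, not a description of the Bay Trail GPU. It assumes that a fragment failing the depth test costs nothing, and the 100 x 100 view with grass hiding 80% of the ocean is an invented layout.

```python
from typing import List, Tuple

# A rectangle covering pixels [x0, x1) x [y0, y1) at constant depth (smaller = closer).
Rect = Tuple[str, int, int, int, int, float]


def count_shaded_pixels(width: int, height: int, draw_order: List[Rect]) -> int:
    """Render rectangles with a depth test and count fragments that pass it."""
    depth = [[float("inf")] * width for _ in range(height)]
    shaded = 0
    for _name, x0, y0, x1, y1, z in draw_order:
        for y in range(y0, y1):
            for x in range(x0, x1):
                if z < depth[y][x]:   # depth test: keep only closer fragments
                    depth[y][x] = z
                    shaded += 1       # this fragment would be shaded
    return shaded


ocean = ("ocean", 0, 0, 100, 100, 10.0)  # far surface covering the whole view
grass = ("grass", 0, 0, 100, 80, 5.0)    # closer surface hiding 80% of the ocean

print(count_shaded_pixels(100, 100, [ocean, grass]))  # 18000
print(count_shaded_pixels(100, 100, [grass, ocean]))  # 10000
```

Drawing the ocean first shades all 10000 ocean pixels and then 8000 grass pixels on top. Drawing the grass first leaves only the 2000 visible ocean pixels to shade, which is the kind of saving behind the drop in ocean cost reported above.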
Results
Through the steps above, mTricks optimized all graphics resources for mobile devices while maintaining a high level of quality. The number of ergs dropped from 1726 to 124, and the number of primitives from 122204 to 9525.

Changes in graphic resources.
The optimization results are shown below. After all the improvements, the frame rate on the Bay Trail device rose from 23 to 60 FPS.

FPS increase during optimization.
Changes in FPS, GPU load, and CPU load
Indicator | Reference point | First optimization | Second optimization |
FPS | 23 | 45 | 60 |
GPU load, % | 91 | 99 | 71 |
CPU load, % | 27 | 13 | 22 |
After the first optimization, analysis on the Bay Trail device showed that the game was still GPU-bound. The second optimization aimed to reduce the GPU load by optimizing graphics resources and making better use of the Z-buffer. As a result, we reached 60 frames per second. Since Android uses Vsync, 60 FPS is the maximum achievable on this platform.
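To put the reported numbers in proportion, the short Python calculation below derives the relative reductions and the per-frame time budget. It uses only figures quoted in this article. The 60 Hz refresh rate is an assumption implied by the 60 FPS Vsync cap.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage by which a value dropped from `before` to `after`."""
    return (before - after) / before * 100.0


ergs = reduction_pct(1726, 124)
primitives = reduction_pct(122204, 9525)
speedup = 60 / 23
frame_budget_ms = 1000 / 60  # time available per frame at an assumed 60 Hz refresh rate

print(f"ergs reduced by {ergs:.1f}%")
print(f"primitives reduced by {primitives:.1f}%")
print(f"frame rate improved {speedup:.2f}x")
print(f"frame budget at 60 FPS: {frame_budget_ms:.1f} ms")
```

Both the erg count and the primitive count fell by a little over 92%, and each frame now has to fit into roughly 16.7 ms of CPU and GPU work.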
Getting Started with Intel Graphics Frame Analyzer for OpenGL
If you want to analyze your own Android application with Intel tools, you need to set them up first. A full working environment for creating and analyzing Android applications can be built with Intel INDE and Intel GPA. Intel GPA can be installed either on its own or as part of the INDE installation.
When downloading Intel GPA, select the file that matches your work environment and the platform on which you plan to analyze graphics. In our case, with Windows as the OS of the development computer, we select the Windows 7 / 8.1 (x64) Graphics Frame Analyzer for OpenGL package.
Let's see how to start analyzing applications with Intel Graphics Frame Analyzer for OpenGL. Before the analysis, the application must be prepared: a permission must be added to the AndroidManifest.xml file, and debugging must be enabled for the application in the application element of the same file.
Without this preparation, Graphics Frame Analyzer cannot work with the application, even if a debug build is installed on the device. The device must also be detected by ADB.
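The exact manifest entries are not reproduced in this text. The sketch below assumes they are the `android.permission.INTERNET` permission and `android:debuggable="true"` on the `application` element, which is a common requirement for attaching a remote profiler to an Android app. Treat both as assumptions and check the Intel GPA documentation for your version. The Python helper `check_manifest` and the sample package name are hypothetical. The helper only inspects a manifest and does not modify it.

```python
import xml.etree.ElementTree as ET

ANDROID_NS = "http://schemas.android.com/apk/res/android"


def check_manifest(manifest_xml: str) -> dict:
    """Report whether the manifest has the two settings assumed above."""
    root = ET.fromstring(manifest_xml)
    name_attr = f"{{{ANDROID_NS}}}name"
    debuggable_attr = f"{{{ANDROID_NS}}}debuggable"

    # Collect every <uses-permission android:name="..."> directly under <manifest>.
    permissions = {p.get(name_attr) for p in root.findall("uses-permission")}
    application = root.find("application")
    debuggable = (application is not None
                  and application.get(debuggable_attr) == "true")
    return {
        "internet_permission": "android.permission.INTERNET" in permissions,
        "debuggable": debuggable,
    }


# Hypothetical manifest prepared for profiling (package name is invented).
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.glsample">
    <uses-permission android:name="android.permission.INTERNET" />
    <application android:debuggable="true" android:label="GL sample" />
</manifest>
"""

print(check_manifest(SAMPLE_MANIFEST))
```

Whether the device is visible to ADB can be checked separately with the standard `adb devices` command.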
Once the application is installed on the Android device, start Graphics Frame Analyzer for OpenGL. If the title bar of its window shows the mobile device connected to the computer, everything has been done correctly so far. Now click the Add button in the working area of the window.
Graphics Frame Analyzer examines the device and lists the applications that can be analyzed in the Analyzable applications section.

List of applications received from the device.
In our case, we use an Asus Fonepad 8 tablet with an Intel Atom Z3530 CPU. The device has Android 5.0 installed.
The list contains a sample OpenGL application prepared as described above. The original application can be found here. In addition, you can download a project in which all the necessary settings are already made.
After you double-click the application icon, the Capture button appears on the left side of the window. Clicking it captures the application data, and a thumbnail of the application screen appears on the right side of the window. Clicking the thumbnail opens a page listing the results of previous captures. Click the desired icon on this page to open the window with the analysis results.

Analysis Results Window.
Conclusions
When you start optimizing a game, first identify the bottlenecks. Intel GPA helps here by giving the developer powerful analysis tools. If the game is CPU-bound, Intel VTune Amplifier can provide valuable information. If the game is GPU-bound, you can look for bottlenecks with Intel GPA.
To fix the problems of a GPU-bound game, look for effective ways to reduce draw calls, polygon counts, and render state changes. Also check the size of terrain textures, object animation, and lightmaps, and make sure the Z-buffer is used correctly.