binaryzebra December 9, 2014 at 04:21

Creating World of Tanks Blitz on DAVA's own engine

Prologue

This story began more than three years ago. Our small company, DAVA, became part of Wargaming, and we began to think about which projects to do next. To remind you what the mobile market was like three years ago: there was no Clash of Clans, no Puzzle & Dragons, and none of many other projects that are very famous today. Mid-core was just emerging. The market was many times smaller than it is today.

Initially, everyone thought it would be a very good idea to make several small games that would attract new users to the big "tanks" game. After a series of experiments, it turned out that this did not work. Despite excellent conversion rates within the mobile apps, the move from mobile to PC turned out to be an abyss for users.

At that time we had several games in development. One of them was called Sniper. The main gameplay idea was shooting in sniper mode from a tank in a defensive position at other tanks that were controlled by AI and could fire back.

At some point, we felt that a stationary tank was very boring, and within a week we built a multiplayer prototype in which tanks could already drive around and attack each other.

That is how it all started!

When we started developing Sniper, we looked at the technologies then available for mobile platforms. At that time, Unity was still at a fairly early stage of its development: the technologies we needed simply did not exist yet.

The main thing we lacked was landscape rendering with dynamic level of detail, which is vital for a game with open spaces. There were several third-party libraries for Unity, but their quality was poor.

We also understood that in C# we would not be able to squeeze the maximum out of the devices we were targeting, and would always be limited.
Unreal Engine 3 was not suitable either, for similar reasons.

As a result, we decided to develop our own engine further!

By then it was already used in our previous casual projects. The engine had a fairly well-written low-level platform layer and supported iOS, PC, and Mac, and work on Android had begun. A lot of functionality had been written for creating 2D games: there was a good UI and many tools for working with 2D. The 3D part was taking its first steps, since one of our games was fully three-dimensional.

What we had in the 3D part of the engine:

A very simple scene graph.
Rendering of static meshes.
Rendering of animated meshes with skeletal animation.
Export of objects and animations from the Collada format.

Overall, compared with the feature set of a serious modern engine, there was very little in it.

Beginning of work

It all started with proving that we could render the landscape on the mobile devices of the time: the iPhone 4 and the iPad 1.

After several days of work, we had a fully functional dynamic landscape that worked quite well: it required about 8 MB of memory and ran at 60 fps on these devices. After that, we started full-scale development of the game.

About six months passed, and a small mini-project turned into what Blitz is now. Completely new requirements appeared: MMO, AAA quality, and others that the engine in its original form could no longer meet. But work was in full swing. The game worked, and worked well. However, performance was average, there were few objects on the maps, and there were many other limitations.

At this stage, we began to realize that the foundation we had laid in the engine could not withstand the pressure of a real project.

How it worked at that time

All scene rendering was based on the simple Scene Graph concept.

The two main classes were:

Scene - the scene container, inside which all operations on the scene took place.
SceneNode - the base class for scene nodes, from which all classes in the scene inherited:
MeshInstanceNode - a class for rendering meshes.
LodNode - a class for switching LODs.
SwitchNode - a class for switching objects between states.
about 15 more SceneNode subclasses.

The SceneNode class allowed overriding a set of virtual methods to implement custom functionality.
The main functions that could be overridden were:

Update - a function called for each node to update the scene.
Draw - a function called for each node to draw that node.

The main problems we encountered

First, performance:

When the number of nodes in a level reached 5000, it turned out that just walking through all the empty Update functions took about 3 ms.
A similar amount of time was spent on empty nodes that did not need Draw.
A huge number of cache misses, since the code always worked with heterogeneous data.
Inability to parallelize work across several cores.

Second, unpredictability:

Changing code in the base classes affected the whole system: any change in SceneNode::Update could break anything anywhere. Dependencies became more and more complicated, and almost every change inside the engine required retesting all related functionality.
It was impossible to make a local change, for example in transformations, without affecting the rest of the scene. Very often the slightest change in LodNode (the node for switching LODs) broke something in the game.

The first steps to improve the situation

To begin with, we decided to tackle the performance problems, and to do it quickly.

We did this by introducing an additional flag, NEED_UPDATE, in each node. It determined whether Update should be called for that node. This really improved performance, but created a whole bunch of problems. The Update function looked like this:

void SceneNode::Update(float timeElapsed)
{
     if (!(flags & NEED_UPDATE)) return;
     // the rest of the update function
     // process children
}
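One plausible source of the unexpected breakage is visible in the snippet itself: the early return happens before the children are processed, so a node without the flag cuts off its whole subtree. The following is a minimal illustrative sketch, not engine code; the Node struct, the bit value of NEED_UPDATE, and the updateCount counter are assumptions made for the example.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical flag value; the engine's real bit layout is not shown in the article.
constexpr unsigned NEED_UPDATE = 1u << 0;

struct Node
{
    unsigned flags = 0;
    int updateCount = 0;  // how many times this node's own update logic ran
    std::vector<std::unique_ptr<Node>> children;

    void Update(float timeElapsed)
    {
        // Early-out as in the snippet above: skips this node AND all its children.
        if (!(flags & NEED_UPDATE)) return;
        ++updateCount;  // stands in for "the rest of the update function"
        for (auto& child : children)
            child->Update(timeElapsed);  // process children
    }
};

// root (no flag) -> lod (NEED_UPDATE); runs one frame and
// returns how many times the LOD-like child actually updated.
int ChildUpdatesUnderUnflaggedParent()
{
    Node root;  // an empty grouping node: nobody set NEED_UPDATE on it
    root.children.push_back(std::make_unique<Node>());
    Node& lod = *root.children.back();
    lod.flags = NEED_UPDATE;  // the LOD node does want updates
    root.Update(0.016f);
    return lod.updateCount;  // 0: the whole subtree was skipped
}
```

Whoever "fixes" this by removing the check restores correctness for every subtree, and with it the cost of visiting every node each frame.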

This won back some of the performance, but many logical problems appeared where we did not expect them.

LodNode and SwitchNode, the nodes responsible for switching LODs (by distance) and switching objects (for example, destroyed and intact), respectively, began to break regularly.

From time to time, whoever was fixing the breakage did the following: disabled the NEED_UPDATE check in the base class (after all, it was a simple solution), and, completely unnoticed, FPS dropped again.

After the code checking the NEED_UPDATE flag had been commented out for the third time, we decided on a radical change. We understood that we could not do everything at once, so we decided to proceed in stages.

The very first step was to lay down an architecture that would later let us solve all the problems we were running into.

Goals

Minimize dependencies between independent subsystems.
Changes in transformations should not break the LOD system, and vice versa.
Ability to spread code across multiple cores.
No Update functions (or similar) that execute heterogeneous, independent code.
Easy extension of the system with new functionality without fully retesting the old one: changes in some subsystems do not affect others, and subsystems are as independent as possible.
Ability to lay out data linearly in memory for maximum performance.

At the first stage, the main goal was to redesign the architecture so that all these goals could be achieved.

Combination of component and data-driven approach

The solution to this problem was a component approach combined with a data-driven approach. Throughout the text I will keep the English term "data-driven", since I did not find a good translation.

In general, people understand the component approach in different ways. The same goes for data-driven.

In my understanding, the component approach means building the required functionality out of independent components. The simplest example is electronics: there are chips, and each chip has inputs and outputs. If chips are compatible, they can be connected. The whole electronics industry is built on this approach. There are thousands of different components, and by combining them you can get completely different things.

The main advantage of this approach is that each component is isolated and largely independent. I am leaving aside the fact that a component can send incorrect data and the board will burn out. The benefits are obvious: today you can take a huge number of ready-made chips and assemble a new device.

What is data-driven? In my understanding, it is an approach to software design in which data, rather than logic, forms the basis of the program flow.

In our example, imagine the following class hierarchy:

class SceneNode
{
public:
     // Data responsible for hierarchical transformations
     Matrix4 localTransform;
     Matrix4 worldTransform;
     virtual void Update();
     virtual void Draw();
     Vector<SceneNode*> children;
};
class LodNode : public SceneNode
{
public:
     // Data specific to LOD calculation
     LodDistance lods[4];
     virtual void Update(); // overridden to enable or disable some of its children when the LOD switches
     virtual void Draw(); // draw only the currently active LOD
};
class MeshNode : public SceneNode
{
public:
     RenderMesh * mesh;
     virtual void Draw(); // draw the mesh
};

The code that traverses this hierarchy is itself hierarchical:

Main Loop:
rootNode->Update();
rootNode->Draw();
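For comparison with the data-driven version below, here is a compilable sketch of this traversal. The class names mirror the example; the simplifications are assumptions for illustration only: a "transform" is a 1D float offset, LodNode picks child 0 or 1 by a hypothetical switchDistance, and drawing appends a mesh name to a log. Every frame makes two full recursive passes with a virtual call per node, whether or not the node has anything to do.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct SceneNode
{
    float localTransform = 0.0f;  // simplified 1D "transform"
    float worldTransform = 0.0f;
    std::vector<std::unique_ptr<SceneNode>> children;

    virtual ~SceneNode() = default;

    // Hierarchical update: world = parent world combined with local.
    virtual void Update(float parentWorld)
    {
        worldTransform = parentWorld + localTransform;
        for (auto& c : children) c->Update(worldTransform);
    }
    virtual void Draw(std::vector<std::string>& drawLog)
    {
        for (auto& c : children) c->Draw(drawLog);
    }
};

struct MeshNode : SceneNode
{
    std::string meshName;
    explicit MeshNode(std::string name) : meshName(std::move(name)) {}
    void Draw(std::vector<std::string>& drawLog) override
    {
        drawLog.push_back(meshName);  // "draw" the mesh
        SceneNode::Draw(drawLog);
    }
};

struct LodNode : SceneNode
{
    float switchDistance = 100.0f;  // hypothetical: child 0 if closer, child 1 otherwise
    float cameraDistance = 0.0f;
    std::size_t activeLod = 0;

    void Update(float parentWorld) override
    {
        SceneNode::Update(parentWorld);
        activeLod = cameraDistance < switchDistance ? 0 : 1;
    }
    void Draw(std::vector<std::string>& drawLog) override
    {
        // Draw only the currently active LOD child.
        if (activeLod < children.size()) children[activeLod]->Draw(drawLog);
    }
};
```

The main loop is then simply `root->Update(0.0f); root->Draw(drawLog);`, exactly as in the two lines above.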

In this C++ inheritance hierarchy, we have three different, independent data streams:

Transformations
LODs
Meshes

The nodes merely tie them together in a hierarchy, but it is important to understand that it is better to process each data stream sequentially. In practice, hierarchical processing is needed only for transformations.

Let's imagine what this should look like in a data-driven approach. I will write in pseudo-code to make the idea clear:

// Transform Data Loop:
for (each localTransform in localTransformArray)
{
     worldTransform = parent->worldTransform * localTransform;
}
// Lod Data Loop:
for (each lod in lodArray)
{
     // calculate lod distance and find nearest lod
     nearestRenderObject = GetNearestRenderObject(lod);
     renderObjectIndex = GetLodObjectRenderObjectIndex(lod);
     renderObjectArray[renderObjectIndex] = nearestRenderObject;
}
// Mesh Render Data Loop:
for (each renderObject in renderObjectArray)
{
     RenderMesh(renderObject);
}

In effect, we restructured the program's loops so that everything is driven by the data.

In a data-driven approach, data is the key element of the program. Logic is just the mechanism for processing it.

New architecture

At some point, it became clear that we needed to move toward an entity-based organization of the scene, where an Entity is an object made up of many independent components. I wanted the components to be completely arbitrary and easy to combine with each other.

While reading up on this topic, I came across the T-Machine blog.

It gave me many answers to my questions, but the main one was the following:

• An Entity contains no logic; it is just an ID (or a pointer).
• An Entity only knows the IDs (or pointers) of the components that belong to it.
• A component is just data, that is, a component contains no logic.
• A system is code that processes a specific set of data and produces another set of data as output.
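These rules fit in a few lines of code. This is a minimal sketch under our own assumptions (EntityId, the two component structs, World, and MovementSystem are hypothetical names, not Artemis or DAVA classes): an entity is only an integer, components are plain structs stored per type, and a system is a function that reads one data set and writes another. In this simplified version the per-type storage, rather than the entity itself, records which components an entity has.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using EntityId = std::uint32_t;  // an entity is just an ID

// Components: plain data, no behavior.
struct PositionComponent { float x = 0.0f; };
struct VelocityComponent { float dx = 0.0f; };

// Component storage keyed by entity ID.
struct World
{
    EntityId nextId = 1;
    std::unordered_map<EntityId, PositionComponent> positions;
    std::unordered_map<EntityId, VelocityComponent> velocities;

    EntityId CreateEntity() { return nextId++; }
};

// A system: processes every entity that has both a position and a velocity.
void MovementSystem(World& world, float dt)
{
    for (auto& entry : world.velocities)
    {
        auto it = world.positions.find(entry.first);
        if (it != world.positions.end())
            it->second.x += entry.second.dx * dt;  // input data -> output data
    }
}
```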

Once I understood this, while studying the subject further, I came across the Artemis Framework and saw a good implementation of this approach.
The sources are here, if the previous link does not work: Artemis Original Java Source Code

If you are developing in Java, I highly recommend looking at it. It is a very simple and conceptually clean framework. Today it has been ported to many languages.

What Artemis implements is today called ECS (Entity Component System). There are many ways to organize a scene based on entities, components, and data-driven design, but in the end we arrived at the ECS architecture. It is hard to say how widespread this term is, but ECS means that there are three kinds of things: Entity, Component, and System.

The most important difference from other approaches is the mandatory absence of behavior logic in components and the separation of code into systems.

This point is very important in the "orthodox" component approach. If you violate the first principle, there will be a lot of temptations. One of the first is component inheritance.

Despite its flexibility, it usually ends in spaghetti code.

At first, it seems that this way you can create many components that behave similarly but in slightly different ways, with common component interfaces. In general, you can fall into the inheritance trap again. Yes, it will be a little better than classical inheritance, but try not to fall into this trap.

ECS is a cleaner approach and solves more problems.

To see an example of how this works in Artemis, you can take a look here.

I’ll show by example how this works for us.

The main container class is Entity. This is a class that contains an array of components.

The second class is Component. In our case, this is just data.

Here is a list of the components used in our engine today:

    enum eType
    {
        TRANSFORM_COMPONENT = 0,
        RENDER_COMPONENT,
        LOD_COMPONENT,
        DEBUG_RENDER_COMPONENT,
        SWITCH_COMPONENT,
        CAMERA_COMPONENT,
        LIGHT_COMPONENT,
        PARTICLE_EFFECT_COMPONENT,
        BULLET_COMPONENT,
        UPDATABLE_COMPONENT,
        ANIMATION_COMPONENT,
        COLLISION_COMPONENT,    // multiple instances
        PHYSICS_COMPONENT,
        ACTION_COMPONENT,       // actions, something simpler than scripts that can influence logic, can be multiple
        SCRIPT_COMPONENT,       // multiple instances, not now, it will happen much later.
        USER_COMPONENT,
        SOUND_COMPONENT,
        CUSTOM_PROPERTIES_COMPONENT,
        STATIC_OCCLUSION_COMPONENT,
        STATIC_OCCLUSION_DATA_COMPONENT, 
        QUALITY_SETTINGS_COMPONENT,   // type as fastname for detecting type of model
        SPEEDTREE_COMPONENT,
        WIND_COMPONENT,
        WAVE_COMPONENT,
        SKELETON_COMPONENT,
        //debug components - note that everything below won't be serialized
        DEBUG_COMPONENTS,
        STATIC_OCCLUSION_DEBUG_DRAW_COMPONENT,
        COMPONENT_COUNT
    };

The third class is SceneSystem:

    /**
        \brief  This function is called when any entity registered to scene.
                It checks whether the entity has all the required components and AddEntity needs to be called.
        \param[in] entity entity we've just added
     */
    virtual void RegisterEntity(Entity * entity);
    /**
        \brief  This function is called when any entity unregistered from scene.
                It checks whether the entity had all the required components and RemoveEntity needs to be called.
        \param[in] entity entity we've just removed
     */
    virtual void UnregisterEntity(Entity * entity);

The RegisterEntity and UnregisterEntity functions are called for all systems in the scene when an Entity is added to or removed from the scene.

    /**
        \brief  This function is called when any component is registered to scene.
                It checks whether the entity now has all the required components and AddEntity needs to be called.
        \param[in] entity entity we added component to.
        \param[in] component component we've just added to entity.
     */
    virtual void RegisterComponent(Entity * entity, Component * component);
    /**
        \brief  This function is called when any component is unregistered from scene.
                It checks whether the entity no longer has all the required components and RemoveEntity needs to be called.
        \param[in] entity entity we removed component from.
        \param[in] component component we've just removed from entity.
     */
    virtual void UnregisterComponent(Entity * entity, Component * component);

The RegisterComponent and UnregisterComponent functions are called for all systems in the scene when a Component is added to or removed from an Entity in the scene.
For convenience, there are also two more functions:

    /**
        \brief This function is called only when entity has all required components.
        \param[in] entity entity we want to add.
     */
    virtual void AddEntity(Entity * entity);
    /**
        \brief This function is called only when the entity had all the required components and no longer has them.
        \param[in] entity entity we want to remove.
     */
    virtual void RemoveEntity(Entity * entity);

These functions are called once the required set of components has been specified with the SetRequiredComponents function.

For example, we can request only those Entities that have both ACTION_COMPONENT and SOUND_COMPONENT. I pass this set to SetRequiredComponents and, voila.
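The dispatch described above can be sketched as follows. This is not the engine's code; it assumes that the required set is stored as a std::bitset indexed by a trimmed-down eType, that Entity exposes a mask of the component types it currently holds, and that RegisterComponent/UnregisterComponent receive the component type instead of a Component pointer. The register/unregister functions only compare masks and forward to AddEntity/RemoveEntity when the entity starts or stops matching.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <vector>

// Trimmed-down component types for the sketch.
enum eType { TRANSFORM_COMPONENT = 0, ACTION_COMPONENT, SOUND_COMPONENT, COMPONENT_COUNT };
using ComponentMask = std::bitset<COMPONENT_COUNT>;

struct Entity
{
    ComponentMask components;  // which component types this entity currently has
};

class SceneSystem
{
public:
    virtual ~SceneSystem() = default;

    void SetRequiredComponents(ComponentMask mask) { required = mask; }

    void RegisterEntity(Entity* entity)
    {
        if (Matches(entity->components)) AddEntity(entity);
    }
    void UnregisterEntity(Entity* entity)
    {
        if (Matches(entity->components)) RemoveEntity(entity);
    }
    // Call after the component bit has been set on the entity.
    void RegisterComponent(Entity* entity, eType added)
    {
        ComponentMask before = entity->components;
        before.reset(added);
        if (!Matches(before) && Matches(entity->components)) AddEntity(entity);
    }
    // Call before the component bit is cleared on the entity.
    void UnregisterComponent(Entity* entity, eType removed)
    {
        ComponentMask after = entity->components;
        after.reset(removed);
        if (Matches(entity->components) && !Matches(after)) RemoveEntity(entity);
    }

    virtual void AddEntity(Entity* entity) { entities.push_back(entity); }
    virtual void RemoveEntity(Entity* entity)
    {
        entities.erase(std::remove(entities.begin(), entities.end(), entity), entities.end());
    }

    std::vector<Entity*> entities;  // entities that currently match

private:
    bool Matches(const ComponentMask& has) const { return (has & required) == required; }
    ComponentMask required;
};
```

A system that cares about actions with sounds would set both bits in the mask; its AddEntity is then reached only for entities that carry both components.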

To show how this works, here are examples of the systems we have:

TransformSystem - a system responsible for the transformation hierarchy.
SwitchSystem - a system responsible for switching objects that can be in several states, such as destroyed and intact.
LodSystem - a system responsible for switching LODs by distance.
ParticleEffectSystem - a system that updates particle effects.
RenderUpdateSystem - a system that updates render objects from the scene graph.
LightUpdateSystem - a system that updates light sources from the scene graph.
ActionUpdateSystem - a system that updates actions.
SoundUpdateSystem - a system that updates sounds, their position and orientation.
UpdateSystem - a system that invokes custom user updates.
StaticOcclusionSystem - the static occlusion system.
StaticOcclusionBuildSystem - a system for building static occlusion.
SpeedTreeUpdateSystem - a system for updating SpeedTree trees.
WindSystem - a wind calculation system.
WaveSystem - a system for calculating wave-like oscillations.
FolliageSystem - a system for calculating vegetation over the landscape.

The most important result we achieved is a strong decomposition of the code responsible for unrelated things. Now all the code related to transformations is clearly localized in the TransformSystem::Process function. It is very simple and easy to split across several cores. Most importantly, it is hard to break something in another system by making a logical change in the transformation system.

In almost any system, the code looks like this:

for (a specific set of objects)
{
  // get the required components
  // perform actions on these objects
  // write the data back to the components
}

Systems can be classified by how they process objects:

All objects in the system must be processed:
- Physics
- Collisions
Only marked objects need to be processed:
- Transformation system
- Actions system
- Sound processing system
- Particle processing system
Systems that work with their own specially optimized data structure:
- Static Occlusion System

With this approach, besides making it very easy to spread object processing across several cores, it becomes very easy to do things that are rather difficult in the usual polymorphic paradigm. For example, you can easily process only some of the LOD switches each frame. If there are VERY many LOD objects in a large open world, you can process, say, one third of them each frame. And this does not affect other systems.
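Processing only part of the LOD switches per frame could look like the following sketch. It assumes LOD records live in one flat array and that the system keeps a cursor between frames; AmortizedLodSystem and LodRecord are hypothetical names. Each call handles ceil(N / framesPerCycle) records, so framesPerCycle = 3 gives the "one third per frame" from the text (framesPerCycle must be at least 1).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct LodRecord
{
    float position = 0.0f;          // simplified 1D position
    float switchDistance = 100.0f;
    int currentLod = 0;             // 0 = near model, 1 = far model
};

class AmortizedLodSystem
{
public:
    explicit AmortizedLodSystem(std::size_t frames) : framesPerCycle(frames) {}

    // Updates one slice of the array per call; returns how many records were processed.
    std::size_t Process(std::vector<LodRecord>& lods, float cameraPos)
    {
        if (lods.empty()) return 0;
        std::size_t batch = (lods.size() + framesPerCycle - 1) / framesPerCycle;
        std::size_t processed = 0;
        for (; processed < batch; ++processed)
        {
            if (cursor >= lods.size()) cursor = 0;  // wrap around (also if the array shrank)
            LodRecord& lod = lods[cursor++];
            float distance = std::fabs(lod.position - cameraPos);
            lod.currentLod = distance < lod.switchDistance ? 0 : 1;
        }
        return processed;
    }

private:
    std::size_t framesPerCycle;  // assumed >= 1
    std::size_t cursor = 0;      // next record to process, kept between frames
};
```

The trade-off is latency: a LOD switch may be applied up to framesPerCycle - 1 frames late, which is usually invisible at distances where LODs change.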

Results

We greatly increased FPS, because with the component approach things became more independent and we could decouple and optimize them individually.
The architecture became simpler and easier to understand.
It became easy to extend the engine with almost no risk of breaking neighboring systems.
There are fewer bugs of the "did something with LODs, broke the switches" kind, and vice versa.
It became possible to parallelize all of this across several cores.
We are currently working on running all systems on all available cores.

Our engine code is open source. The engine, in the form used in World of Tanks Blitz, is fully available on GitHub.

So, if you wish, you can look at our implementation in detail.

Keep in mind that all of this was written for a real project, so, of course, it is not an academic implementation.

Future plans:

More efficient management of components, that is, laying them out linearly in memory to minimize cache misses.
Moving all systems to multithreaded execution.
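One common way to keep components linear in memory is a packed array. This is shown only as a sketch of the general idea, not as the engine's planned design, and PackedComponentArray with its methods is a hypothetical name. Components of one type live contiguously in a std::vector, a side table maps entity IDs to slots, and removal moves the last element into the freed slot so the array never has gaps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using EntityId = std::uint32_t;

template <typename T>
class PackedComponentArray
{
public:
    void Add(EntityId id, const T& component)
    {
        slotOf[id] = data.size();
        data.push_back(component);
        owners.push_back(id);
    }

    // Swap-and-pop keeps the storage contiguous; element order is not preserved.
    void Remove(EntityId id)
    {
        auto it = slotOf.find(id);
        if (it == slotOf.end()) return;
        std::size_t slot = it->second;
        std::size_t last = data.size() - 1;
        data[slot] = data[last];
        owners[slot] = owners[last];
        slotOf[owners[slot]] = slot;  // the moved element now lives in 'slot'
        data.pop_back();
        owners.pop_back();
        slotOf.erase(id);
    }

    T* Get(EntityId id)
    {
        auto it = slotOf.find(id);
        return it == slotOf.end() ? nullptr : &data[it->second];
    }

    // Systems iterate this contiguous range directly.
    std::vector<T>& Dense() { return data; }

private:
    std::vector<T> data;
    std::vector<EntityId> owners;  // owners[i] owns data[i]
    std::unordered_map<EntityId, std::size_t> slotOf;
};
```

A system then loops over Dense() from start to end, which is exactly the linear, cache-friendly access pattern the plan aims for.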
