The story of what not to do during development

Prologue: To begin with, I will describe the project, so that you have an idea of how we worked on it and can relive the pain we felt.

I joined the project as a developer in 2015 or 2016, I do not remember exactly, though the project had already been running for 2-3 years by then. It was very popular in its field, namely game servers. Strange as it may sound, game server projects are still being developed to this day; I recently saw job openings and even worked briefly in one such team. Since game servers are built on top of an existing game, development is done in the scripting language built into the game engine.

We are building the project on Garry's Mod (Gmod) almost from scratch. It is worth noting that at the time of writing Garry is already creating a new project, S&box, on Unreal Engine, while we are still sitting on Source, which is generally not suited to our server's theme.

“So what is scary about your story?” you ask.

Our game server has a demanding theme, namely “Stalker”, with role-playing (RP) elements on top, so the question immediately arises: “How do you implement all of that on one server?”

The Source engine is old (Gmod uses the 2013 version, and a 32-bit one at that), so you cannot make large maps, and there are tight limits on the number of entities, meshes and many other things.
Anyone who has worked with the engine will understand.
So the task looks practically impossible: to make a pure multiplayer Stalker with quests, RPG elements from the original and, preferably, a small story.

The initial writing was difficult (many actions, such as dropping an item or picking one up, had to be written from scratch). We hoped it would get easier from there, but the requirements kept growing. Once the game mechanics were ready, what remained was the AI, upgrades and all sorts of other things. In general, we ported everything as best we could.


The problems began as soon as the first release version went into operation: lag and server delays.

You would think a powerful server could easily handle the requests and carry the entire gamemode.

Simple gamemode description
A gamemode is the set of scripts that describes the mechanics of the server itself.
For example, suppose we want the now popular “battle royale” theme; the name then has to match the mechanics of the game as well. “Players spawn in a plane, they can pick up items, they can talk to each other, they cannot wear more than one helmet, and so on”: all of this is described by the game mechanics on the server.
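
To make the idea concrete, here is a minimal sketch in plain Lua of how one such rule ("no more than one helmet") could be written as a script. The function name canEquip and the fields slot and name are hypothetical and are not part of any Gmod API.

```lua
-- Hypothetical sketch: one gamemode rule expressed as a plain Lua function.
-- canEquip, slot and name are illustrative assumptions, not Gmod API.
local function canEquip(equipped, item)
    -- Rule: a slot such as "helmet" may hold only one item at a time.
    for _, worn in ipairs(equipped) do
        if worn.slot == item.slot then
            return false
        end
    end
    return true
end

local equipped = { { name = "steel helmet", slot = "helmet" } }
print(canEquip(equipped, { name = "gas mask", slot = "helmet" })) -- false
print(canEquip(equipped, { name = "jacket", slot = "body" }))     -- true
```

A real gamemode is a large collection of rules like this one, hooked into the events the engine provides.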

The lag was on the server side, caused by the large number of players, since one player eats up a lot of RAM, about 80-120 MB (not counting extra inventory items, skills and so on), and on the client side, where FPS dropped sharply.

The CPU was not powerful enough to process the physics, so we had to use fewer objects with physical properties.

On top of that there were our own homemade scripts, which were not optimized in any way.


First of all, we of course read the article on optimization in Lua. We even got to the point of wanting to write a DLL in C++, but ran into a problem: clients cannot download a DLL from the server. Because a C++ DLL could quietly intercept data, the Gmod developers added that extension to the list of file types excluded from client downloads (for security, although in fact there never was any). It would have been convenient and would have made Gmod more flexible, but also more dangerous.

Next, we looked at the profiler (luckily, smart people had already written one) and were horrified by what it showed: it turned out that some functions in the Gmod library are very slow to begin with.
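
For readers without a profiler at hand, the basic idea can be sketched in plain Lua as a wrapper that counts calls and accumulates CPU time. This is not the profiler we used; the names profiled, finish and stats are assumptions for illustration.

```lua
-- Hypothetical timing wrapper; not the profiler used on the project.
local stats = {}

-- Records one call and passes the wrapped function's results through.
local function finish(entry, start, ...)
    entry.calls = entry.calls + 1
    entry.time = entry.time + (os.clock() - start)
    return ...
end

-- Returns a wrapped version of fn that updates stats[name] on every call.
local function profiled(name, fn)
    stats[name] = { calls = 0, time = 0 }
    return function(...)
        local start = os.clock()
        return finish(stats[name], start, fn(...))
    end
end

-- Usage: wrap a simple rounding helper and call it in a loop.
local round = profiled("round", function(x) return math.floor(x + 0.5) end)
for i = 1, 1000 do
    round(i + 0.4)
end
print(stats.round.calls) -- 1000
```

Sorting the stats table by accumulated time is usually enough to see which functions deserve attention first.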

If you have ever written code for Gmod, you know perfectly well that it has a built-in library called math.

And the slowest functions in it are, of course, math.Clamp and math.Round.

Digging through other people's code, we noticed that these functions are scattered all over the place; they are used almost everywhere, but incorrectly!

Let's get to practice. For example, we want to round off the coordinates of the position vector for moving the entity (for example, the player).

local x = 12.5
local y = 14.9122133
local z = 12.111
LocalPlayer():SetPos(Vector(math.Round(x), math.Round(y), math.Round(z)))

That is three expensive rounding calls, but nothing serious as long as the code is not in a loop and is not called often. Clamp, however, is even heavier.

The following code is often used in projects and no one wants to change anything.

self:setLocalVar("hunger", math.Clamp(current + 1, 0, 100))

Here self points to a player object, and the object has a local variable we invented, which is reset to its initial value whenever the player rejoins the server. math.Clamp constrains a value to the range between a minimum and a maximum; people also like to use it for smooth assignments, for example in smoothly animated interfaces.

Problems arise when this runs for every player who enters the server. It happens rarely, but if 5-15 players (depending on the server configuration) join at the same moment and this small, simple function runs for each of them, the server gets noticeable CPU delays. It is even worse if math.Clamp sits in a loop.

The optimization is actually very simple: localize the heavily used functions. It seems primitive, but I have seen this slow code in 3 gamemodes and many add-ons.

If you need a value now and will use it again later, do not compute it again as long as it does not change. After all, a player entering the server gets hunger equal to 100 in any case, so compute the value once and reuse it instead of recalculating it every time.

local value = math.Clamp(current + 1, 0, 100)
self:setLocalVar("hunger", value)
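
Localizing the function itself can be sketched as follows. The sketch is plain Lua, so it defines a stand-in for math.Clamp (an assumption about its behavior: limit a value to a range); inside Gmod the library function already exists. The helper name addHunger is hypothetical.

```lua
-- Stand-in for Gmod's math.Clamp so the sketch runs in plain Lua
-- (assumption: it limits value to the range [min, max]).
math.Clamp = math.Clamp or function(value, min, max)
    return math.max(min, math.min(value, max))
end

-- Localize once. Calling through the local avoids looking up the global
-- "math" and then indexing it for "Clamp" on every single call.
local clamp = math.Clamp

-- Hypothetical helper: raise hunger by amount, keeping it within 0..100.
local function addHunger(current, amount)
    return clamp(current + amount, 0, 100)
end

print(addHunger(99, 5))  -- 100
print(addHunger(-10, 5)) -- 0
print(addHunger(40, 5))  -- 45
```

The gain per call is small, so this only pays off in code that runs very often, such as per-frame hooks or loops over all players.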

So far so good. We started looking further into what works how, and as a result we began optimizing everything.

We noticed that the standard for loop was slow, so we decided to reinvent the wheel and build our own, faster one (with blackjack, of course), and the game began.


SPOILER
We even managed to make the fastest loop in Gmod Lua, but only on the condition that there are more than 100 elements.

Judging by the time we spent on our loop and how little it is used in the code, the effort was wasted: it found a use only in spawning anomalies on the map after an emission and cleaning them up.
And now to the code. Suppose you need to find all entities whose class name starts with anom, which is how our anomaly classes are named.

Here is the version for a normal Gmod Lua scripter:

local anomtable = ents.FindByClass("anom_*")
for k, v in pairs(anomtable) do
	v:Remove()
end

And here is the smoker's version:

It seems obvious at once that such cr*p code must be slower than the standard “for in pairs”, but as it turned out, it is not.

local b, key = ents.FindByClass("anom_*"), nil
repeat
	key = next(b, key)
	b[key]:Remove()
until key != nil


For a complete analysis, both loop variants need to be translated into a regular Lua script.
Let anomtable have 5 elements.
Removal is replaced by ordinary addition. The main thing to look at is the difference in the number of instructions between the two loop implementations.

Vanilla loop:

local anomtable = { 1, 2, 3, 4, 5 }
for k, v in pairs(anomtable) do
	v = v + 1
end

Our great one:

local b, key = { 1, 2, 3, 4, 5 }, nil
repeat
	key = next(b, key)
	b[key] = b[key] + 1
until key ~= nil
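
One thing worth checking before reusing this pattern: with the exit condition written as until key ~= nil, the repeat loop ends as soon as next returns the first non-nil key, so only one element gets processed. A minimal sketch of a next-based loop that visits every element, using the same five-element table (the while form is an assumption for illustration, not the code that was disassembled):

```lua
-- Visit every element with next; the while form also handles an empty table.
local b = { 1, 2, 3, 4, 5 }
local key = next(b)          -- first key, or nil for an empty table
while key ~= nil do
    b[key] = b[key] + 1      -- assigning to an existing field is safe here
    key = next(b, key)       -- advance; returns nil after the last key
end
print(table.concat(b, ", ")) -- 2, 3, 4, 5, 6
```

Any timing comparison between the two approaches is only meaningful if both loops actually touch the same number of elements.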

Let's look at the interpreter code (it resembles assembler; high-level programmers are advised not to look under the spoiler).

Just in case, keep the juniors away from the screen. You have been warned.

Vanilla loop disassembly
; Name:	 for1.lua
; Defined at line: 0
; #Upvalues:       0
; #Parameters:     0
; Is_vararg:       2
; Max Stack Size:  7
  1 [-]: NEWTABLE  R0 5 0       ; R0 := {}
  2 [-]: LOADK     R1 K0        ; R1 := 1
  3 [-]: LOADK     R2 K1        ; R2 := 2
  4 [-]: LOADK     R3 K2        ; R3 := 3
  5 [-]: LOADK     R4 K3        ; R4 := 4
  6 [-]: LOADK     R5 K4        ; R5 := 5
  7 [-]: SETLIST   R0 5 1       ; R0[(1-1)*FPF+i] := R(0+i), 1 <= i <= 5
  8 [-]: GETGLOBAL R1 K5        ; R1 := pairs
  9 [-]: MOVE      R2 R0        ; R2 := R0
 10 [-]: CALL      R1 2 4       ; R1,R2,R3 := R1(R2)
 11 [-]: JMP       13           ; PC := 13
 12 [-]: ADD       R5 R5 K0     ; R5 := R5 + 1
 13 [-]: TFORLOOP  R1 2         ; R4,R5 :=  R1(R2,R3); if R4 ~= nil then begin PC = 12; R3 := R4 end
 14 [-]: JMP       12           ; PC := 12
 15 [-]: RETURN    R0 1         ; return


Our loop disassembly
; Name:  for2.lua
; Defined at line: 0
; #Upvalues:       0
; #Parameters:     0
; Is_vararg:       2
; Max Stack Size:  6
  1 [-]: NEWTABLE  R0 5 0       ; R0 := {}
  2 [-]: LOADK     R1 K0        ; R1 := 1
  3 [-]: LOADK     R2 K1        ; R2 := 2
  4 [-]: LOADK     R3 K2        ; R3 := 3
  5 [-]: LOADK     R4 K3        ; R4 := 4
  6 [-]: LOADK     R5 K4        ; R5 := 5
  7 [-]: SETLIST   R0 5 1       ; R0[(1-1)*FPF+i] := R(0+i), 1 <= i <= 5
  8 [-]: LOADNIL   R1 R1        ; R1 := nil
  9 [-]: GETGLOBAL R2 K5        ; R2 := next
 10 [-]: MOVE      R3 R0        ; R3 := R0
 11 [-]: MOVE      R4 R1        ; R4 := R1
 12 [-]: CALL      R2 3 2       ; R2 := R2(R3,R4)
 13 [-]: MOVE      R1 R2        ; R1 := R2
 14 [-]: GETTABLE  R2 R0 R1     ; R2 := R0[R1]
 15 [-]: ADD       R2 R2 K0     ; R2 := R2 + 1
 16 [-]: SETTABLE  R0 R1 R2     ; R0[R1] := R2
 17 [-]: EQ        1 R1 K6      ; if R1 == nil then PC := 9
 18 [-]: JMP       9            ; PC := 9
 19 [-]: RETURN    R0 1         ; return


An inexperienced reader, just glancing at this, will say that the normal loop is faster because it has fewer instructions (15 vs 19).

But we must not forget that every interpreter instruction costs processor cycles.
Judging by the disassembled code, the first loop relies on the ready-made TFORLOOP instruction for iterating over a table: the table is built, pairs is fetched from the globals, and then we step through the elements and add a constant.

In the second variant the approach is different and leans more on memory: it gets the table element, changes it, writes it back into the table, checks for nil and calls next again.
Our second loop comes out faster because the first one packs too many conditions and actions into a single instruction (R4,R5 := R1(R2,R3); if R4 ~= nil then begin PC = 12; R3 := R4 end), so that instruction eats a lot of CPU cycles, while the second loop is, again, tied more to memory.

With a large number of elements, the TFORLOOP instruction loses to our loop in the time it takes to pass over all the elements. The reason is that going straight to the address is faster, without any of the extras that pairs brings along. (And we have no negation.)
Between us, any use of negation in code slows it down; this has already been verified by tests and by time. Negated logic runs slower because the processor's ALU has a separate “inverter” unit: to apply a unary operator (not, !) the inverter has to be invoked, and that takes additional time.
Conclusion: the standard solution is not always the best one, and your own reinvented wheels can be useful, but on a real project you should not build them if you care about release speed. As a result, we have been in full development from 2014 to the present day, yet another perpetual “wait for it” project of sorts. It may look like an ordinary game server that is set up in 1 day and fully configured for play in 2, but you have to be able to bring something new to it.
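
Claims like these are easy to check on your own machine. Below is a minimal timing sketch in plain Lua that runs a pairs loop and a manual next loop over tables of the same size. The size N and the helper names are assumptions chosen for illustration, and the numbers will differ between plain Lua, LuaJIT and Gmod.

```lua
-- Timing sketch: compare a pairs loop with a manual next loop.
-- N and the helper names are illustrative assumptions.
local N = 100000

local function makeTable(n)
    local t = {}
    for i = 1, n do
        t[i] = i
    end
    return t
end

-- Increment every element using the generic for loop.
local function withPairs(t)
    for k, v in pairs(t) do
        t[k] = v + 1
    end
end

-- Increment every element by calling next by hand.
local function withNext(t)
    local key = next(t)
    while key ~= nil do
        t[key] = t[key] + 1
        key = next(t, key)
    end
end

-- Returns the CPU time in seconds spent in fn(t).
local function timeIt(fn, t)
    local start = os.clock()
    fn(t)
    return os.clock() - start
end

local a, b = makeTable(N), makeTable(N)
print(string.format("pairs: %.4f s", timeIt(withPairs, a)))
print(string.format("next:  %.4f s", timeIt(withNext, b)))
```

Run it several times and with different values of N before drawing conclusions; a single measurement on a small table mostly shows noise.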

This long-running project did eventually get a second version, with a great deal of optimization in the code, but I will cover those other optimizations in the following articles. Criticism and comments are welcome; correct me if I am mistaken.
