The story of one optimization: transfer and processing of battle results

Today I will tell you about a small part of a large project: World of Tanks. Many of you probably know World of Tanks from the player's side, but I suggest looking at it from a developer's point of view. This article covers the evolution of one technical solution in the project: the transfer and processing of battle results.
Post battle


Under the hood


To understand the problem at hand, I will first describe how "Tanks" is put together. The project is a distributed client-server application; on the server side it is represented by several nodes of different types. All physics and in-game logic are calculated on the server, while the client is responsible for interacting with the user: processing input and displaying the results. A simplified cluster diagram looks like this:
Cluster diagram

BaseApp nodes are responsible for interacting with the client. They receive information from the user over the network and pass it into the cluster, and they also send the user information about events that occurred inside the cluster. All physical interactions taking place in the arena are calculated on the CellApp nodes. They also collect the events that happened to a tank during the battle: the number of shots, hits, penetrations, and so on. One battle can be served by several CellApps, and each CellApp can host tanks from several different battles. At the end of the battle, statistics packets for all tanks are sent from the CellApps to a BaseApp. It aggregates the packets and processes the results of the battle: calculates penalties and rewards, checks quest completion, and awards medals; that is, it forms new data packets that it then sends to the users. It is important to note that CellApps and BaseApps are isolated processes. They may run on different machines, but always within the same cluster, so data transfer between them goes through network interfaces, while data between the client and the cluster travels over the Internet.
Data packets are transmitted with a guaranteed-delivery protocol implemented on top of UDP. Under the hood, all I/O and guaranteed delivery over UDP are handled by the Mercury library, which is part of the BigWorld Technology platform. In practice, we can only influence the data before it is sent: prepare it in a way that minimizes the cost of sending and receiving.

For obvious reasons, we want to minimize the amount of data transmitted between nodes: this reduces traffic spikes, lowers send and receive latency, and significantly shrinks the battle-results data delivered to the client. One tick, equal to 100 ms, is allocated for data preprocessing, and other events have to be processed within the same tick. Therefore, preprocessing should take as little time as possible; ideally, it should take no noticeable time at all.

Start


Let's take a look at the data preparation process.
Most of the in-game logic of World of Tanks is written in Python. The data sent between CellApp and BaseApp, as well as the data sent to the user at the end of the battle, is a dict with values ranging from simple ones (integers, floats, True/False) to compound ones: dictionaries and sequences. Code objects, classes, and instances of custom classes are never passed from one node to another, for safety and performance reasons.

To make the rest of the material easier to follow, here is an example of the data we will be working with:
data = {
	"someIntegerValue" : 1,
	"someBoolValue" : True,
	"someListOfIntsValue" : [1, 2, 3],
	"someFloatValue" : 0.5,
	"someDictionaryValue" : {
		"innerVal" : 1,
		"innerVal2" : 2,
		"listOfVals" : [1, 2, 3, 4],
	},
	"2dArray" : [
		[1,   2,  3,  4,  5,  6],
		[7,   8,  9, 10, 11, 12],
		[13, 14, 15, 16, 17, 18],
		[19, 20, 21, 22, 23, 24]
	]
}

To transfer the data between nodes, the dict instance must be converted into a binary packet of minimal size. When making a remote call, the BigWorld engine lets the programmer transparently convert Python objects to binary data and back using the cPickle module.
While the amount of data transferred was small, it was simply converted to binary form before sending with cPickle (let's call this exchange protocol version 0). It was in this form that the first closed beta of Tanks, version 0.1, shipped back in 2010.

The advantages of this method are its simplicity and reasonable efficiency, both in speed and in the compactness of the resulting binary representation.
>>> p = cPickle.dumps(data, -1)
>>> len(p)
277


The disadvantages follow from the advantages, in particular from the simplicity of the method: the dictionary's string keys are transferred between the cluster nodes and from the cluster to the user along with the values. In some cases these string keys can occupy a significant part of the transmitted binary packet.
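This is easy to see in the interpreter: with protocol 2, cPickle writes string keys into the stream verbatim, so for the example data above the key names alone account for 115 of the 277 bytes:
>>> p = cPickle.dumps(data, -1)
>>> "someIntegerValue" in p
True
>>> sum(map(len, data)) + sum(map(len, data["someDictionaryValue"]))
115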

Optimize


As time went on, both the amount of data and the concurrent online count grew significantly, and with them the number of battles ending at the same moment and the amount of information transmitted at the end of each battle. To reduce traffic, we threw the string keys out of the transferred data and replaced the dict with a list indexed by position. This operation loses no data: knowing the order of the keys, the original dict is easy to restore.

Removing and restoring the keys required a template listing all possible keys, plus a pair of conversion functions:
NAMES = (
	"2dArray",
	"someListOfIntsValue",
	"someDictionaryValue",
	"someFloatValue",
	"someIntegerValue",
	"someBoolValue",
)
INDICES = {x[1] : x[0] for x in enumerate(NAMES)}
def dictToList(indices, d):
	l = [None, ] * len(indices)
	for name, index in indices.iteritems():
		l[index] = d[name]
	return l
def listToDict(names, l):
	d = {}
	for x in enumerate(names):
		d[x[1]] = l[x[0]]
	return d
>>> dictToList(INDICES, data)
[[[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24]], [1, 2, 3], {'listOfVals': [1, 2, 3, 4], 'innerVal2': 2, 'innerVal': 1}, 0.5, 1, True]
>>> 
>>> len(cPickle.dumps(dictToList(INDICES, data), -1))
165
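A quick round trip confirms that nothing is lost: packing with one template and unpacking with the other returns the original dictionary:
>>> listToDict(NAMES, dictToList(INDICES, data)) == data
True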

As you can see from the example, the binary representation became noticeably more compact than in version 0. The price of that compactness was the time spent on data preprocessing and the new code that now has to be maintained.
This solution shipped in version 0.9.5, at the very beginning of 2015. The protocol that converts the dict to a list and then calls cPickle.dumps(data, -1) is what we call version 1.

We will optimize further


With the release of the new “Superiority” mode, the amount of data grew significantly: statistics are collected for each tank separately, and in this mode a player can take up to three vehicles into battle and generate data for all of them. The need to pack the data even more tightly became acute.
To transfer even less redundant information, we applied the following techniques:

1. For each nested dictionary we applied the same approach as for the main one: threw out the keys and converted the dict to a list. The “template” that drives the conversion from dict to list and back thus became recursive.

2. Looking carefully at the data before sending, we noticed that in some cases a sequence is wrapped in a set or frozenset container. In cPickle protocol 2 the binary representation of these containers takes up noticeably more space:
>>> l = list(xrange(3))
>>> cPickle.dumps(set(l), -1)
'\x80\x02c__builtin__\nset\nq\x01]q\x02(K\x00K\x01K\x02e\x85Rq\x03.'
>>> 
>>> cPickle.dumps(frozenset(l), -1)
'\x80\x02c__builtin__\nfrozenset\nq\x01]q\x02(K\x00K\x01K\x02e\x85Rq\x03.'
>>> 
>>> cPickle.dumps(l, -1)
'\x80\x02]q\x01(K\x00K\x01K\x02e.'

We saved a few more bytes by converting set and frozenset to list before sending. Since the receiving side usually does not care about the specific sequence type, only about the data, this replacement caused no errors.
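A minimal sketch of such a replacement, assuming a hypothetical helper that walks the structure recursively (the real conversion is folded into the template shown below), might look like this:
def replaceSets(value):
	# recursively replace set/frozenset with list; other containers are walked as-is
	if isinstance(value, (set, frozenset)):
		return [replaceSets(v) for v in value]
	if isinstance(value, list):
		return [replaceSets(v) for v in value]
	if isinstance(value, tuple):
		return tuple(replaceSets(v) for v in value)
	if isinstance(value, dict):
		return dict((k, replaceSets(v)) for k, v in value.iteritems())
	return value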

3. Quite often, not every key in a dictionary has a meaningful value. Some keys may be absent, and others may hold values that do not differ from the defaults known in advance on both the sending and receiving sides. Keep in mind that the "default" values of different types have binary representations of different sizes. Rarely, but still, there are defaults that are slightly more complex than just an empty value of some type; in our case these are several counters combined into one field and represented as sequences of zeros. Such default values can take up a lot of space in the binary data sent between nodes. To achieve even greater savings, we replace default values with None before sending.
>>> len(cPickle.dumps(None, -1))
4
>>> len(cPickle.dumps((), -1))
4
>>> len(cPickle.dumps([], -1))
4
>>> len(cPickle.dumps({}, -1))
4
>>> len(cPickle.dumps(False, -1))
4
>>> len(cPickle.dumps(0, -1))
5
>>> len(cPickle.dumps(0.0, -1))
12
>>> len(cPickle.dumps([0, 0, 0], -1))
14

When reading these numbers, keep in mind that cPickle adds a header and a terminator to the binary packet, 3 bytes in total, so the actual amount of serialized data is (X - 3), where X is the value from the example.
Moreover, replacing default values also helps zlib compression of the binary data. In the binary representation, list elements follow one another without any separators, so several consecutive defaults replaced with None become a run of identical bytes, which compresses very well.
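This is easy to see on a small example: five defaulted fields in a row pickle to a run of identical 'N' bytes, while the original zero-filled counters each drag along their own list structure:
>>> cPickle.dumps([None] * 5, -1)
'\x80\x02]q\x01(NNNNNe.'
>>> cPickle.dumps([[0, 0, 0], [0, 0, 0]], -1)
'\x80\x02]q\x01(]q\x02(K\x00K\x00K\x00e]q\x03(K\x00K\x00K\x00ee.'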

4. The data is compressed with zlib at compression level 1, because this level gives the best trade-off between compression ratio and processing time.
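Put end to end, the send and receive paths look roughly like this (a sketch with hypothetical function names; packer stands for the recursive template shown below):
import zlib
import cPickle

def packForSend(packer, dataDict):
	# dict -> list (steps 1-3), then pickle, then deflate with zlib level 1
	return zlib.compress(cPickle.dumps(packer.pack(dataDict), -1), 1)

def unpackReceived(packer, blob):
	# the receiving side reverses the steps: inflate, unpickle, rebuild the dict
	return packer.unpack(cPickle.loads(zlib.decompress(blob)))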

If you bring steps 1-3 together, you get something like this:
import copy

class DictPacker(object):
	def __init__(self, *metaData):
		self._metaData = tuple(metaData)
	# Packs input dataDict into a list.
	def pack(self, dataDict):
		metaData = self._metaData
		l = [None] * len(metaData)
		for index, metaEntry in enumerate(metaData):
			try:
				name, transportType, default, packer = metaEntry
				default = copy.deepcopy(default) # prevents modification of default.
				v = dataDict.get(name, default)
				if v is None:
					pass
				elif v == default:
					v = None
				elif packer is not None:
					v = packer.pack(v)
				elif transportType is not None and type(v) is not transportType:
					v = transportType(v)
					if v == default:
						v = None
				l[index] = v
			except Exception as e:
				# LOG_DEBUG_DEV is the project's logging helper (defined elsewhere).
				LOG_DEBUG_DEV("error while packing:", index, metaEntry, str(e))
		return l
	# Unpacks input dataList into a dict.
	def unpack(self, dataList):
		ret = {}
		for index, meta in enumerate(self._metaData):
			val = dataList[index]
			name, _, default, packer = meta
			default = copy.deepcopy(default) # prevents modification of default.
			if val is None:
				val = default
			elif packer is not None:
				val = packer.unpack(val)
			ret[name] = val
		return ret

PACKER = DictPacker(
	("2dArray", list, 0, None),
	("someListOfIntsValue", list, [], None),
	("someDictionaryValue", dict, {},
	 DictPacker(
		("innerVal", int, 0, None),
		("innerVal2", int, 0, None),
		("listOfVals", list, [], None),
	 )
	),
	("someFloatValue", float, 0.0, None),
	("someIntegerValue", int, 0, None),
	("someBoolValue", bool, False, None),
)
>>> PACKER.pack(data)
[[[1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24]], [1, 2, 3], [1, 2, [1, 2, 3, 4]], 0.5, 1, True]
>>> len(cPickle.dumps(PACKER.pack(data), -1))
126
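The reverse transformation restores the original dictionary, so the extra compactness still costs us no information:
>>> PACKER.unpack(PACKER.pack(data)) == data
True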

All of these tricks went into protocol version 2, which shipped with game version 0.9.8 in May 2015.
Yes, we increased the preprocessing time further, but the size of the binary packet also shrank significantly.

Comparison of Results


To see what applying the techniques above gives on real data, here is a graph of the size of the per-tank packet sent from CellApp to BaseApp at the end of a battle, for each protocol version.
Size of the per-tank packet sent from CellApp to BaseApp

Recall that in the "Superiority" mode in version 0.9.8 a player can go into battle with three tanks, so the total amount of data triples accordingly.

And here is the time taken to preprocess the same data before sending.
Time to preprocess the packet before sending from CellApp to BaseApp

Here 0.9.8u is processing without zlib compression (uncompressed) and 0.9.8c is processing with compression (compressed). Time is given in seconds per 10,000 iterations.

Note that the measurements were collected for version 0.9.8 and then extrapolated to 0.9.5 and 0.1, taking the keys used into account. Moreover, the data varies significantly from user to user and tank to tank, since its amount depends directly on the player's behavior (how many opponents were spotted, damaged, and so on). So the graphs should be treated as an illustration of the trend rather than exact figures.

Conclusion


In our case, it was critical to reduce the amount of data transferred between nodes, and it was also desirable to reduce the preprocessing time. Protocol version 2 turned out to be the optimal solution for us. The next logical step is to move the serialization into a separate binary module that would perform all the manipulations itself and, unlike pickle, would not embed data-describing metadata in the binary stream. This would shrink the data even further and possibly reduce the processing time as well. For now, though, we are living with the solution described above.
