Experience in automating RTLS server stability testing under internal load

Introduction

In this article, I describe how the RTL-Service quality control department runs automated stability tests of the RealTrac server while it services a large number of mobile location devices. First, some useful terminology:
RealTrac RTLS server (server) - server software of the RealTrac system that interacts with the system hardware and calculates the location of devices.

RealTrac application server (application server) - server software necessary for the operation of a web application, providing a programmatic interface for accessing the basic functions of the system.

RealTrac access point (hereinafter AP) - a device designed to transfer data between the mobile devices of the network and the system server. Access points are permanently installed on site; their coordinates are entered on the map in the client software and stored in a database on the system server. An AP can operate in gateway or repeater mode. The mode is determined by whether the AP has a wired Ethernet connection to the network (gateway access point) or not (repeater access point). Only a gateway communicates with the server directly. An example access point is shown in Figure 1.

Fig. 1. An example of an access point.

Mobile device (MU) - a mobile radio node that makes it possible to determine, in real time, the location of the person or other object it is attached to. Depending on the device type, it can perform additional functions, for example audio transmission. An example of a mobile device is shown in Figure 2.

Fig. 2. An example of a mobile device.

Alive cycle - the interval at which a device broadcasts data on its status.

Figure 3 shows the architecture of the RTLS system. The components that require testing as part of this task are outlined in black.

Fig. 3. High-level architecture of RealTrac.

The internal load on the server is created by the devices of the RealTrac wireless segment that interact with the server; it is characterized by their number and the intensity of data exchange. Data is transmitted via the internal protocol, INCP.
The external load consists of requests to the public API, RTLSCP.

Problem statement.

The task is to determine how stable the server software is under a prolonged load of 2000 mobile devices with a polling cycle of 2 seconds, i.e., to check whether the software freezes, crashes or reports errors. We also needed to determine the consumption of processor resources and memory (minimum, maximum and average).

Initially, this task was performed manually, but we quickly realized that it should be automated as soon as possible.
Within the department, the problem was divided into the following subtasks:
1. Determining the approach to testing and tools.
2. Writing configuration files for the internal load generator.
3. Implementation of the testing system.
4. Automation of running tests.

Testing Approach.

Due to the specifics of the problem, we decided to implement our own small system for automated stability testing. The core of the test system is a controller application, which copies scripts to “subordinate” machines and runs them via ssh at certain points in time. The scripts perform two main functions: deploying the system on a remote machine and monitoring resources. Figure 4 shows the interaction diagram.

Fig. 4. Interaction scheme of the test servers.
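The controller's job of copying a script to a test machine and running it there can be sketched in a few lines. The following Python sketch is only an illustration, not the actual controller; the function names, the user name, the address and the paths are hypothetical, and key-based ssh access to the test machine is assumed.

```python
import shlex
import subprocess


def scp_command(local_path, user, host, remote_dir):
    """Build the scp command that copies a script to a test machine."""
    return ["scp", local_path, "{0}@{1}:{2}".format(user, host, remote_dir)]


def ssh_command(user, host, remote_command):
    """Build the ssh command that runs remote_command on a test machine."""
    return ["ssh", "{0}@{1}".format(user, host), remote_command]


def run(command):
    """Run a command locally; raises CalledProcessError on a non-zero exit code."""
    print("Execute " + " ".join(shlex.quote(part) for part in command))
    subprocess.check_call(command)


# Hypothetical usage (not executed here, since it needs a reachable test machine):
#   run(scp_command("install_deb.sh", "user", "192.168.1.2", "/home/user/"))
#   run(ssh_command("user", "192.168.1.2", "bash /home/user/install_deb.sh 1.0"))
```

Keeping command construction separate from execution makes it possible to check the generated commands without touching a real machine.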

The next task is to automate the test cycles. Here we did not reinvent the wheel and used our local build server: the testing process itself runs as part of a build, and the build is started on a given schedule.

Internal load generation.

To avoid tying up real devices for tests and to be able to load the server with any number of devices, we developed an in-house application, incptester, that emulates the devices of the wireless segment: APs and MUs.
From the tester's point of view, it is enough to know that incptester is configured through a configuration file. The [general] block sets the geographical coordinates of the center point, from which the local coordinates (x, y, z) are counted. The [tracks] block describes the paths along which mobile devices move; each path is described by a sequence of points. The [devices] block lists the emulated devices themselves. The file has the following form:

[general]
geo_lat=61.786838
geo_lng=34.353548
geo_alt=1

[tracks]
id=1,type=POLY,TF=0,VRT=(20:0:0)(21:0:30)(60)(40:0:0)(60)

[devices]
mac=CF0000000000,devtype=1,ip=127.0.1.1,cycle=30000,x=0.0,y=0.0,z=0
mac=C00000000001,devtype=2,ip=127.0.0.2,cycle=30000,x=5.0,y=0.0,z=0
mac=000000BAD001,devtype=4,cycle=2000,track=1
mac=000000BAD002,devtype=6,cycle=2000,track=1

In addition to the MAC address, each device has a devtype parameter that determines the type of device (1 and 2 are stationary APs, 3 to 6 are MUs). There is also a cycle parameter that defines the polling cycle of the device; judging by the example, it is presumably given in milliseconds (2000 matches the 2-second cycle from the problem statement). For stationary access points, you need to specify a fixed position through the coordinates. For mobile devices, you can specify the track along which they will move.
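To illustrate the device line format, here is a minimal Python sketch that splits such a line into fields and separates mobile devices from access points by devtype. The helper names are hypothetical and the sketch is not part of incptester; it only assumes the comma-separated key=value layout of the [devices] block.

```python
def parse_device_line(line):
    """Split a 'key=value,key=value' device line into a dict of strings."""
    fields = {}
    for item in line.strip().split(","):
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def is_mobile(device):
    """devtype 3-6 denotes a mobile device, 1-2 a stationary access point."""
    return 3 <= int(device["devtype"]) <= 6


lines = [
    "mac=CF0000000000,devtype=1,ip=127.0.1.1,cycle=30000,x=0.0,y=0.0,z=0",
    "mac=000000BAD001,devtype=4,cycle=2000,track=1",
]
devices = [parse_device_line(line) for line in lines]
mobile_macs = [device["mac"] for device in devices if is_mobile(device)]
print(mobile_macs)
```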

As a result, to generate the required load, you need to create an incptester configuration file with the required number of devices.
To avoid registering 2000 devices in the config by hand, we wrote a small bash script that appends the specified number of lines to the base file.

add_devices.sh

#!/bin/bash
set -e
if [ "$#" -ne 2 ]; then
    echo "Usage: ${0} <base_config> <dev_num>"
    exit 1
fi
INCPTESTER_CONF_PATH=./${1}.conf
if [ ! -f "${INCPTESTER_CONF_PATH}" ]; then
    cat ./incptester_geo-base.conf > "${INCPTESTER_CONF_PATH}"
fi
# Print the MAC address $1 increased by $2 (both hexadecimal).
# A function is used because aliases are not expanded in non-interactive scripts.
get_next_mac() {
    python -c "import sys; print('{:012x}'.format(int(sys.argv[1], 16) + int(sys.argv[2], 16)).upper())" "$1" "$2"
}
# Take the MAC address from the last non-empty line of the config
last_mac=$(tac "${INCPTESTER_CONF_PATH}" | grep -m 1 . | grep -o -E "[0-9A-Fa-f]{12}")
for count in $(seq "${2}"); do
    next_mac=$(get_next_mac "0x${last_mac}" "0x1")
    echo "mac=${next_mac},devtype=6,cycle=2000,track=2" >> "${INCPTESTER_CONF_PATH}"
    last_mac=${next_mac}
done
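
The MAC increment at the heart of this script can also be sketched in Python. This is an illustrative equivalent, not the script used in the department; next_mac and device_lines are hypothetical names, and the default values mirror the ones hard-coded in the bash script.

```python
def next_mac(mac, step=1):
    """Return the MAC address that follows `mac` by `step`, as 12 uppercase hex digits."""
    return "{:012X}".format(int(mac, 16) + step)


def device_lines(last_mac, count, devtype=6, cycle=2000, track=2):
    """Generate `count` device lines with consecutive MAC addresses after `last_mac`."""
    lines = []
    mac = last_mac
    for _ in range(count):
        mac = next_mac(mac)
        lines.append("mac={0},devtype={1},cycle={2},track={3}".format(
            mac, devtype, cycle, track))
    return lines


for line in device_lines("000000BAD002", 3):
    print(line)
```

Generating the lines in memory first also makes it easy to check the result before appending it to the config file.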

Software used.

We decided to implement the controller application in Java. To describe the test scenarios, we used the Cucumber library and, accordingly, the Gherkin language. We used Gradle as the build tool. The build itself was set up on our local Hudson server.

Since the system runs on Debian Linux, it makes sense to implement the scripts that interact with the OS in bash. These scripts install and remove deb packages and configs. To monitor running processes, we used the psutil package for Python, periodically writing the values of consumed resources to a CSV file.

The main scenario.

The scenario is divided into the following steps:
1. Remove previous packages from test servers.
2. Prepare configuration files for deb packages.
3. Copy configuration files and scripts to test servers.
4. Deploy the system on test servers.
5. Run incptester on slave1 with the configuration for 2000 mobile devices.
6. Run the server on slave1 under the internal load for a certain amount of time, with parallel monitoring of resources.
7. Run the application server on slave2, configured to work with the main server on slave1, with parallel monitoring of resources.

Implementation.

In this section, I briefly outline the main points of the testing system implementation.
The test scenario converts easily into a Gherkin feature. Steps can carry parameters, which are passed to the method that executes the step. It looks something like this:

@Load_stability_geo
Feature: Load_stability_geo
  This test starts a large number of devices and monitors the system resources

  @Install
  Scenario: Installation of RealTrac system in geo mode
    Given I delete the previous Realtrac-server from the both test-servers
    And I prepare all deb configs
    And I copy all configs and scripts to the test servers
    And I install the main Realtrac-server on the test-server and incptester and stop service rtlserm for geo configuration
    Then Run first part of the test with the inside load for 11520 steps
    Given I install the app server
    Then Run second part of the test with the inside load for 11520 steps

For the scenario, we write a separate class, LoadStabilityGeo. The class contains the methods that execute the steps of the feature. Below is an example of passing a parameter to a method; the parameter is extracted by a regular expression.

import rtls.test.utils.RTLSUtils;
import cucumber.api.java.en.And;
import cucumber.api.java.en.Then;
import cucumber.api.java.en.When;

public class LoadStabilityGeo {
	// some other methods and the fields used below
	// (monitoring_file, path_monitoring_file, user, rtlscp_port, rtlscpip)

	@Then("^Run first part of the test with the inside load for (\\d+) steps")
	public void First_Monitoring(int count_time) throws ScriptFaildException, Exception {
		int times = 0;
		double minutes;
		int check_time_min = 10;        // resources are verified on every 10th step
		monitoring_file = NameFileDataFormat("monitoring", "csv");
		path_monitoring_file = " /home/" + user + "/TestResult/";
		path_monitoring_file = path_monitoring_file + monitoring_file;
		minutes = 0.5;                  // pause between two samples
		while (times <= count_time) {
			// append one resource sample to the CSV file on the main server
			run_ssh_cmd("resource_get.py " + path_monitoring_file + " rtls", "main_server");
			if (((times % check_time_min) == 0) && times != 0) {
				checkResource("rtls", rtlscp_port, rtlscpip, check_time_min);
				times = times + 1;
			} else {
				Sleep_time(minutes);
				times = times + 1;
			}
		}
		System.out.println("The first part of the stress test in geo-mode is successfully finished");
	}
}
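
To get a feel for how long one part of the test runs, the loop above can be replayed without any ssh calls. The sketch below is an illustration in Python, not part of the test system; monitoring_schedule is a hypothetical helper that counts how many iterations end in a resource check and how many in a 0.5-minute sleep. It ignores the time spent on the measurement and on the check itself, so a real run takes somewhat longer.

```python
def monitoring_schedule(step_count, check_every=10, sleep_minutes=0.5):
    """Replay the monitoring loop and count checks, sleeps and total sleep time."""
    checks = 0
    sleeps = 0
    # The Java loop runs while times <= count_time, hence step_count + 1 iterations
    for step in range(step_count + 1):
        if step % check_every == 0 and step != 0:
            checks += 1
        else:
            sleeps += 1
    return checks, sleeps, sleeps * sleep_minutes


checks, sleeps, total_minutes = monitoring_schedule(11520)
print("checks:", checks)
print("sleeps:", sleeps)
print("hours spent sleeping:", round(total_minutes / 60, 1))
```

For the 11520 steps used in the feature this gives 1152 checks and 10369 sleeps, i.e. a little over 86 hours of sleeping alone.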

We also wrote a separate RTLSUtils class with static methods for working with scripts (execution and verification of the result) and other general-purpose methods. An example method for running a command on the OS:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class RTLSUtils {
    // some other methods

    public static void executeCommand(String command) throws ScriptFaildException {
        try {
            String line;
            System.out.println("Execute " + command);
            String[] env = new String[]{"DEBIAN_FRONTEND=noninteractive"};
            Process p = Runtime.getRuntime().exec(command, env);
            BufferedReader input = new BufferedReader(new InputStreamReader(p.getInputStream()));
            BufferedReader error = new BufferedReader(new InputStreamReader(p.getErrorStream()));
            // print the standard output of the command
            while ((line = input.readLine()) != null) {
                System.out.println(line);
            }
            input.close();
            // print the error output of the command
            while ((line = error.readLine()) != null) {
                System.out.println(line);
            }
            error.close();
            // wait for the process to finish before checking its exit code
            if (p.waitFor() != 0) {
                throw new ScriptFaildException(new Exception("failed to execute command " + command));
            }
        } catch (IOException ex) {
            throw new ScriptFaildException(ex);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new ScriptFaildException(ex);
        }
    }
}
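
For comparison, the same idea (run a command, echo its output, fail on a non-zero exit code) can be sketched in Python with the standard subprocess module. This is an illustration only, not code from the test system; ScriptFailedError and execute_command are hypothetical names. Unlike the Java method, which passes only DEBIAN_FRONTEND to the child process, the sketch extends the current environment.

```python
import os
import subprocess
import sys


class ScriptFailedError(Exception):
    """Raised when a command exits with a non-zero code (hypothetical name)."""


def execute_command(command):
    """Run a command given as a list of arguments and return its standard output."""
    print("Execute " + " ".join(command))
    # Extend the current environment so that apt never asks interactive questions
    env = dict(os.environ, DEBIAN_FRONTEND="noninteractive")
    process = subprocess.Popen(command, env=env, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, universal_newlines=True)
    # communicate() drains both pipes concurrently and waits for the process to exit
    stdout, stderr = process.communicate()
    sys.stdout.write(stdout)
    sys.stdout.write(stderr)
    if process.returncode != 0:
        raise ScriptFailedError("failed to execute command: " + " ".join(command))
    return stdout


output = execute_command([sys.executable, "-c", "print('ok')"])
```

Reading both pipes through communicate() avoids the situation where the child blocks on a full stderr buffer while the parent is still waiting on stdout.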

Now let's move on to the scripts. The bash scripts perform the simple function of removing and installing deb packages. Here is an example of installing packages on a remote machine via apt-get.

install_deb.sh

#!/bin/bash
set -e
set -x
# Path to the current script
SCRIPT_PATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
# Parameters (could be moved to a separate file)
TEST_USER=user
TEST_SERVER_IP=192.168.1.2
# Prefix for the install command
# No prefix is needed on the local machine
if [ "${TEST_SERVER_IP}" == "127.0.0.1" ];
then
	COMMAND_PREFIX=
else
    COMMAND_PREFIX="ssh ${TEST_USER}@${TEST_SERVER_IP}"
fi
APT_CONFIG_DIR="/home/${TEST_USER}/apt-get/"
VERSION=$1
# Copy the debconf configuration file
if [ "${TEST_SERVER_IP}" == "127.0.0.1" ];
then
	if [ ! -d ${APT_CONFIG_DIR} ];
	then
		mkdir ${APT_CONFIG_DIR}
	fi
	cp ${SCRIPT_PATH}/debconf.dat $APT_CONFIG_DIR
else
	scp ${SCRIPT_PATH}/debconf.dat ${TEST_USER}@${TEST_SERVER_IP}:${APT_CONFIG_DIR}
fi
# Install the package
${COMMAND_PREFIX} sudo debconf-set-selections ${APT_CONFIG_DIR}debconf.dat
${COMMAND_PREFIX} sudo apt-get update && true
${COMMAND_PREFIX} sudo DEBIAN_FRONTEND=noninteractive apt-get install --force-yes -y some-package-${VERSION}

A Python script that writes process data to a CSV file.

resources.py

#!/usr/bin/python
# -*- coding: utf-8 -*-
# Depends on psutil: sudo pip install psutil
# Usage: python resources.py </path/to/file> <proc_name>
import time
import datetime
import psutil
import sys
import csv


def convert_bytes(num_bytes):
    '''
    Convert a memory size in bytes to a human-readable string.
    :param num_bytes: size in bytes
    :return: string
    '''
    num_bytes = float(num_bytes)
    if num_bytes >= 1099511627776:
        terabytes = num_bytes / 1099511627776
        size = '%.2fT' % terabytes
    elif num_bytes >= 1073741824:
        gigabytes = num_bytes / 1073741824
        size = '%.2fG' % gigabytes
    elif num_bytes >= 1048576:
        megabytes = num_bytes / 1048576
        size = '%.2fM' % megabytes
    elif num_bytes >= 1024:
        kilobytes = num_bytes / 1024
        size = '%.2fK' % kilobytes
    else:
        size = '%.2fb' % num_bytes
    return size


def get_string(proc, proc_name):
    '''
    Build a string with the measurements.
    :param proc: process object
    :param proc_name: process name
    :return: string
    '''
    data_time = datetime.datetime.fromtimestamp(time.time()).strftime("%Y-%m-%d %H:%M:%S")
    ram = convert_bytes(proc.memory_info().rss)
    ram_percent = round(proc.memory_percent(), 2)
    cpu = proc.cpu_percent(interval=1)
    wrt_str = "{0} {1} {2} {3} {4} {5}".format(data_time, proc_name, proc.pid, ram, ram_percent, cpu)
    return wrt_str


if __name__ == "__main__":
    if len(sys.argv) == 3:
        pathresult_log = sys.argv[1]
        target_proc = sys.argv[2]
        # Open the files for writing
        if target_proc == "rtls":
            res_log_rtls = open(pathresult_log, 'a')
            writer_rtls_log = csv.writer(res_log_rtls)
        elif target_proc == "rtlsapp":
            res_log_app = open(pathresult_log, 'a')
            writer_app_log = csv.writer(res_log_app)
        elif target_proc == "all":
            res_log_rtls = open(pathresult_log, 'a')
            res_log_app = open(pathresult_log, 'a')
            writer_rtls_log = csv.writer(res_log_rtls)
            writer_app_log = csv.writer(res_log_app)
        else:
            print("Types of the test are not correct")
            sys.exit(1)
        proc_rtls = None
        proc_rtlsapp = None
        try:
            # Select the processes we are interested in
            for proc in psutil.process_iter():
                if proc.name() == 'java' and proc.username() == 'rtlsadmin':
                    proc_rtls = proc
                elif proc.name() == 'node' and proc.username() == 'rtlsapp':
                    proc_rtlsapp = proc
        except psutil.NoSuchProcess:
            pass
        else:
            # Write the data to the file; two separate checks, so that
            # the "all" mode records both processes
            if target_proc in ("rtls", "all") and proc_rtls is not None:
                proc_name = "rtls-server"
                str_rtls = get_string(proc_rtls, proc_name)
                writer_rtls_log.writerow(str_rtls.split())
            if target_proc in ("rtlsapp", "all") and proc_rtlsapp is not None:
                proc_name = "rtls-app"
                str_rtlsapp = get_string(proc_rtlsapp, proc_name)
                writer_app_log.writerow(str_rtlsapp.split())
        finally:
            # Close the files
            if target_proc in ("rtls", "all"):
                res_log_rtls.close()
            if target_proc in ("rtlsapp", "all"):
                res_log_app.close()
    else:
        print("Input parameters are not correct")
        sys.exit(1)

Results.

Test results are written to CSV files. Example:

2016-03-04,11:03:55,rtls-server,30237,1.29G,32.71,167.8
2016-03-04,11:04:27,rtls-server,30237,1.33G,33.63,166.9
2016-03-04,11:04:59,rtls-server,30237,1.34G,34.0,172.8

Here, each line contains the date, time, process name, process ID, amount of memory used, percentage of memory used, and percentage of CPU load. The CPU value reported by psutil is relative to a single core, so it can exceed 100% for a multi-threaded process on a multi-core machine.
This data is already enough to analyze and build graphs. In addition, it is convenient to send the collected data to a monitoring service with a graphical dashboard, for example Grafana.
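
As an illustration of such an analysis, the minimum, maximum and average values required by the problem statement can be computed from the CSV file with the standard library alone. The sketch below is not part of the test system; summarize is a hypothetical helper, and the column positions assume the row layout produced by the monitoring script (memory percentage in column 5, CPU percentage in column 6, counting from 0). The three sample rows are the ones shown above.

```python
import csv
import io


def summarize(rows, column):
    """Return (min, max, average) of a numeric CSV column."""
    values = [float(row[column]) for row in rows]
    return min(values), max(values), sum(values) / len(values)


# Sample rows; in practice the monitoring file would be opened instead
SAMPLE = """2016-03-04,11:03:55,rtls-server,30237,1.29G,32.71,167.8
2016-03-04,11:04:27,rtls-server,30237,1.33G,33.63,166.9
2016-03-04,11:04:59,rtls-server,30237,1.34G,34.0,172.8
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
mem_min, mem_max, mem_avg = summarize(rows, 5)
cpu_min, cpu_max, cpu_avg = summarize(rows, 6)
print("memory %: min={0} max={1} avg={2:.2f}".format(mem_min, mem_max, mem_avg))
print("cpu %:    min={0} max={1} avg={2:.2f}".format(cpu_min, cpu_max, cpu_avg))
```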

That's all. In the future, we plan to describe in more detail how the external load process is implemented, and how the server is tested under combined internal and external load.
Author: Nikita Davydovsky
