Experience in automating RTLS server stability testing under internal load

Published on May 30, 2016

    Introduction


    In this article, I describe how the quality control department at RTL-Service automates stability testing of the RealTrac server while it serves a large number of mobile location devices. Some terminology first:
    RealTrac RTLS server (server) - server software of the RealTrac system that interacts with the system hardware and calculates the location of devices.

    RealTrac application server (application server) - server software necessary for the operation of a web application, providing a programmatic interface for accessing the basic functions of the system.

    RealTrac Access Point (hereinafter AP) - a device that transfers data between the mobile devices of the network and the system server. Access points are permanently installed at the site; their coordinates are entered on the map in the client software and stored in a database on the system server. An AP can operate in gateway or repeater mode. The mode is determined by whether the AP has a wired Ethernet connection to the network (gateway access point) or not (repeater access point). Only the gateway communicates with the server. An example access point is shown in Figure 1.

    Fig. 1. An example of an access point.

    Mobile device (MU) - a mobile radio node that makes it possible to determine, in real time, the location of the person or object it is attached to. Depending on its type, a device can perform additional functions, for example, sound transmission. An example of a mobile device is shown in Figure 2.

    Fig. 2. An example of a mobile device.

    Alive cycle - the interval at which a device broadcasts data on its status.
    Figure 3 shows the architecture of the RTLS system. The components that need to be tested as part of this task are outlined with a black line.

    Fig. 3. High-level architecture of RealTrac.
    The internal load on the server is determined by the devices of the RealTrac wireless segment that interact with the server: their number and the intensity of their data exchange. This data is transmitted over the internal protocol, INCP.
    The external load consists of requests to the public API, RTLSCP.

    Problem statement.


    The task is to determine how stable the server software is under a prolonged load of 2000 mobile devices with a polling cycle of 2 seconds, i.e., to check whether the software freezes, crashes, or produces errors. We also needed to measure processor and memory consumption (maximum, minimum, and average).

    Initially this was done manually, but we quickly realized that the task should be automated as soon as possible.
    Within the department, the problem was divided into the following subtasks:
    1. Determining the approach to testing and tools.
    2. Writing configuration files for the internal load generator.
    3. Implementation of the testing system.
    4. Automation of running tests.

    Testing Approach.


    Given the specifics of the problem, we decided to implement our own small system for automated stability testing. The core of the test system is a controller application that, at certain points in time, copies scripts to “subordinate” machines and runs them there via ssh. The scripts perform two main functions: deploying the system on a remote machine and monitoring resources. Figure 4 shows the interaction diagram.

    Fig. 4. The scheme of interaction of test servers.
    The next task was to automate the test cycles. Here we did not reinvent the wheel and used our local build server: the tests run as part of a build, and the build is started on a given schedule.
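
The controller's basic operation from this section, copying a script to a subordinate machine and running it there over ssh, can be sketched as follows. This is a minimal illustration in Python, not the Java controller the article describes; the host name `slave1`, the user `user`, the remote directory, and all function names are assumptions made for the example.

```python
import shlex
import subprocess


def scp_command(local_path, user, host, remote_dir):
    """Build the scp command that copies a script to a subordinate machine."""
    return ["scp", local_path, f"{user}@{host}:{remote_dir}"]


def ssh_command(user, host, remote_command):
    """Build the ssh command that runs a command line on a subordinate machine."""
    return ["ssh", f"{user}@{host}", remote_command]


def run_checked(command):
    """Run a command, print its output, and fail on a non-zero exit code."""
    result = subprocess.run(command, capture_output=True, text=True)
    print(result.stdout, end="")
    if result.returncode != 0:
        raise RuntimeError(f"command failed: {shlex.join(command)}\n{result.stderr}")
    return result.stdout


if __name__ == "__main__":
    # Hypothetical host, user, and paths; the commands are only printed here.
    print(shlex.join(scp_command("install_deb.sh", "user", "slave1", "/home/user/scripts/")))
    print(shlex.join(ssh_command("user", "slave1", "bash /home/user/scripts/install_deb.sh 1.0")))
```

Keeping command construction separate from execution makes the controller logic easy to check without any test servers: the built command lists can be inspected, and `run_checked` can be tried on a local command first.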

    Internal load generation.


    So that we do not have to use real devices for every test, and so that the server can be loaded with any number of devices, the incptester application was developed in-house. It emulates the devices of the wireless segment: APs and MUs.
    From the tester's point of view, it is enough to know that incptester is configured through a configuration file. The [general] block sets the geographical coordinates of the center point from which the local coordinates (x, y, z) are measured. The [tracks] block describes the paths along which mobile devices move; a path is described by a sequence of points. The [devices] block lists the devices themselves, which emulate the behavior of real devices. The file has the following form:

    [general]
    geo_lat=61.786838
    geo_lng=34.353548
    geo_alt=1

    [tracks]
    id=1,type=POLY,TF=0,VRT=(20:0:0)(21:0:30)(60)(40:0:0)(60)

    [devices]
    mac=CF0000000000,devtype=1,ip=127.0.1.1,cycle=30000,x=0.0,y=0.0,z=0
    mac=C00000000001,devtype=2,ip=127.0.0.2,cycle=30000,x=5.0,y=0.0,z=0
    mac=000000BAD001,devtype=4,cycle=2000,track=1
    mac=000000BAD002,devtype=6,cycle=2000,track=1




    In addition to the MAC address, each device has a devtype parameter that determines the type of device (1 and 2 are stationary APs, 3-6 are MUs) and a cycle parameter that defines the polling cycle of the device (in the example, cycle=2000 matches the 2-second polling cycle from the problem statement, so the value appears to be in milliseconds). For stationary access points, you need to specify a fixed position through the coordinates. For mobile devices, you can specify the path along which they will move.
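
Before a long run it can be useful to check how many devices of each kind a configuration actually contains. The sketch below is a hypothetical helper, not part of incptester: it assumes the layout shown above (section headers in square brackets, one device per line, comma-separated key=value pairs) and counts the devices whose devtype falls in the mobile range 3-6.

```python
def parse_devices(config_text):
    """Return the [devices] entries of an incptester-style config as dicts."""
    devices = []
    in_devices = False
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("["):
            # A section header such as [general], [tracks], or [devices]
            in_devices = (line == "[devices]")
            continue
        if in_devices:
            # Each device line is a list of comma-separated key=value pairs
            devices.append(dict(pair.split("=", 1) for pair in line.split(",")))
    return devices


def count_mobile(devices):
    """Count devices whose devtype is in the mobile range 3-6."""
    return sum(1 for d in devices if 3 <= int(d["devtype"]) <= 6)


SAMPLE = """\
[general]
geo_lat=61.786838

[tracks]
id=1,type=POLY,TF=0,VRT=(20:0:0)(21:0:30)(60)(40:0:0)(60)

[devices]
mac=CF0000000000,devtype=1,ip=127.0.1.1,cycle=30000,x=0.0,y=0.0,z=0
mac=000000BAD001,devtype=4,cycle=2000,track=1
mac=000000BAD002,devtype=6,cycle=2000,track=1
"""

if __name__ == "__main__":
    devs = parse_devices(SAMPLE)
    print(len(devs), "devices,", count_mobile(devs), "mobile")
```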

    As a result, generating the required load comes down to creating an incptester configuration file with the required number of devices.
    To avoid entering 2000 devices into the config by hand, we wrote a small bash script that appends the specified number of lines to the base file.
    add_devices.sh
    #!/bin/bash
    # Appends <dev_num> emulated mobile devices to ./<base_config>.conf
    set -e
    if [ "$#" -ne 2 ]
    then
      echo "Usage: ${0} <base_config> <dev_num>"
      exit 1
    fi
    INCPTESTER_CONF_PATH="./${1}.conf"
    # Start from the base config if the target file does not exist yet
    if [ ! -f "${INCPTESTER_CONF_PATH}" ]; then
        cp ./incptester_geo-base.conf "${INCPTESTER_CONF_PATH}"
    fi
    # Increment a 12-digit hexadecimal MAC address by one
    get_next_mac() {
        printf '%012X' $(( 16#${1} + 1 ))
    }
    # MAC address from the last non-empty line of the config
    last_mac=$(tac "${INCPTESTER_CONF_PATH}" | grep -m 1 . | grep -o -E 'mac=[0-9A-Fa-f]{12}' | cut -d= -f2)
    for count in $(seq "${2}"); do
        next_mac=$(get_next_mac "${last_mac}")
        echo "mac=${next_mac},devtype=6,cycle=2000,track=2" >> "${INCPTESTER_CONF_PATH}"
        last_mac=${next_mac}
    done
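
The same generation step can also be sketched in Python. The function names are illustrative; the device parameters (devtype=6, cycle=2000, track=2) are copied from the script above, and the starting MAC address is taken from the sample configuration.

```python
def next_mac(mac, step=1):
    """Increment a 12-digit hexadecimal MAC address, keeping the zero padding."""
    return "{:012X}".format(int(mac, 16) + step)


def device_lines(last_mac, count, devtype=6, cycle=2000, track=2):
    """Generate `count` device lines with MAC addresses following `last_mac`."""
    lines = []
    mac = last_mac
    for _ in range(count):
        mac = next_mac(mac)
        lines.append(f"mac={mac},devtype={devtype},cycle={cycle},track={track}")
    return lines


if __name__ == "__main__":
    # Print the three lines that would follow the last device of the sample config
    for line in device_lines("000000BAD002", 3):
        print(line)
```

Returning the lines instead of appending them to a file directly keeps the generator easy to test; the caller decides where they are written.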
    


    Software used.


    We decided to implement the controller application in Java. To describe the test scenarios, we used the Cucumber library and, accordingly, the Gherkin language. We used Gradle as the build tool, and the build itself was set up on our local Hudson server.

    Since the system runs on Debian Linux, it made sense to write the scripts that interact with the OS in bash. These scripts install and remove deb packages and configs. To monitor running processes, we used the psutil package for Python and periodically wrote the values of consumed resources to a csv file.

    The main scenario.


    The scenario is divided into the following steps:
    1. Remove previous packages from test servers.
    2. Prepare configuration files for deb packages.
    3. Copy configuration files and scripts to test servers.
    4. Deploy the system on test servers.
    5. Run incptester with a configuration for 2000 mobile devices on slave1.
    6. Run the server under the internal load on slave1 for a given amount of time, with parallel monitoring of resources.
    7. Run the application server on slave2, configured to work with the main server on slave1, with parallel monitoring of resources.

    Implementation.


    In this section, I briefly outline the main points of the testing system implementation.
    The test scenario is easily converted to a Gherkin feature. Steps can contain parameters that are passed to the method that executes them. It looks something like this:
    @Load_stability_geo
    Feature: Load_stability_geo
      This test starts a large number of devices and monitors the system resources

      @Install
      Scenario: Installation of RealTrac system in geo mode
        Given I delete the previous Realtrac-server from the both test-servers
        And I prepare all deb configs
        And I copy all configs and scripts to the test servers
        And I install the main Realtrac-server on the test-server and incptester and stop service rtlserm for geo configuration
        Then Run first part of the test with the inside load for 11520 steps
        Given I install the app server
        Then Run second part of the test with the inside load for 11520 steps

    For the scenario, we write a separate class, LoadStabilityGeo, that contains the methods performing the steps of the feature. Below is an example of passing a parameter to a method; the parameter is extracted by a regular expression.



    import rtls.test.utils.RTLSUtils;
    import cucumber.api.java.en.And;
    import cucumber.api.java.en.Then;
    import cucumber.api.java.en.When;

    public class LoadStabilityGeo {
        // some other methods

        @Then("^Run first part of the test with the inside load for (\\d+) steps")
        public void First_Monitoring(int count_time) throws ScriptFaildException, Exception {
            int times = 0;
            int check_time_min = 10;   // resources are verified on every 10th step
            double minutes = 0.5;      // pause between measurements, in minutes
            // monitoring_file, path_monitoring_file, user, rtlscp_port and rtlscpip
            // are assumed to be fields declared elsewhere in the class
            monitoring_file = NameFileDataFormat("monitoring", "csv");
            path_monitoring_file = "/home/" + user + "/TestResult/" + monitoring_file;
            while (times <= count_time) {
                // Append one measurement of the rtls process to the csv file
                run_ssh_cmd("resource_get.py " + path_monitoring_file + " rtls", "main_server");
                if ((times % check_time_min) == 0 && times != 0) {
                    checkResource("rtls", rtlscp_port, rtlscpip, check_time_min);
                } else {
                    Sleep_time(minutes);
                }
                times = times + 1;
            }
            System.out.println("The first part of the stress test in geo-mode is successfully finished");
        }
    }
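
The step count in the feature translates into wall-clock time roughly as follows. The sketch mirrors the loop of the method above under stated assumptions: steps are numbered from 0 to the step count inclusive, every 10th step except step 0 runs a resource check instead of a pause, and every other step pauses for 0.5 minutes. The time taken by the ssh call and by the check itself is ignored, so the result is a lower bound.

```python
def plan(count_time, check_every=10, sleep_minutes=0.5):
    """Return (checks, sleeps, minimum duration in hours) for the monitoring loop."""
    checks = 0
    sleeps = 0
    for step in range(count_time + 1):  # the loop runs while times <= count_time
        if step % check_every == 0 and step != 0:
            checks += 1   # resource check instead of a pause
        else:
            sleeps += 1   # plain pause between two measurements
    hours = sleeps * sleep_minutes / 60
    return checks, sleeps, hours


if __name__ == "__main__":
    checks, sleeps, hours = plan(11520)
    print(f"{checks} checks, {sleeps} pauses, at least {hours:.1f} hours")
```

For 11520 steps this gives 1152 checks and 10369 pauses, i.e. about 86.4 hours of pure waiting. The real run takes longer, because each measurement adds the ssh round trip and the one-second CPU sampling interval used by the monitoring script.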
    


    A separate RTLSUtils class was also written, containing static methods for working with scripts (execution / verification of the result) and other general methods. Example method for running a command on the OS:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    public class RTLSUtils {
        // some other methods

        public static void executeCommand(String command) throws ScriptFaildException {
            try {
                String line;
                System.out.println("Execute " + command);
                String[] env = new String[]{"DEBIAN_FRONTEND=noninteractive"};
                Process p = Runtime.getRuntime().exec(command, env);
                // Print the command's stdout, then its stderr
                try (BufferedReader input = new BufferedReader(new InputStreamReader(p.getInputStream()));
                     BufferedReader error = new BufferedReader(new InputStreamReader(p.getErrorStream()))) {
                    while ((line = input.readLine()) != null) {
                        System.out.println(line);
                    }
                    while ((line = error.readLine()) != null) {
                        System.out.println(line);
                    }
                }
                // Wait for the process to finish before checking its exit code
                if (p.waitFor() != 0) {
                    throw new ScriptFaildException(new Exception("error to execute command " + command));
                }
            } catch (IOException ex) {
                throw new ScriptFaildException(ex);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                throw new ScriptFaildException(ex);
            }
        }
    }
    


    Now let's move on to the scripts. The bash scripts perform the simple job of removing and installing deb packages. Below is an example that installs packages on a remote machine via apt-get.
    install_deb.sh
    #!/bin/bash
    set -e
    set -x
    # Path to the current script
    SCRIPT_PATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
    # Parameters (could be moved to a separate file)
    TEST_USER=user
    TEST_SERVER_IP=192.168.1.2
    # Prefix for the installation commands
    # No prefix is needed on the local machine
    if [ "${TEST_SERVER_IP}" == "127.0.0.1" ];
    then
        COMMAND_PREFIX=
    else
        COMMAND_PREFIX="ssh ${TEST_USER}@${TEST_SERVER_IP}"
    fi
    APT_CONFIG_DIR="/home/${TEST_USER}/apt-get/"
    VERSION=$1
    # Copy the deb config file (create the target directory first)
    ${COMMAND_PREFIX} mkdir -p ${APT_CONFIG_DIR}
    if [ "${TEST_SERVER_IP}" == "127.0.0.1" ];
    then
        cp ${SCRIPT_PATH}/debconf.dat ${APT_CONFIG_DIR}
    else
        scp ${SCRIPT_PATH}/debconf.dat ${TEST_USER}@${TEST_SERVER_IP}:${APT_CONFIG_DIR}
    fi
    # Install the package
    ${COMMAND_PREFIX} sudo debconf-set-selections ${APT_CONFIG_DIR}debconf.dat
    # A failed update must not abort the script
    ${COMMAND_PREFIX} sudo apt-get update || true
    ${COMMAND_PREFIX} sudo DEBIAN_FRONTEND=noninteractive apt-get install --force-yes -y some-package-${VERSION}
    


    A Python script that writes process data to a csv file.
    resources.py
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    # Depends on:
    # sudo pip install psutil
    # Usage: python resources.py </path/to/file> <proc_name>
    import time
    import datetime
    import psutil
    import sys
    import csv
    def convert_bytes(bytes):
    	'''
    	Convert a memory size in bytes to a human-readable string
    	:param bytes: size in bytes
    	:return: string
    	'''
    	bytes = float(bytes)
    	if bytes >= 1099511627776:
    		terabytes = bytes / 1099511627776
    		size = '%.2fT' % terabytes
    	elif bytes >= 1073741824:
    		gigabytes = bytes / 1073741824
    		size = '%.2fG' % gigabytes
    	elif bytes >= 1048576:
    		megabytes = bytes / 1048576
    		size = '%.2fM' % megabytes
    	elif bytes >= 1024:
    		kilobytes = bytes / 1024
    		size = '%.2fK' % kilobytes
    	else:
    		size = '%.2fb' % bytes
    	return size
    def get_string(proc, proc_name):
    	'''
    	Build a string with the measurements.
    	:param proc: process object
    	:param proc_name: process name
    	:return: string
    	'''
    	data_time = datetime.datetime.fromtimestamp(time.time()).strftime("%Y-%m-%d %H:%M:%S")
    	ram = convert_bytes(proc.memory_info().rss)
    	ram_percent = round(proc.memory_percent(),2)
    	cpu = proc.cpu_percent(interval=1)
    	wrt_str = ("{0} {1} {2} {3} {4} {5}".format(data_time, proc_name, proc.pid, ram, ram_percent, cpu))
    	return wrt_str
    if __name__ == "__main__":
    	if (len(sys.argv) == 3):
    		pathresult_log = sys.argv[1]
    		target_proc = sys.argv[2]
    		# Open the files for writing
    		if (target_proc=="rtls"):
    			res_log_rtls = open(pathresult_log, 'a')
    			writer_rtls_log = csv.writer(res_log_rtls)
    		elif (target_proc == "rtlsapp"):
    			res_log_app = open(pathresult_log, 'a')
    			writer_app_log = csv.writer(res_log_app)
    		elif (target_proc == "all"):
    			res_log_rtls = open(pathresult_log, 'a')
    			res_log_app = open(pathresult_log, 'a')
    			writer_rtls_log = csv.writer(res_log_rtls)
    			writer_app_log = csv.writer(res_log_app)
    		else:
    			print ("Types of the test are not correct")
    			sys.exit(1)
    		proc_rtls = None
    		proc_rtlsapp = None
    		try:
    			# Select the processes we are interested in
    			procs = [p for p in psutil.process_iter()]
    			for proc in procs:
    				if (proc.name() == 'java' and proc.username() == 'rtlsadmin'):
    					proc_rtls = proc
    				elif (proc.name() == 'node' and proc.username() == 'rtlsapp'):
    					proc_rtlsapp = proc
    		except psutil.NoSuchProcess:
    			pass
    		else:
    			# Write the data to the file ("all" writes both processes)
    			if (target_proc == "rtls" or target_proc == "all"):
    				if (proc_rtls is not None):
    					proc_name = "rtls-server"
    					str_rtls = get_string(proc_rtls, proc_name)
    					writer_rtls_log.writerow(str_rtls.split())
    			if (target_proc == "rtlsapp" or target_proc == "all"):
    				if (proc_rtlsapp is not None):
    					proc_name = "rtls-app"
    					str_rtlsapp = get_string(proc_rtlsapp, proc_name)
    					writer_app_log.writerow(str_rtlsapp.split())
    		finally:
    			# Close the files
    			if (target_proc == "rtls" or target_proc == "all"):
    				res_log_rtls.close()
    			if (target_proc == "rtlsapp" or target_proc == "all"):
    				res_log_app.close()
    	else:
    		print("Input parameters are not correct")
    		sys.exit(1)
    



    Results.


    Test results are reflected in csv files. Example:

    2016-03-04,11:03:55,rtls-server,30237,1.29G,32.71,167.8
    2016-03-04,11:04:27,rtls-server,30237,1.33G,33.63,166.9
    2016-03-04,11:04:59,rtls-server,30237,1.34G,34.0,172.8

    Here, each line contains the date, time, process name, process identifier, amount of memory used, percentage of memory used, and percentage of processor load.
    This data is already enough to analyze the results and build graphs. It is also convenient to send the data to a monitoring service with a graphical dashboard, for example, Grafana.
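
The maximum, minimum, and average values required by the problem statement can be computed from such a file with a few lines of Python. This is a minimal sketch, assuming the column order described above and the size suffixes produced by the monitoring script (b, K, M, G, T); the function names are illustrative, and the sample rows are the three lines from the example.

```python
import csv
import io

UNITS = {"b": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def to_bytes(size):
    """Convert a string such as '1.29G' back to a number of bytes."""
    return float(size[:-1]) * UNITS[size[-1]]


def summarize(rows, column, convert=float):
    """Return (min, max, average) of one column over all rows."""
    values = [convert(row[column]) for row in rows]
    return min(values), max(values), sum(values) / len(values)


SAMPLE = """\
2016-03-04,11:03:55,rtls-server,30237,1.29G,32.71,167.8
2016-03-04,11:04:27,rtls-server,30237,1.33G,33.63,166.9
2016-03-04,11:04:59,rtls-server,30237,1.34G,34.0,172.8
"""

if __name__ == "__main__":
    rows = list(csv.reader(io.StringIO(SAMPLE)))
    # Columns: 0 date, 1 time, 2 name, 3 pid, 4 memory, 5 memory %, 6 CPU %
    for label, column, convert in (("memory, bytes", 4, to_bytes),
                                   ("memory, %", 5, float),
                                   ("CPU, %", 6, float)):
        low, high, mean = summarize(rows, column, convert)
        print(f"{label}: min={low:.2f} max={high:.2f} avg={mean:.2f}")
```

Note that the CPU column can exceed 100: psutil reports per-process CPU usage relative to a single core, so a multithreaded process on a multi-core machine can show values such as 167.8.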

    That's all. In the future, we plan to describe in more detail how the external load is implemented, and how the server is tested under combined internal and external load.
    Author: Nikita Davydovsky