
Dome PTZ Intelligence: Auto Patrol, Target Selection, and Tracking
Automating the control of a dome PTZ (pan-tilt-zoom) camera is an interesting and relevant task. As situation centers are consolidated and video analytics is introduced, there is a growing need for intelligent algorithms that not only analyze video from fixed cameras but also aim a robotic camera at a target without operator intervention. The delay introduced by digital video encoding and decoding limits the ability to track a target remotely with a PTZ camera and strengthens the case for local tracking automation. This Habr post gives an overview of the main tasks in making PTZ cameras intelligent, approaches to solving them, and products on the market.
Fig. 1. Experimental setup for autonomous PTZ tracking: MagicBox video analytics device, Pelco PTZ camera, and CNB surveillance camera.
Fig. 2. Presets of a PTZ camera controlled by a zone motion detector.
Automation Tasks
Consider the main tasks that arise when automating PTZ camera control:
1. Auto Patrol
In patrol mode, the PTZ camera cycles through the observation presets defined by the operator, stops at each position for a set time, and streams video at the selected zoom. This function is standard and is built into almost all dome PTZ cameras. The advantage of preset patrolling is that it covers a large area and captures each position in good detail. The disadvantages are that every position except the current one is a blind zone, and that the scene background changes constantly, which complicates analysis of the video by both the video analytics and the operator. It is also difficult to recognize slow scene changes during the short interval the camera spends at each position. If the operator directs the camera to a particular position, events occurring at the other positions are not observed.
These shortcomings can be eliminated by installing fixed panoramic cameras that cover the entire protected area. The PTZ camera is then used only to obtain detailed images of targets detected by the surveillance cameras. This also extends the service life of the PTZ camera, because its mechanical load is reduced.
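The patrol cycle and its blind-zone cost can be illustrated with a minimal Python sketch. The preset names, the equal dwell time, and the choice to ignore travel time between presets are assumptions made for illustration; they are not taken from any particular camera.

```python
from itertools import cycle


def patrol_schedule(presets, dwell_s, total_s):
    """Return [(start_time_s, preset), ...] for a cyclic preset tour.

    Travel time between presets is ignored (an assumption of this sketch).
    """
    schedule = []
    t = 0.0
    for preset in cycle(presets):
        if t >= total_s:
            break
        schedule.append((t, preset))
        t += dwell_s
    return schedule


def unobserved_fraction(n_presets):
    """Fraction of time a given preset position is outside the camera's view,
    assuming equal dwell time at every preset."""
    return (n_presets - 1) / n_presets


tour = patrol_schedule(["gate", "parking", "door"], dwell_s=10, total_s=40)
blind = unobserved_fraction(4)
```

With four presets and equal dwell times, each position is unobserved for three quarters of the time, which is the blind-zone drawback described above.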
2. Auto target selection for PTZ tracking
The signal sources for automatic target selection can be: a) a fixed surveillance camera used in parallel with the dome camera; b) the dome camera itself in patrol mode; c) other sensors, for example, radio-wave or vibration sensors of a perimeter security system. The video signal from a video or thermal imaging camera is processed by video analytics, which detects targets and determines their location so that a PTZ camera can be pointed at them without operator intervention. An example installation implementing this approach is shown in Fig. 1. If several surveillance cameras with overlapping coverage areas are used, multi-channel (multi-camera) video analytics is desirable, especially when targets appear frequently. Without it, the same target is detected separately by each camera, which leads to inefficient use of the PTZ cameras and to tracking failures, and this in turn complicates retrospective analysis of the archive.
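As a rough illustration of why multi-camera analytics helps, the following Python sketch merges detections from several cameras that fall close together in a shared ground-plane coordinate system. The merge radius, the data layout, and the greedy strategy are assumptions for this example; real multi-camera analytics would typically also use timing and appearance cues.

```python
import math


def merge_detections(detections, radius_m=1.0):
    """Greedily merge detections that lie within radius_m of each other.

    detections: list of (camera_id, x, y) in shared ground coordinates (meters).
    Returns a list of targets: {"x": ..., "y": ..., "cameras": [...]}.
    The result can depend on input order; this is a sketch, not a tracker.
    """
    targets = []
    for cam, x, y in detections:
        for t in targets:
            if math.hypot(t["x"] - x, t["y"] - y) <= radius_m:
                n = len(t["cameras"])
                # Running average of the merged position.
                t["x"] = (t["x"] * n + x) / (n + 1)
                t["y"] = (t["y"] * n + y) / (n + 1)
                t["cameras"].append(cam)
                break
        else:
            targets.append({"x": x, "y": y, "cameras": [cam]})
    return targets


merged = merge_detections([
    ("cam1", 10.0, 5.0),
    ("cam2", 10.4, 5.2),   # same person seen by the second camera
    ("cam1", 30.0, 2.0),
])
```

Here three detections collapse into two targets, so only two PTZ tasks are generated instead of three.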
3. Automatic prioritization for detail and tracking
When several targets are in the field of view of the surveillance system and the number of PTZ cameras is limited, tasks must be distributed between the PTZ cameras according to target importance. The algorithm can calculate a target's priority from several criteria, such as: a) the location of the target (proximity to the guarded boundary or to the most important object); b) how long the target has already been tracked (for example, each target can be tracked by a PTZ camera for at least 10 seconds, after which the camera may switch to another target); c) the classification of human behavior (for example, "loitering in the zone" may have a higher priority than "entering the zone"). All detected targets are placed in a priority queue for subsequent processing by the intelligent video surveillance system.
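One possible way to combine these criteria is sketched below in Python using the standard `heapq` module. The scoring formula, the behavior weights, and the halving of priority after the minimum tracking time are hypothetical choices made only to show the mechanics of a priority queue; the source does not specify a formula.

```python
import heapq

# Hypothetical weights: loitering ranks above entering, as in the text.
BEHAVIOR_WEIGHT = {"loitering": 2.0, "entering": 1.0}
MIN_TRACK_S = 10.0  # minimum tracking time mentioned in the text


def priority(distance_m, behavior, tracked_s):
    """Higher score = more important. Closer targets score higher."""
    score = BEHAVIOR_WEIGHT.get(behavior, 0.5) / (1.0 + distance_m)
    if tracked_s >= MIN_TRACK_S:
        score *= 0.5  # target already had its minimum coverage
    return score


def build_queue(targets):
    """Build a max-priority queue (heapq is a min-heap, so negate the score)."""
    heap = [(-priority(t["distance_m"], t["behavior"], t["tracked_s"]), t["id"])
            for t in targets]
    heapq.heapify(heap)
    return heap


def pop_order(heap):
    """Return target ids from most to least important without mutating heap."""
    heap = list(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap)[1])
    return order


queue = build_queue([
    {"id": "A", "distance_m": 20, "behavior": "entering", "tracked_s": 0},
    {"id": "B", "distance_m": 20, "behavior": "loitering", "tracked_s": 0},
    {"id": "C", "distance_m": 2, "behavior": "entering", "tracked_s": 0},
    {"id": "D", "distance_m": 2, "behavior": "entering", "tracked_s": 15},
])
order = pop_order(queue)
```

In this example the nearby untracked target C comes first, and D, which is equally close but has already been tracked for 15 seconds, drops behind it.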
4. Auto PTZ camera selection
The algorithm should take targets from the priority queue in order of importance and distribute them between the available PTZ cameras, taking into account the relative positions of the targets and the cameras. An operator may intervene in the operation of the algorithm by giving commands to a PTZ camera with a joystick or through the software interface (Fig. 4). In this case, the algorithm should use the other PTZ cameras to track the targets that the operator is not following. At complex sites, three-dimensional models of the protected site and of the camera coverage areas are needed.
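A simple greedy version of this distribution step might look as follows in Python. The camera records, the `manual` flag for operator-controlled cameras, and the nearest-camera rule are assumptions for illustration; a production system would also account for occlusion and coverage areas, for which the text suggests 3D models.

```python
import math


def assign_cameras(targets, cameras):
    """Assign each target (already in priority order) to the nearest free camera.

    targets: list of (target_id, x, y) in ground coordinates.
    cameras: dict camera_id -> {"pos": (x, y), "manual": bool}.
    Cameras under manual operator control are skipped.
    Returns dict target_id -> camera_id.
    """
    free = sorted(cid for cid, cam in cameras.items() if not cam["manual"])
    assignment = {}
    for target_id, x, y in targets:
        if not free:
            break  # remaining targets stay in the queue
        nearest = min(free, key=lambda cid: math.dist(cameras[cid]["pos"], (x, y)))
        assignment[target_id] = nearest
        free.remove(nearest)
    return assignment


cameras = {
    "ptz1": {"pos": (0, 0), "manual": False},
    "ptz2": {"pos": (100, 0), "manual": False},
    "ptz3": {"pos": (50, 0), "manual": True},  # taken over by the operator
}
result = assign_cameras([("t1", 60, 0), ("t2", 10, 0), ("t3", 55, 0)], cameras)
```

The third target is left unassigned here because the only remaining camera is under operator control; it would be served once a camera becomes free.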
5. Auto-pointing PTZ camera
In the simplest case, the guidance algorithm can be implemented with a multi-zone motion detector on the surveillance camera: the frame is divided into zones, each of which is associated with a PTZ camera preset. When the motion detector in a zone is triggered (Fig. 2), the PTZ camera moves to the corresponding preset (Fig. 4). The more zones are defined during setup, the higher the magnification that can be used on the PTZ camera. The disadvantages of this approach are unstable operation when several targets are present and limited pointing accuracy, which is bounded by the chosen set of PTZ presets.
At a site with a large observed area and a large number of cameras, it is recommended to convert the coordinates of the surveillance camera into the coordinate system of the PTZ camera without dividing frames into zones (Figs. 3, 4).
Better guidance can be obtained with professional video analytics. The link between the surveillance camera and the controlled camera is established through a global real-world coordinate system to which all cameras are referenced. The accuracy of the conversion from the two-dimensional frame coordinate system to three-dimensional real-world space limits the usable zoom of the PTZ camera, because at high magnification a conversion error can leave the object outside the field of view. This places special requirements on the video analytics of the surveillance camera: accurate localization (segmentation) of the target and accurate calibration are needed to relate its coordinates to the PTZ camera.
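The zone-to-preset mapping can be sketched in a few lines of Python. The 4x3 grid, the frame size, and the preset numbering are assumptions for this example only.

```python
def zone_index(x, y, frame_w, frame_h, cols, rows):
    """Map a pixel coordinate to the index of the grid zone containing it."""
    col = min(int(x * cols / frame_w), cols - 1)
    row = min(int(y * rows / frame_h), rows - 1)
    return row * cols + col


# Hypothetical setup: 12 zones (4 columns x 3 rows) mapped to presets 1..12.
ZONE_TO_PRESET = {zone: zone + 1 for zone in range(12)}


def preset_for_motion(x, y, frame_w=1920, frame_h=1080, cols=4, rows=3):
    """Return the PTZ preset number for motion detected at pixel (x, y)."""
    return ZONE_TO_PRESET[zone_index(x, y, frame_w, frame_h, cols, rows)]


preset = preset_for_motion(1000, 500)
```

A finer grid gives each preset a smaller area to cover and therefore allows a higher zoom, at the cost of more presets to program.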
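To make the relationship between conversion error and usable zoom concrete, here is a minimal Python sketch. It assumes a flat ground plane, a PTZ camera at a known position and height, and a pan angle measured from the +x axis; the function names and the numbers are illustrative, not part of any product.

```python
import math


def pan_tilt_to_target(cam_xyz, target_xy):
    """Pan/tilt angles (degrees) to aim at a ground-plane target.

    Assumes pan = 0 along the +x axis and negative tilt = looking down.
    """
    cx, cy, cz = cam_xyz
    tx, ty = target_xy
    dx, dy = tx - cx, ty - cy
    ground = math.hypot(dx, dy)
    pan = math.degrees(math.atan2(dy, dx))
    tilt = -math.degrees(math.atan2(cz, ground))
    return pan, tilt


def min_horizontal_fov_deg(distance_m, target_width_m, position_error_m):
    """Smallest horizontal field of view that still contains the target
    when its estimated position may be off by position_error_m."""
    half = math.atan2(target_width_m / 2 + position_error_m, distance_m)
    return 2 * math.degrees(half)


pan, tilt = pan_tilt_to_target((0, 0, 10), (10, 10))
fov_exact = min_horizontal_fov_deg(50, 1.0, 0.0)
fov_with_error = min_horizontal_fov_deg(50, 1.0, 2.0)
```

In this example a 2 m localization error at 50 m forces a field of view about five times wider than a perfectly localized target would need, which is why calibration quality limits the magnification.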
6. Automatic target tracking
After the PTZ camera is aimed at the target, it is desirable to use tracking algorithms so that the entire video episode of the target is displayed and recorded while the PTZ camera follows it. When setting up the tracking algorithm, you have to find a compromise between the magnification (and therefore the detail) of the target and how often the PTZ camera has to be repositioned. The higher the magnification, the more often the camera has to move.
Common PTZ cameras cannot rotate smoothly at varying speeds. During stepwise movement, the PTZ camera image jerks and blurs. A good tracking algorithm should therefore minimize the number of camera repositionings for a given magnification. The tracking algorithm should also work correctly when targets temporarily overlap, for example, when people walk towards each other (see the video demo and slides about the algorithm).
PTZ tracking of a target can be carried out in three ways: a) using the PTZ camera itself (self-tracking); b) using a surveillance camera (external tracking); c) in a hybrid manner. Each method has its advantages and disadvantages, which we will compare in a separate publication. Self-tracking is convenient when the operator sets the target manually and the surveillance camera is absent or does not see the target. External tracking works more stably when there are several targets. For objects of the same apparent size, tracking algorithms on a moving camera work worse than on a fixed camera, because on a fixed camera the algorithm can adapt better to the static background. In theory, the hybrid method should provide the most stable tracking in all situations, but it is not yet implemented in the systems we know of.
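One common way to reduce the number of repositionings is a dead zone: the camera is moved only when the target drifts far enough from the frame center. The Python sketch below illustrates the trade-off under the assumption of one-dimensional motion and instant recentering; the positions and dead-zone sizes are made up for the example.

```python
def count_recenterings(positions, dead_zone):
    """Count how many times the camera must recenter on the target.

    positions: successive target positions along one axis (e.g. meters).
    dead_zone: allowed distance from the current camera center before a move.
    Higher zoom means a narrower view and therefore a smaller dead zone.
    """
    if not positions:
        return 0
    center = positions[0]
    moves = 0
    for p in positions[1:]:
        if abs(p - center) > dead_zone:
            center = p  # recenter the camera on the target
            moves += 1
    return moves


walk = list(range(11))                           # target walks 0..10 m in 1 m steps
moves_wide = count_recenterings(walk, 4.0)       # low zoom, wide dead zone
moves_narrow = count_recenterings(walk, 1.5)     # high zoom, narrow dead zone
```

For the same walk, the wide view needs two moves and the narrow view needs five, matching the observation that higher magnification requires more frequent camera movement.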
Delay effect
Tracking a target with a PTZ drive is a real-time task that is sensitive to delay. If the total video delay in the IP network exceeds 500 ms (half a second), neither the operator nor server-side video analytics can control the camera effectively. As a rule, about 300 ms is introduced by the transmitting device (camera or encoder) and about 100 ms by the VMS that decodes the video.
High-quality tracking can be achieved with local video processing before compression. In this case, the coordinates of the target can be calculated from the surveillance or PTZ camera video within 20-40 ms. Such a system can follow fast-moving targets, such as a running person or a vehicle, at good magnification.
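The figures above can be turned into a small worked calculation. The 500 ms budget and the 300 ms and 100 ms contributions come from the text; the target speed of 5 m/s for a running person is an assumed value used only for illustration.

```python
BUDGET_MS = 500    # total delay beyond which control becomes ineffective
ENCODER_MS = 300   # typical delay of the transmitting device
DECODER_MS = 100   # typical delay of the VMS decoding the video

# What is left for the network and the control loop itself.
remaining_ms = BUDGET_MS - ENCODER_MS - DECODER_MS


def displacement_m(speed_m_s, latency_ms):
    """Distance a target travels while the system is still reacting."""
    return speed_m_s * latency_ms / 1000.0


RUNNER_SPEED = 5.0  # m/s, assumed speed of a running person
remote_error = displacement_m(RUNNER_SPEED, BUDGET_MS)   # server-side loop
local_error = displacement_m(RUNNER_SPEED, 40)           # local processing
```

Under these assumptions only about 100 ms remains for everything else in a remote loop, and a runner moves 2.5 m during a 500 ms delay but only 0.2 m during a 40 ms local processing delay.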
Standards Support
Starting with version 1.02, the international ONVIF standard makes it possible to build unified solutions for automatic and manual control of PTZ cameras. In particular, the standard describes commands for controlling and reading the position of a PTZ camera, a coordinate system, and the format for transmitting metadata about moving objects from a surveillance camera to a video management system (VMS) and/or other devices that control a PTZ camera.
Busy scenes
The use of intelligent PTZ features in public places is limited by the tracking capabilities of video analytics. Today there is no video analytics on the market that can track a person in a crowd without using a face detector on the surveillance camera. If the resolution and viewing angle of the surveillance camera allow a face detector to be used, PTZ camera guidance can be automated to obtain more accurate face recognition and to record a detailed image. In addition, tracking based on the face detector data has to be implemented in order to optimize the operation of the PTZ camera for the desired scenario, for example, tracking one person or quickly scanning all faces in the field of view.
Special requirements for the PTZ camera
Most PTZ cameras on the market with Pelco-D (for the RS-422/485 serial interface) or ONVIF (for the IP network) interfaces provide no feedback to the control system. In particular, it is impossible to request the current camera position or to set the camera position in absolute coordinates. This restriction prevents such a PTZ camera from being pointed using the coordinates from a surveillance camera.
Market Overview
The Trassir ActiveDome module from DSSL provides PTZ tracking with analytical coordinate transformation. An area is specified in the frame of the surveillance camera, and a calibration procedure relates its coordinates to those of the PTZ camera. According to the developer, the number of surveillance cameras in a video surveillance system is unlimited and depends on the size of the monitored zone. For example, to provide a 360° view, it is recommended to install four surveillance cameras and one PTZ camera.
In ITV's Intellect product, PTZ tracking can be implemented using a multi-zone motion detector on the surveillance camera, without automating the calibration process. To do this, perform the following steps: 1) split the frame of the surveillance camera into multiple motion detection zones; 2) program the corresponding presets on the PTZ camera; 3) write a script that moves the PTZ camera to the preset corresponding to the zone where motion was detected. For PTZ tracking when two or more targets are moving, more complex logic has to be implemented using a script or an ActiveX component.
Our company is working on PTZ tracking with a multi-zone motion detector and analytical coordinate conversion in the MagicBox IP video server. In the current firmware version of the device, metadata with target coordinates is transmitted and the PTZ drive is controlled in accordance with the international ONVIF standard, which makes it possible to implement external control logic for the PTZ camera. The ONVIF Device Manager application, with which Habr is already familiar, illustrates the interaction of an ONVIF client with a PTZ camera and a video analytics service (Fig. 4).
Fig. 3. Target tracking using integrated video analytics. Transfer of 2D and 3D target coordinates in ONVIF metadata for automatic targeting of a PTZ camera. The letter M means that the target is moving. The letter S means that the target has stopped. The background behind the target is moving (tree leaves are swaying).

Fig. 4. Manual and automatic PTZ camera control over the ONVIF protocol using ONVIF Device Manager.
Conclusion
The technologies for automatic control of a robotic PTZ camera based on video analytics and other sensors are at an early stage of development. On the Russian market, the VMS systems from DSSL and ITV, as well as the standalone MagicBox device from Aggregator and Synesis, automate the operation of a PTZ camera. Promising directions for improving these products are: a) algorithms for handling multiple targets with multiple PTZ cameras in a single space shared by the surveillance cameras; b) a semi-automatic mode in which, for example, when the operator starts following one target, the system uses the free PTZ cameras to track the other targets; c) simpler initial setup (calibration) of the system and a user interface optimized for automatic and semi-automatic PTZ tracking modes.