How to prevent the algorithm from selling the bank
Hello, Habr! Our team in Moscow is developing an internal algorithmic trading platform. Today we would like to talk about the mechanisms that we add to our architecture to protect against possible failures.
Even after the most thorough testing and code review, the probability of errors in code is never zero. This is a fact you have to accept and take for granted. So when designing an architecture, it is always worth building in protection mechanisms that let the system keep functioning, or shut down without harming itself, when it starts to behave unexpectedly. This is especially important in the financial sector, where we work.
Three years ago, everyone was talking about the story of Knight Capital Group. As a result of a “successful” upgrade of their system, they lost about $460 million because their trading system placed orders for, and traded, 397 million shares of different companies at non-market prices. The report on the investigation of this event should probably lie on the desk of every COO of every financial company, as a reminder of what immature technical processes and a lack of automatic protection systems can lead to.
The architecture of any trading system should include, in one form or another, a subsystem for controlling the financial risks of trading. In our internal jargon, the KCG case can be classified as an “out-of-control strategy.” When designing a system, you need to understand that this is only one of the things that can go wrong with it. Besides technological risks there is, of course, a large set of “human errors” that may result from carelessness or be intentional, driven by the desire for personal enrichment among the people who use your system. In this article, however, we only want to discuss possible protection mechanisms against the technical risks that arise when an algorithm gets out of control and behaves differently than intended.
In our platform, trading strategies (in other words, trading algorithms) run inside a container. The container includes 'circuit breaker' components that sit on the path from the trading algorithm to the outside world. The closest analogy from the physical world is a fuse. The main purpose of these 'circuit breakers' is to decide automatically to disable a strategy when one of the rules built into them is triggered. They have two states: “closed”, in which they pass all messages between the strategy and the outside world, and “open”, in which they block any new orders from the strategy to the exchange. In either state, they always pass messages from the exchange to the strategy.
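The two states described above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the class name, the method names and the two callbacks are all hypothetical.

```python
from enum import Enum
from typing import Any, Callable


class State(Enum):
    CLOSED = "closed"  # messages flow in both directions
    OPEN = "open"      # new orders from the strategy are blocked


class CircuitBreaker:
    """Hypothetical gate between a strategy and the exchange."""

    def __init__(self,
                 send_to_exchange: Callable[[Any], None],
                 deliver_to_strategy: Callable[[Any], None]) -> None:
        self._send_to_exchange = send_to_exchange
        self._deliver_to_strategy = deliver_to_strategy
        self.state = State.CLOSED

    def on_order_from_strategy(self, order: Any) -> bool:
        """Forward an order only while closed; return whether it was sent."""
        if self.state is State.OPEN:
            return False
        self._send_to_exchange(order)
        return True

    def on_message_from_exchange(self, message: Any) -> None:
        """Exchange messages reach the strategy in either state."""
        self._deliver_to_strategy(message)

    def trip(self) -> None:
        """Switch to the open state (called when a rule is violated)."""
        self.state = State.OPEN
```

Keeping the inbound path unconditional means the strategy still sees fills and cancellations for orders that were already in the market when the breaker opened.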
Returning to the KCG case: they had various monitoring systems, but it took the support team more than 45 minutes to find the “broken” subsystems and decide to disable them. In high-frequency trading, that is enough time for a modern trading system to buy and sell all your assets hundreds of times over. Decisions to stop a “suspicious” algorithm should therefore be made automatically.
The container in which the algorithm runs must ensure that the strategy cannot bypass this protection. From practice, we would add that strategies and 'circuit breaker' components should be developed by different teams.
Each 'circuit breaker' is a simple rule that limits the freedom of action of the strategy it controls. A typical rule might read: "a strategy can send no more than 100 orders to the market over its entire lifetime." As soon as the strategy tries to send its 101st order, the 'circuit breaker' switches to the open state and stops passing new orders from the strategy to the market.
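As an illustration, the 100-order rule could look something like the sketch below. The class and attribute names are hypothetical, and the limit of 100 is simply the number from the example above.

```python
class MaxTotalOrdersRule:
    """Hypothetical lifetime cap on the number of orders a strategy may send."""

    def __init__(self, limit: int = 100) -> None:
        self.limit = limit
        self.sent = 0
        self.is_open = False  # False = "closed": orders pass through

    def allow(self) -> bool:
        """Return True if the next order may go to the market."""
        if self.is_open:
            return False
        if self.sent >= self.limit:
            # The attempt to send order number limit + 1 opens the breaker.
            self.is_open = True
            return False
        self.sent += 1
        return True
```

With the default limit, the first 100 calls to `allow()` return True, the 101st opens the breaker, and every later call is rejected.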
As soon as any rule is triggered, the following chain of events begins: a) the strategy receives a message that the 'circuit breaker' has switched to the open state and that it must wind down; b) all active orders placed by this strategy are cancelled in the markets; c) the trader receives a notification in the trading terminal about the error in the algo strategy and its forced stop; d) the support team receives the same message and should immediately start investigating what happened.
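The chain a) to d) can be sketched as one function. This is only an assumption about how such a handler might be wired: the function name, its parameters and the four callbacks are hypothetical stand-ins for the real strategy, exchange gateway and notification channels.

```python
from typing import Callable, Iterable


def handle_trip(strategy_id: str,
                reason: str,
                active_order_ids: Iterable[str],
                notify_strategy: Callable[[str], None],
                cancel_order: Callable[[str], None],
                notify_trader: Callable[[str], None],
                notify_support: Callable[[str], None]) -> None:
    """Run the shutdown steps in the order described in the text."""
    message = f"circuit breaker open for {strategy_id}: {reason}"
    notify_strategy(message)                # a) tell the strategy to wind down
    for order_id in list(active_order_ids):
        cancel_order(order_id)              # b) pull its active orders
    notify_trader(message)                  # c) alert the trader's terminal
    notify_support(message)                 # d) same message to support
```

Passing the side effects in as callbacks keeps the sequence itself easy to test without a live exchange connection.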
Let's look at the 'circuit breakers' that, in our opinion, should be present in any trading system:
- The maximum number of orders sent to the market - for any strategy, there is a reasonable number of orders it can send over its lifetime. If this number is exceeded, something has gone wrong.
- The maximum number of orders sent to the market within a given period of time - no market likes being spammed. Even if your strategy does not directly harm your company, you can be penalized by the exchange if, for example, the strategy keeps placing, cancelling and re-placing an order to buy or sell a security. Exchanges dislike such orders because they create load but do not lead to real trades.
- The maximum market position that the strategy can open - a strategy may stay within the limits on the number of active orders and on the size of each individual order, but there must always be a cap on the total size of all open positions and active orders it can have in the market. Exceeding this cap is a sign that the strategy has gone beyond the risk allotted to it.
- The maximum number of active orders in the market - a strategy may stay within the maximum permissible risk for open positions, but too many active orders in the market can be a signal that something is going wrong.
- The maximum delay for a response from the market, or verification that confirmations are received for orders sent to the market - algorithms send orders to the market, and each order has its own life cycle. When a strategy sends a new order, the other side must confirm it in accordance with the exchange's messaging protocol. This rule verifies that the outside world behaves as the algorithm expects. A strategy cannot work if it sends orders to the market and does not receive the results of their execution. In this case the error may lie not in the strategy but in its environment. Either way, trading should be stopped until the causes of the delays and message errors are understood.
- Too good to be true - sometimes an algorithm may try to buy or sell an instrument at a price that is too good to be real. To implement such a check, you usually need a source of prices that you consider representative for this asset on the market. If the strategy tries to buy or sell an asset at a price outside the corridor you have defined, this again means that something suspicious is happening, and you need to stop trading until you can clearly say what caused it.
- Fat fingers - this check limits the size of a single order that the strategy can send to the market. It mostly protects against traders who launch a strategy with very large orders to buy or sell assets.
- Dead-man switch - every algorithm is launched by a trader, who is ultimately responsible for the financial result. The main rule is that a person must always watch the algorithm and monitor its open position and financial result. It is the trader who decides whether the algorithm is working as programmed or something is going wrong. This check is designed to catch human negligence or forgetfulness quickly, since either can result in large financial losses for your bank. In our case, if the trader performs no activity on the computer (keyboard or mouse input) for a long time, a warning window is displayed. If the trader does not respond to it, the UI closes its active connections to the algo container. The algo container, seeing that the session with the UI has closed, stops the strategy.
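Three of the rules above (orders per time window, "too good to be true" and "fat fingers") can be sketched as simple checks. This is an illustrative sketch only: all names, the sliding-window approach and the percentage corridor around a reference price are assumptions, not a description of our implementation. In each case a False result would be the signal for the 'circuit breaker' to open.

```python
from collections import deque


class OrderRateRule:
    """Hypothetical cap on orders sent within a sliding time window."""

    def __init__(self, max_orders: int, window_seconds: float) -> None:
        self.max_orders = max_orders
        self.window_seconds = window_seconds
        self._timestamps = deque()  # send times of recent allowed orders

    def allow(self, now: float) -> bool:
        # Forget orders that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_orders:
            return False
        self._timestamps.append(now)
        return True


def price_within_corridor(price: float,
                          reference_price: float,
                          max_deviation: float) -> bool:
    """'Too good to be true': the price must stay within a relative corridor
    (e.g. 0.05 for 5%) around a reference price from an independent source."""
    return abs(price - reference_price) <= reference_price * max_deviation


def size_within_limit(quantity: int, max_quantity: int) -> bool:
    """'Fat fingers': a single order must be positive and not exceed the cap."""
    return 0 < quantity <= max_quantity
```

The caller supplies the current time to `allow()`, which makes the rate rule deterministic and easy to test; in production it would typically be a monotonic clock reading.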
In our opinion, this is the minimum set of rules that should be present in any system. The list can, of course, be extended.
It is also important to repeat that having 'circuit breakers' in your system does not guarantee that there will be no problems. They are only one of the lines of defense that you must build into your trading platform. Errors can also creep into the algo container and into the 'circuit breaker' components themselves. If you are interested, we will talk about how we deal with the technical risks of such errors in future articles.