
How I Made $500K with Machine Learning and High-Frequency Trading - Part 2
- Translation
From the translator: this continues the translation of an article (part 1) that caught my attention and appealed to readers, about a guy who used his technical skills to earn half a million dollars in a year.

Creating a complete trading simulator
So, I had a framework that allowed me to test and optimize indicators. But I needed to go further: I needed a framework that would let me test and optimize the trading system as a whole, one in which I was sending orders and taking positions. In that case I would be optimizing for total profit and loss (P&L) and, to some extent, average P&L per trading session.
Such a framework is hard to build and, in some respects, impossible to make exact, but I did the best I could. Here are some of the issues I had to deal with:
- When an order is sent to the market in the simulator, I have to model the time lag. The fact that my system “saw” an offer does not mean it can buy it immediately. The system sends the order, waits about 20 milliseconds, and counts the trade as executed only if the offer is still there. This is not exact, because the real lag varies and is not reported.
- When I place bids or offers, I need to watch the trade execution stream (provided by the API) and use it to work out when my order would have been filled. To do this properly, I have to track the position of my order in the queue (the queue is first-in, first-out). Again, I could not do this exactly, but I modeled it as closely as I could.
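The two issues above can be illustrated with a minimal Python sketch. It is not the author's code: the `Quote` record, `simulate_take`, and `QueuedOrder` are hypothetical names introduced here, and only the 20-millisecond lag and the first-in, first-out queue come from the text.

```python
from dataclasses import dataclass

LATENCY_MS = 20  # fixed lag taken from the text; the real lag varies


@dataclass
class Quote:
    """A resting offer seen in recorded market data (hypothetical record)."""
    price: float
    posted_ms: int      # time the offer appeared
    cancelled_ms: int   # time the offer disappeared


def simulate_take(quote: Quote, sent_ms: int) -> bool:
    """An order hitting `quote` counts as executed only if the quote is
    still there when the order arrives, LATENCY_MS after it was sent."""
    arrival_ms = sent_ms + LATENCY_MS
    return quote.posted_ms <= arrival_ms < quote.cancelled_ms


class QueuedOrder:
    """Tracks a resting order's place in a first-in, first-out queue."""

    def __init__(self, contracts_ahead: int, size: int = 1):
        self.contracts_ahead = contracts_ahead
        self.remaining = size

    def on_trade(self, traded: int) -> int:
        """Apply a trade reported at our price level and return how many
        of our own contracts it filled."""
        consumed = min(traded, self.contracts_ahead)
        self.contracts_ahead -= consumed
        filled = min(traded - consumed, self.remaining)
        self.remaining -= filled
        return filled
```

This sketch ignores cancellations by orders ahead in the queue, which is one of the reasons such a simulation can only approximate reality.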
To refine the order execution simulation, I took the logs collected during live trading through the API and compared them with the logs produced by simulated trading over the same time period. I was able to get the simulator quite close to reality, and for the parts that could not be modeled exactly, I made sure the outcomes at least matched statistically (on the metrics I considered important).
Making profitable trades
With an order simulation model in place, I could send orders in simulation mode and track the simulated profit and loss. But how would my system know where and when to buy and sell?
Price move predictions were the starting point, but not the whole story. I built a scoring system for each of 5 price levels on the bid and on the offer. These included one level above the inside bid (for buy orders) and one level below the inside offer (for sell orders).
If the score at a given price level was above a set threshold, my system should have an active bid or offer at that level. If the score was below the threshold, any active orders there should be canceled. As a result, it was not uncommon for my system to place a bid in the market and then cancel it immediately (although I tried to minimize this, since it is extremely annoying to any human watching orders flash on a screen).
The scores for the price levels were calculated from the following factors:
- The price move prediction (discussed earlier)
- The price level in question (inner levels indicate a more likely price jump)
- The number of contracts ahead of my order in the queue (the fewer the better)
- The number of contracts behind my order in the queue (the more the better)
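A threshold-based scoring scheme of this kind might look like the sketch below. The article does not give its formula, so the linear combination, the `Weights` fields, and every default value are assumptions made purely for illustration. Only the four inputs and the place/cancel rule come from the text.

```python
from dataclasses import dataclass


@dataclass
class Weights:
    """Hypothetical tunable coefficients (not from the article)."""
    prediction: float = 1.0
    level: float = 0.5      # sign and size are placeholders to be optimized
    ahead: float = -0.25    # fewer contracts ahead in the queue is better
    behind: float = 0.125   # more contracts behind in the queue is better
    threshold: float = 1.0


def level_score(w: Weights, predicted_move: float, level_index: int,
                contracts_ahead: int, contracts_behind: int) -> float:
    """Combine the four factors from the list above into one score.
    A linear form is assumed here for simplicity."""
    return (w.prediction * predicted_move
            + w.level * level_index
            + w.ahead * contracts_ahead
            + w.behind * contracts_behind)


def desired_action(score: float, threshold: float, has_order: bool) -> str:
    """Above the threshold there should be a live order at the level;
    below it any live order should be canceled."""
    if score > threshold and not has_order:
        return "place"
    if score < threshold and has_order:
        return "cancel"
    return "hold"
```

Because the score is recomputed as the queue and the prediction change, an order can be placed and canceled in quick succession, which is the flickering behavior described above.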
Essentially, these factors served to identify “safe” places to bid or offer. The price move prediction alone was not adequate, because it did not account for the fact that when I placed a bid I was not filled automatically: I was filled only when someone actually sold to me at that price. In reality, the mere fact that someone was selling to me at a given price changed the statistical odds of the trade.
All the variables used in this step were subject to optimization. This was done in the same way as for the price move indicators, except that here I optimized for bottom-line profit and loss.
What my program ignored
When humans trade, they are often strongly influenced by emotions and biases that lead to suboptimal decisions. Naturally, I did not want these biases reflected in my code. So my system ignored some factors:
- The entry price of a position - In a trading office it is common to hear talk about the price at which someone went long or short, as if that should affect their future decisions. While this has some validity as part of a risk reduction strategy, it has no bearing on the future course of events in the market. My program therefore ignored this information completely. It is the same idea as ignoring sunk costs.
- Going short vs. exiting a long position - A human trader would typically use different criteria for where to sell a long position and where to go short. From my algorithm's point of view there was no difference between the two. If it expected prices to fall, selling was the logical step regardless of whether it was currently long, short, or flat.
- The “doubling up” strategy - This is a common strategy in which traders buy more when the trade initially goes against them. The average purchase price comes down, so when (or if) the price turns around, you get back to break-even sooner. I think this is a horrible strategy unless you are Warren Buffett. You are fooled into thinking it works because most of your trades will work out. The problem is that when you are unlucky, the loss is staggering. Another consequence is that it becomes very hard to tell whether you really have an edge in the market or are just getting lucky. An important quality of my program was that I could track and confirm situations,
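The arithmetic behind the “doubling up” point can be checked with a few lines of Python. The prices and quantities are made-up illustrative numbers, and the helper names are hypothetical.

```python
def average_entry(fills):
    """Volume-weighted average entry price for a list of (price, qty) buys."""
    total_qty = sum(qty for _, qty in fills)
    return sum(price * qty for price, qty in fills) / total_qty


def unrealized_pnl(fills, mark):
    """Open profit or loss of the long position at the current price `mark`."""
    return sum((mark - price) * qty for price, qty in fills)


single = [(100.0, 1)]                # buy 1 at 100
doubled = [(100.0, 1), (90.0, 1)]    # buy 1 more after a drop to 90

# Doubling up lowers the break-even price from 100 to 95 ...
breakeven_single = average_entry(single)
breakeven_doubled = average_entry(doubled)

# ... but if the price keeps falling to 80, the loss is larger.
loss_single = unrealized_pnl(single, 80.0)
loss_doubled = unrealized_pnl(doubled, 80.0)
```

With these numbers the break-even price falls from 100 to 95, while the open loss at a price of 80 grows from 20 to 30 per contract-sized unit.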
Risk management
Because my algorithm made decisions the same way regardless of where it had entered a trade and whether it was currently long or short, it did occasionally sit in (and take) some large losing trades, alongside some equally large winning ones. But do not assume I did nothing to manage risk.
I enforced a maximum position size of 2 contracts at a time, which I occasionally raised on high-volume days. I also had a maximum daily loss limit to protect against unexpected market conditions and against bugs in my own software. These limits were enforced in my code, and as an additional safeguard I also had them set with my broker. With these precautions in place, I never ran into any significant problems.
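A pre-trade check enforcing these two limits could be sketched as follows. The 2-contract cap comes from the text. The daily loss figure, the `RiskLimits` class, and the choice to still allow position-reducing orders after a breach are assumptions made for illustration.

```python
class RiskLimits:
    """Pre-trade check for a position cap and a daily loss limit."""

    def __init__(self, max_position: int = 2, max_daily_loss: float = 1000.0):
        self.max_position = max_position      # contracts, long or short
        self.max_daily_loss = max_daily_loss  # hypothetical value

    def allows(self, position: int, order_qty: int, daily_pnl: float) -> bool:
        """Return True if an order of `order_qty` contracts (negative for
        a sell) may be sent given the current position and day's P&L."""
        new_position = position + order_qty
        reduces = abs(new_position) < abs(position)
        if daily_pnl <= -self.max_daily_loss:
            # Loss limit hit: only orders that shrink the position pass.
            return reduces
        return reduces or abs(new_position) <= self.max_position
```

Keeping the same limits at the broker as well means that a bug in this check cannot by itself lead to an oversized position.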
Running the algorithm
It took about six months from the time I started working on the program until it became profitable and I could run it live. To be fair, a large part of that time went into learning the programming language. As I kept improving the program, I saw profits rise in each of the next four months.
Each week I retrained my system on the data from the previous 4 weeks. I found that this struck the right balance between capturing recent market behavior and giving the algorithm enough data to establish meaningful patterns. As training began to take more and more time, I split it up so that it could run on 8 virtual machines using Amazon EC2. The results were then combined on my local machine.
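The weekly retraining routine can be outlined in Python. The article does not say how the work was divided among the machines, so this sketch assumes a list of candidate parameter sets split round-robin and merged by best score. All function names are hypothetical, and no EC2 calls are shown.

```python
from datetime import date, timedelta


def training_window(today: date, weeks: int = 4):
    """Return (start, end) dates covering the previous `weeks` weeks,
    with `end` exclusive."""
    return today - timedelta(weeks=weeks), today


def split_jobs(jobs: list, workers: int = 8) -> list:
    """Divide the jobs round-robin into one chunk per worker machine."""
    return [jobs[i::workers] for i in range(workers)]


def combine(partial_results: list):
    """Merge per-worker lists of (params, score) pairs and return the
    pair with the highest score (assumed selection rule)."""
    merged = [result for part in partial_results for result in part]
    return max(merged, key=lambda result: result[1])
```

For example, `split_jobs(list(range(20)))` gives eight chunks of two or three jobs each, and `combine` then picks the single best result once every machine has reported back.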
The high point of my trading was October 2009, when I made almost $100,000. After that I spent another four months trying to improve the program, even though profits fell every month. Unfortunately, by that point I had apparently already implemented all my best ideas, because nothing I tried helped much.
Frustrated at not being able to improve the program and lacking a sense of growth, I began to think about a new direction. I wrote to 6 different high-frequency trading firms to ask whether they would be interested in buying my software and hiring me. Nobody replied. I then had some new startup ideas I wanted to work on, so I dropped trading entirely.
Note: I posted this article on Hacker News, where it got a lot of attention. I just want to say that I do not encourage anyone to try something like this on their own now. You would need a team of very smart people with a range of skills to have any hope of competing. Even when I was doing this, individuals working alone very rarely succeeded (though I had heard of such cases).
At the top of the page [in the original post - translator's note] there is a comment that talks about “manipulating statistics” and calls me one of the “retail investors” about whom real quants [both renderings of the term are used in Russian translation practice - translator's note] say that they “need to be shot.” It is a rather unfortunate comment that is simply far from reality. There are, however, more interesting comments on my article.
Note 2: I have posted a list of answers to frequently asked questions that I received from traders who read this article.