EliteForce September 29, 2016 at 15:00

Artificial Intelligence in Excel predicts gold medal winner in men's tournament at Rio 2016

From the sandbox

Every time we use complex mathematical algorithms and modern machine learning methods, we set the task to get a trend, understand the internal dependencies, and ultimately make predictions. More accurate results can be obtained if the algorithm can be adapted to existing knowledge, to the existing process model. One of the directions in machine learning, which allows you to create and train models for obtaining predictions, is “generating (or Bayesian) modeling” ( as opposed to “discriminant” modeling , for example, neural networks) There are platforms for creating probabilistic models and working with them that have recently been referred to as probabilistic programming. You can read more about probabilistic programming in other articles on Habrahabr: “Probabilistic programming” , “Probabilistic programming - the key to artificial intelligence?” And “Probabilistic programming” .

More recently, the Invrea startup has appeared , which suggests using Excel as a probabilistic programming language: a probabilistic model can be created in Excel and predictions can be obtained there. Below is a translation of one of the articles from the startup site(translation made for educational purposes only). In the article, the authors consider an example of a "household" situation. They are interested in understanding who will win the tennis tournament at the 2016 Olympics. They make predictions about who is the most likely candidate to win. The article was written on August 7, during the games, after completing all the games of the first round.

***

Task description

“In our opinion, it is very important to make machine learning easy to use and accessible to everyone. It is necessary to eliminate as much as possible the need to tailor an existing question or task to the necessary framework for possible calculations. The authors present the Invrea Scripting plug-in (for Excel), which can be used to make decisions about current and regular events. To demonstrate this, the authors reproduced the tournament in the men's singles tournament in an Excel spreadsheet. Using the plugin, the probability of victory was determined for each of the players included in the Association of Professional Tennis Players (ATP), based on their rating in this organization.

Video demonstration (in English) of using the Excel plugin to predict the winner of the men's tennis tournament in Rio 2016:

Predictions for the winner of the gold medal after the first round. Calculated using the Invrea plugin:

In the men's single player tournament, 64 people participate, each of whom faces face to face with another player. The winner goes to the next round, the loser leaves the competition. This continues until the finals, where the winner is awarded a gold medal. The question is, who is most likely to receive gold? It would be nice to get the probability of victory for each player in the tournament. Tennis has a large share of uncertainty. The fact that Murray has a rating higher than that of Nishikori does not guarantee that Murray will go further. As with any sporting event, disappointments and surprises can happen at any time.

Fortunately, machine learning is designed to find ways to deal with this problem. Using it, we can take into account a certain amount of randomness in deciding whether Murray or Nishikori will win. Murray has a slightly higher probability according to the rating, but the possibility that Nishikori defeats is also fair.

Introducing the probabilistic model in a spreadsheet

The following is an Excel spreadsheet that the authors created to determine the chances of each player winning Rio. The file has two sheets: the first contains a list of all players, its rating and the logarithm of this rating. A random value was added to each of the ratings, because the number of points is not always crucial in the description of the player (see Fig. 1). For example, Del Potro has only 140 points in the ATP ranking, but this is mainly due to the short gaming period. His recent results lead us to believe that he is a better player than his rating shows. Randomness helps to take into account these small inconsistencies.

Auxiliary spreadsheet with a priori “strengths” of each of the players, based on the rating of the Association of Professional Tennis Players:

Fig. 1. The

second sheet contains the standings. Column C reflects the first round that you can find on any page describing the tournament. But you can also see that the entire further state of the tournament, including the winner, is also filled (see Fig. 2). How did it happen? If you update the spreadsheet (by pressing the F9 key), you may notice that all players in the second round and after it will change. In other words, cells that reflect who will go to the next round contain values calculated based on probability.

Basic spreadsheet to simulate match results:

Fig. 2.

But their random nature is based on a rule: imagine that you choose a random value close to the rating of player A and choose another random value close to the rating of player B. Sometimes these values will be lower than the real rating ... sometimes higher. In this case, the rule: the one with a higher random variable wins. Thus, having a higher ATP rating means that you have a higher chance of defeating your opponent, but you may have a bad day / you may suffer from injury and conditionally acquire a low random value. From here, things are repeated. The third round displays the same equality for players who have passed in the second round. Etc. That's why if you update the spreadsheet many times, different people will be declared winners of the tournament.

What Invrea's Scripting allows you to do is define these random cells using features like GAUSSIAN, and this plugin allows you to generate thousands of scripts automatically and display them. You can see what the distribution in each random cell is: who will win in the first round? In the second round? In the semifinals? In the finale? You can look at probabilities in any cell that interests you.

We generate and analyze the posterior distribution

The histogram that you see below (see Fig. 3) is a posterior probability calculated for each player’s victory, not knowing what happened at the end of the first round. The higher the bar, the more likely the victory of the corresponding player. Just glancing at her, we see that Djokovic has a pretty good chance. The only ones who can potentially stop him are Murray, Nadal and Nishikori (Federer does not participate in the competition). Thanks to this information, you may have more reason to say about your expectations for Djokovic's victory.

Fig. 3.

In fact, you can do even more interesting. As soon as the results are obtained at the end of the round, you can take this into account in the table using ACTUAL (a special function of Invrea). With this in mind, we can see the scenarios of who will win in the final, focusing on the results of previous rounds already obtained. For example, the authors used the results of all matches after the first round.

These included several unforeseen situations, including the fact that Djokovic lost to Del Potro (see Fig. 4).

Fig. 4.

After starting the plugin, taking into account the new data, you can see (see Fig. 5) a change in the probability distribution of who will win in the final: now there are fewer players, and the probability of Djokovic’s victory is now zero, while the results of Murray, Nadal and Nishikori rose after the first round.

Fig. 5.

In fact, the distribution of each random cell has changed, because the information on the results of the first round helped the plug-in better calculate who will be the most likely winner. Drawing conclusions on these results, we can assume the winner in the person of Murray or Nishikori. You can follow further, during the tournament, adding information about ongoing events. Once the results of the second round appear, you can add them using ACTUALS, and the predictions will become even better.

Also, there is a large amount of information that can be predicted during the tournament. Who is the most likely winner in the 4th quarter-finals (Figure 6)? Second semi-final (Fig. 7)? Looking at the histograms above, we can get answers to each of these questions.

Fig. 6.

Fig. 7.

Invrea “Scenarios” help with this kind of predictions, but that’s not all. The plugin can simulate uncertainty and predict something based on assumptions, new information and new data for business decisions, insurance, and a payment schedule. If it is possible to simulate your decision as a relationship between cells in an Excel spreadsheet, then there is enough chance that Invrea can help. Further, we will continue the series of articles related to the forecasting of certain events, using situations and tasks from other areas of our lives. ”

***

Afterword

As we mentioned earlier, the article was written on August 7, after the end of only the first round of the men's tournament. As we already know, the winner at Rio 2016 was Andy Murray, whose victory was predicted in the article with the highest probability (see Fig. 5 in translation).

If you are interested in this area and liked the translation, there are plans for translating other articles related to machine learning in general and probabilistic programming in particular. There are also ideas to talk about modern machine learning applications in educational and commercial projects.

Tags:

Artificial Intelligence in Excel predicts gold medal winner in men's tournament at Rio 2016

Task description

Introducing the probabilistic model in a spreadsheet

We generate and analyze the posterior distribution

Afterword

Also popular now: