# LSTM-ANN Dynamic Pricing in Home Goods Retail

It is no secret that machine learning methods have penetrated nearly every area of business, optimizing, improving, and even creating new business processes. One important area is setting prices for goods, and here, given enough data, machine learning helps do what was previously hard to achieve: recover the multifactor demand curve from the data. With a recovered demand curve, it becomes possible to build dynamic pricing systems that optimize the price for a chosen goal, whether increasing revenue or profit. This article is a condensed version of my dissertation work, in which an LSTM-ANN dynamic pricing model was developed and tested in practice over 4 weeks on one of a home goods retailer's products.

A note up front: I will not disclose the name of the company where the study was conducted (it is, however, one of the companies listed in the Prerequisites section), so I will simply call it Retailer.

## Prerequisites

The home goods retail market has a price leader: Leroy Merlin. Its sales volumes allow it to maintain a minimum-price strategy across the entire product range, which puts price pressure on the other market players.

The revenue and profit of the main retailers in St. Petersburg as of December 31, 2017. Given this pressure, Retailer uses a different pricing approach:

- The price is set at the level of the lowest competitor's price;
- A lower bound on the price: the purchase price plus a minimum mark-up reflecting the approximate cost per unit.

This approach combines cost-based pricing with competitor price orientation. However, it is not perfect: it does not directly account for consumer demand.

Because a dynamic pricing model takes many factors into account (demand, seasonality, promotions, competitor prices) and also allows constraints on the proposed price (for example, a lower bound that covers costs), such a system can potentially avoid the one-sidedness and shortcomings of the other pricing methods.

## Data

For the study, the company provided data from January 2015 to July 2017 (920 days / 131 weeks). The data included:

- Daily sales, including weekends, for 470 products (16 product groups);
- Days of in-store promotions;
- Days on which discounts were offered on the goods;
- Prices for each of the 470 products;
- Daily counts of receipts across the whole network in St. Petersburg;
- Competitor prices for most of the 470 products (collected once a week).

In addition to this data, I also added calendar dummy variables:

- Season of the year (autumn / winter / summer / spring);
- Month;
- Quarter;
- Day of the week;
- Holidays.

And weather variables:

- Precipitation (dummy);
- Temperature;
- Temperature deviation from the seasonal average.

Analyzing the daily sales data directly, I found the following:

Only about 30% of the products were on sale throughout the whole period; all the others were either introduced after 2015 or withdrawn before 2017, which significantly restricted the choice of products for the study and the price experiment. It also means that, with products constantly rotating in and out of the store's line-up, building an integrated pricing system is difficult; there are, however, ways around this problem, which I will discuss later.

## Pricing system

To build a system that recommends a product's price for the next period based on a demand-forecasting model, I came up with the following scheme:

Once trained on the data, the model reconstructs a multifactor demand curve: feeding different prices into it yields the expected sales at each price. We can therefore optimize the price toward the desired objective, maximizing expected revenue or expected profit. All that remains is to train a model that predicts sales well.
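The optimization step can be sketched as follows. This is a minimal illustration, not the production code: `predict_sales` is a hypothetical stand-in for the trained demand model (here just a toy linear curve), and a plain grid search runs over the allowed price range.

```python
# Illustrative sketch: pick the price that maximizes expected profit,
# given a trained demand model. The demand function below is a toy
# stand-in for the real demand predictor.

def predict_sales(price):
    # Hypothetical demand curve: expected unit sales fall as price rises.
    return max(0.0, 100.0 - 0.4 * price)

def optimize_price(cost, p_min, p_max, step=1.0):
    """Grid search over [p_min, p_max]; returns (best_price, best_profit)."""
    best_price, best_profit = p_min, float("-inf")
    price = p_min
    while price <= p_max:
        profit = (price - cost) * predict_sales(price)
        if profit > best_profit:
            best_price, best_profit = price, profit
        price += step
    return best_price, best_profit

price, profit = optimize_price(cost=50.0, p_min=60.0, p_max=200.0)
```

In the real system the bounds came from the business constraints described later: the purchase price plus a fixed mark-up from below, and a reference product's price from above.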

## What did not work out

After selecting one of the products for the study, I turned to XGBoost before moving on to the LSTM model.

I did this hoping that XGBoost would automatically discard many unnecessary factors, leaving the rest to be used in the LSTM model. This was a deliberate choice: to avoid unnecessary questions at the thesis defense, I wanted a strong yet simple justification for the choice of factors on the one hand, and simpler development on the other. It also gave me a ready draft model on which I could quickly try out different ideas, and only then, having understood what works and what does not, build the final LSTM model.

To appreciate the forecasting problem, here is the daily sales chart for the first selected product:

The entire sales time series on the chart was divided by the average sales over the period, so as not to disclose the real values while preserving the shape.

Overall, there is a lot of noise, with pronounced spikes corresponding to chain-wide promotions.

Since this was my first experience building machine learning models, I had to spend quite a lot of time on various articles and documentation before I eventually got something working.

The initial list of factors that presumably affect sales:

- Daily sales of the other products in the group, total group sales in units, and the number of receipts across all the chain's stores in St. Petersburg, with lags of 1, 2, 3, 7, 14, 21, and 28 days;
- Prices of the other products in the group;
- The ratio of the studied product's price to the prices of the other products in the group;
- The lowest price among all competitors (the data were collected once a week, and I assumed those prices would hold for the following week);
- The ratio of the studied product's price to the lowest competitor price;
- Lags of group sales (in units);
- Simple moving averages and RSI based on the lags of product sales, total group sales, and the number of receipts.

In total, there were 380 factors (2.42 observations per factor). The problem of pruning insignificant factors was therefore acute, but XGBoost helped cope with it, cutting the number of factors down to 23 (40 observations per factor).
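The pruning idea can be sketched like this. The study used XGBoost; here scikit-learn's `GradientBoostingRegressor` serves as a stand-in, since the mechanism is the same: fit a boosted-tree model, rank features by importance, and keep only the top ones. The data are synthetic.

```python
# Sketch of importance-based factor pruning on synthetic data.
# GradientBoostingRegressor stands in for XGBoost here; the idea is
# identical: fit, rank features by importance, keep the most important.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))                  # 40 candidate factors
y = 3 * X[:, 0] - 2 * X[:, 5] + rng.normal(scale=0.1, size=200)

model = GradientBoostingRegressor(random_state=0).fit(X, y)
order = np.argsort(model.feature_importances_)[::-1]   # most important first
kept = order[:5]                                # keep the 5 strongest factors
X_reduced = X[:, kept]
```

With real data the cutoff (top-k or an importance threshold) is a judgment call; in the study it brought 380 factors down to 23.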

The best result I could achieve using grid search is as follows:

R²-adj = 0.4 on the test sample.

The data were split into training and test samples without shuffling (since this is a time series). As the metric I deliberately used adjusted R²: the final results were to be presented to a committee that included business representatives, so I chose the best-known and easiest-to-understand measure.
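These two choices, a chronological split and adjusted R², can be sketched in a few lines (the 80/20 split ratio here is my illustrative assumption, not a figure from the study):

```python
# Sketch: chronological train/test split (no shuffling) and adjusted R².
import numpy as np

def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R²: penalizes R² by the number of factors used."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_true)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)

# Chronological split: the last 20% of the series is the test sample.
series = np.arange(920)                # 920 daily observations
split = int(len(series) * 0.8)
train, test = series[:split], series[split:]
```

Keeping the test sample strictly after the training sample avoids leaking future information into the model, which shuffled splits would do on a time series.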

The final results diminished my belief in success: an R²-adj of 0.4 meant that the system would not be able to predict next-day demand well enough, and the price recommendation would differ little from a shot in the dark.

Additionally, I decided to check how effective XGBoost would be at predicting daily sales for a whole product group (in units) and at predicting the number of receipts across the whole network.

Sales by product group:

R²-adj = 0.71

Receipts:

R²-adj = 0.86

I think the reason the sales of an individual product could not be predicted is clear from the charts: noise. Sales of a single product turned out to be too susceptible to randomness, so the regression approach was ineffective. By aggregating the data, we removed the influence of randomness and obtained good predictive power.

To finally convince myself that predicting demand one day ahead was a pointless exercise, I applied a SARIMAX model (the statsmodels package for Python) to the daily sales:

In fact, the results were no better than those obtained with XGBoost, which suggests that using a complex model is not justified in this case.

I also want to note that the weather factors turned out to be insignificant for both XGBoost and SARIMAX.

## Construction of the final model

The solution to the prediction quality problem was to aggregate the data to the weekly level. This reduced the influence of random factors, but it also significantly reduced the number of observations: 920 daily points became only 131 weekly ones. The situation was made worse by the fact that the number of factors remained almost unchanged (only the day-of-week dummies were dropped), while the number of observations of the target variable shrank sharply.

In addition, my task was complicated by the fact that at that point the company decided to change the product on which the model experiment would be carried out, so I had to develop the model from scratch.

The replacement product had pronounced seasonality:

The move to weekly sales raised a natural question: is it even adequate to use an LSTM on such a small amount of data? I decided to find out in practice, and first of all to reduce the number of factors (even at the potential cost of losing relevant information). I threw out all the factors computed from sales lags (moving averages, RSI) and the weather factors (weather had not mattered on the daily data, and at the weekly level it made even less sense). After that, I again used XGBoost to cut off the remaining insignificant factors. Later, I pruned a few more factors based on the LSTM model itself, simply eliminating factors one at a time, retraining the model, and comparing the results.
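The "drop one factor, retrain, compare" loop can be sketched as follows. To keep the example light, a Ridge regression stands in for the LSTM and the data are synthetic; the procedure itself (backward elimination against a held-out score) is the same:

```python
# Sketch of one-at-a-time backward factor elimination. Ridge stands in
# for the LSTM to keep the example light; the loop is the same idea:
# try removing each factor, retrain, and keep the removal if the
# held-out score does not get worse.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 8))          # 8 candidate factors
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=120)
split = 90                             # chronological split

def score(cols):
    m = Ridge().fit(X[:split][:, cols], y[:split])
    return r2_score(y[split:], m.predict(X[split:][:, cols]))

cols = list(range(8))
base = score(cols)
for c in sorted(cols, reverse=True):
    trial = [k for k in cols if k != c]
    if score(trial) >= base:           # factor did not help: drop it
        cols, base = trial, score(trial)
```

Informative factors (here columns 0 and 1) survive the loop, while pure-noise factors tend to be dropped.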

The final list of factors is as follows:

- The ratio of the price per kilogram of the studied product to that of CERESIT ST 17 primer, 10 l;
- The ratio of the price of the studied product to the price of CERESIT ST 17 primer, 10 l;
- The ratio of the price of the studied product to the price of EURO PRIMER primer, 3 l;
- The ratio of the price of the studied product to the minimum competitor price;
- Dummy variables for three network-level promotions;
- Dummy variables for the spring, summer, and autumn seasons;
- Lags 1-5 of weekly sales of the studied product.

Only 15 factors (9 observations per factor).

The final LSTM model was built with Keras: two hidden LSTM layers (25 and 20 neurons, respectively) and a sigmoid activation on the output layer.

The final LSTM code using Keras:

```
# Final weekly model: two stacked LSTM layers and a sigmoid output
# (the target is scaled to [0, 1] beforehand).
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(25, return_sequences=True, input_shape=(1, trainX.shape[2])))
model.add(LSTM(20))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=40, batch_size=1, verbose=2)
model.save('LSTM_W.h5')
```
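The model above implies a particular data shape: Keras LSTM layers expect 3D input of `(samples, timesteps, features)`, and a sigmoid output means the target must be scaled to [0, 1]. A NumPy-only sketch of that preparation (the random arrays are placeholders for the real 131 weeks of data):

```python
# Sketch of the data shaping the Keras model above implies: inputs as
# (samples, timesteps=1, features) and the target min-max scaled to
# [0, 1] to match the sigmoid output. Random placeholders stand in for
# the real weekly data.
import numpy as np

raw_X = np.random.rand(131, 15)        # 131 weeks x 15 factors
raw_y = np.random.rand(131) * 400      # weekly unit sales (placeholder)

# Min-max scale the target; keep the params to invert predictions later.
y_min, y_max = raw_y.min(), raw_y.max()
y_scaled = (raw_y - y_min) / (y_max - y_min)

# LSTM input: (samples, timesteps, features); one timestep per sample here.
trainX = raw_X.reshape(131, 1, 15)

def invert(pred_scaled):
    """Map a sigmoid-scale prediction back to unit sales."""
    return pred_scaled * (y_max - y_min) + y_min
```

Without the inverse transform, the model's sigmoid outputs would be meaningless as sales figures.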

Result:

The prediction quality on the test sample looked quite convincing by the metric, but in my opinion it fell short of ideal: although the average level of sales was captured fairly accurately, individual weeks could deviate strongly from that average, producing forecast errors of up to 50% in some weeks. Nevertheless, I went ahead and used this model in the practical experiment.

It is also interesting to see what the reconstructed demand curve looks like. To do this, I ran the model across a range of prices and, from the predicted sales, built the demand curve:

## Experiment

Each week, the chain provided data on the previous week's sales in St. Petersburg, as well as competitor prices. Based on these data, I optimized the price to maximize expected profit and reported the price the chain should set for the next week, which it did. This went on for 4 weeks (the duration agreed with the Retailer).

Profit maximization was carried out with constraints: the minimum price was the purchase price plus a fixed mark-up, and the maximum price was capped by the price of a primer from the same manufacturer, only in a 10 l pack.

The results of the experiment are shown in the tables below (all figures are divided by a constant so as not to reveal the absolute values):

Sales prediction:

Profit prediction:

To assess the impact of the new pricing system on sales, I compared sales for the same period in previous years.

Summary results for 4 weeks:

As a result, we get a mixed picture: entirely unrealistic sales predictions, but at the same time clearly positive results on the economic indicators (both profit and revenue).

The explanation, in my opinion, is that although the model predicted sales incorrectly, it nevertheless captured the right idea: the price elasticity of demand for this product was below 1, meaning the price could be raised without fear of a drop in sales, which is what we saw (unit sales remained at about the same level as in the previous two years).
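A back-of-the-envelope elasticity check illustrates the point. The numbers below are hypothetical, not the experiment's figures; the midpoint (arc) formula avoids depending on the direction of the price change:

```python
# Rough arc-elasticity check with hypothetical numbers: if |elasticity|
# is below 1 (inelastic demand), raising the price increases revenue.
def arc_elasticity(q0, q1, p0, p1):
    """Percent change in quantity over percent change in price,
    using midpoint averages (arc elasticity)."""
    dq = (q1 - q0) / ((q0 + q1) / 2)
    dp = (p1 - p0) / ((p0 + p1) / 2)
    return dq / dp

# Hypothetical: price up 10%, unit sales nearly unchanged.
e = arc_elasticity(q0=100, q1=98, p0=50, p1=55)
inelastic = abs(e) < 1
```

With inelastic demand, revenue `p * q` rises with the price even though units dip slightly, which matches the experiment's pattern of flat unit sales and higher profit.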

But we should not forget that 4 weeks is a short period and the experiment covered only one product. In the long run, overpricing goods usually leads to a drop in sales across the whole store. To test this conjecture, I used XGBoost to check whether consumers have a "memory" of past prices (if a store was, in general, more expensive than its competitors in the past, consumers go to the competitors). That is: does the average price level of a group over the last 1, 3, and 6 months help predict sales by product group?

Indeed, the conjecture was confirmed: one way or another, the average price level of previous periods affects sales in the current period. This means it is not enough to optimize the price of a single product in the current period; the general price level must also be taken into account over the long run. That, in turn, creates a situation where tactics (maximizing profit now) contradict strategy (competitive survival). But that question is best left to the marketers.

Taking into account the results and the experience gained, in my opinion the optimal pricing system based on a sales forecast would look like this:

- Move half a step up from the individual SKU: perform cluster analysis, group the proverbial screwdrivers by similarity, and forecast sales and set the price not for a single screwdriver but for the whole subgroup. This avoids the problem of SKUs constantly being added and removed.
- Perform price optimization holistically: not only for individual product subgroups, but also taking long-term effects into account. For this, one can use a model that predicts sales across the whole network; fortunately, that model proved impressively accurate even on daily data.
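The first recommendation, grouping similar SKUs before forecasting, can be sketched with k-means clustering. The three features per product here (price level, sales level, seasonality amplitude) are my hypothetical choice of similarity attributes:

```python
# Sketch of grouping similar SKUs before forecasting: cluster products
# by hypothetical feature vectors (price level, sales level, seasonality
# amplitude) and price each cluster rather than each SKU. Synthetic
# data: two well-separated groups of products.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
features = np.vstack([
    rng.normal(loc=[1.0, 1.0, 1.0], scale=0.1, size=(235, 3)),
    rng.normal(loc=[5.0, 5.0, 5.0], scale=0.1, size=(235, 3)),
])                                      # 470 products x 3 features

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
```

New or discontinued SKUs then simply join or leave an existing cluster, so the demand model for the cluster keeps its sales history.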

Summing up the work, I would like to say that for someone inexperienced in development in general and in machine learning methods in particular, it was hard, but it all turned out to be feasible. It was also interesting to see for myself how applicable these methods are in reality. Having read many articles beforehand, I was eager to try everything myself and anticipated excellent results. Practice turned out to be harsh: few products with a long sales history, noisy daily data, misses in sales forecasts, and complex models whose use is not always justified. Nevertheless, I gained unforgettable experience and learned what it means to apply analytics in practice.

→ Based on the work done, I prepared a project in my repository

In the repository you will find a dataset generated from dependencies extracted from the real data, as well as a Python script that lets you run a virtual experiment on the generated data: try your luck and beat the model on profit by setting the product's price yourself. All you need to do is download and run the script.

I hope my experience helps define the limits of applying machine learning methods, and shows that patience and perseverance let you achieve results even if you are not a professional in the area.