# Using data science to estimate customer lifetime value

Hi, Habr! Here is a translation of my article "Understanding the Customer Lifetime Value with Data Science".

Customer relationships matter to every company and play a key role in business growth. One of the most important metrics in this area is customer lifetime value (hereafter LTV): a prediction of the net profit attributable to the entire future relationship with a customer. The longer customers keep using the company's products and generating profit, the higher their LTV.

There are many marketing articles about how important LTV and customer segmentation are. But as a data scientist, I am more interested in the formulas, and I want to understand how the model actually works. How can LTV be predicted from only 3 features? In this post, I will show some of the models used for marketing customer segmentation and explain the math behind them. There will be many formulas, but do not worry: everything is already implemented in Python libraries. The purpose of this post is to show how the math does all the work.

### Beta-geometric / negative binomial model for determining the likelihood that a client is “alive”

Consider this example from an online service for ordering taxi trips around a city: a user registered 1 month ago, made 4 trips, and took the last trip 20 days ago. Based on this data alone, the model can predict the probability that the client will still be active over a given period (as shown in the graph), as well as the number of future transactions, which is the basis for understanding the client's value over the whole “lifetime” of the relationship between the customer and the company.

The model gives the business a direct guide to action: apply marketing measures to a user when their probability of being active drops below a certain level, in order to keep them from leaving.

This model was proposed by Fader, Hardie and Lee and is called the Beta Geometric / Negative Binomial distribution model (BG / NBD).

The BG / NBD model has the following properties:

1. While a user is active, the number of their transactions in a period of length t follows a Poisson distribution with transaction rate λ.

The Poisson distribution helps predict events from data on how often they occurred in the past. For example, if a user made an average of 2 trips per week (λ = 2 on the chart below), the probability that they will make exactly 3 orders next week is 0.18.
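The figure in this example can be checked directly from the Poisson probability mass function. This is a minimal sketch using only the standard library; the helper name `poisson_pmf` is illustrative and not part of any library.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when events occur at an average rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# A user who averages 2 trips per week: probability of exactly 3 trips next week.
prob = poisson_pmf(3, 2)
print(round(prob, 2))
```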

1. The heterogeneity of the transaction rate among users (that is, how customers differ from one another in purchasing behavior) follows a Gamma distribution with parameters r (shape) and α (scale).

The Gamma distribution is suitable for modeling the waiting time between events that follow a Poisson distribution (in our case, for the transaction rate λ). For example, consider a user who makes an average of 2 transactions per week. The probability that the waiting time until this user makes 3 purchases exceeds 4 weeks is equal to the area on the graph to the right of the vertical dotted line (under the blue distribution line): 0.13.

1. After any transaction, a user can become inactive with probability p, so the point at which they drop out is distributed across purchases according to a Geometric distribution.

The Geometric distribution is built on Bernoulli trials and models the number of trials up to and including the first success. If p = 0.2 for some user, then the probability that they become inactive right after the 3rd transaction is 0.8² · 0.2 = 0.128, or about 0.13 (the blue line on the chart).
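The dropout example can be reproduced in a couple of lines. This sketch assumes a dropout probability of p = 0.2 (the prior probability of leaving used in this example); the helper name `geometric_pmf` is illustrative.

```python
def geometric_pmf(k, p):
    """Probability that the first 'success' (here: dropout) happens on trial k."""
    return (1 - p) ** (k - 1) * p


# The user survives the first two transactions and drops out after the third.
prob = geometric_pmf(3, 0.2)
print(round(prob, 3))
```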

1. Heterogeneity (variation between users) in the dropout probability p follows a Beta distribution with shape parameters a and b (written a and b rather than α and β, because α already denotes the Gamma scale parameter above).

The Beta distribution is well suited to describing a distribution over probabilities: the case where we do not know a probability in advance but have reasonable prior assumptions about it, described by a and b (the mean of the Beta distribution is a / (a + b)).

For the previous example, with a user whose prior dropout probability is 0.2, the orange line on the graph shows the probability density function of the user's dropout probability for the chosen a and b.

1. The transaction rate and the dropout probability vary independently across users.

Mathematical notation for the user features X:

$$X = (x, t_x, T)$$

where $x$ is the number of transactions in the period $(0, T]$ and $t_x$ ($0 < t_x \le T$) is the time of the last purchase.
Based on these features alone, the model predicts the users' future purchasing behavior:

- $P(X(t) = x)$: the probability of $x$ transactions in a future period of length $t$,
- $E(Y(t) \mid X = x, t_x, T)$: the expected number of transactions in a period of length $t$ for a user with the observed behavior.
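As an illustration, the three features can be derived from raw dates for the taxi example (registered 30 days ago, 4 trips, last trip 20 days ago). This is a sketch under stated assumptions: the calendar dates are hypothetical, the function name `summarize_user` is illustrative, and the observation window is assumed to start at registration so that every trip counts. Some implementations, such as the `lifetimes` library, instead start the window at the first purchase and count only repeat purchases.

```python
from datetime import date


def summarize_user(registered, transactions, today):
    """Return (x, t_x, T) in days for one user.

    Assumption: the observation window starts at registration, so every
    trip counts toward x, as in the taxi example in the text.
    """
    T = (today - registered).days          # length of the observation window
    x = len(transactions)                  # number of transactions in (0, T]
    t_x = (max(transactions) - registered).days if transactions else 0
    return x, t_x, T


# Hypothetical dates chosen to match the example.
today = date(2024, 1, 31)
registered = date(2024, 1, 1)
trips = [date(2024, 1, 3), date(2024, 1, 5), date(2024, 1, 8), date(2024, 1, 11)]

x, t_x, T = summarize_user(registered, trips, today)
print(x, t_x, T)
```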

Now we can derive these two key quantities. Without going into details, I will show the final formulas (the full derivations are in the papers).

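Probability of being active (for $x > 0$):

$$P(\text{alive} \mid x, t_x, T) = \frac{1}{1 + \dfrac{a}{b + x - 1}\left(\dfrac{\alpha + T}{\alpha + t_x}\right)^{r + x}}$$

Expected number of transactions:

$$E(Y(t) \mid X = x, t_x, T) = \frac{\dfrac{a + b + x - 1}{a - 1}\left[1 - \left(\dfrac{\alpha + T}{\alpha + T + t}\right)^{r + x} {}_2F_1\!\left(r + x,\ b + x;\ a + b + x - 1;\ \dfrac{t}{\alpha + T + t}\right)\right]}{1 + \delta_{x > 0}\,\dfrac{a}{b + x - 1}\left(\dfrac{\alpha + T}{\alpha + t_x}\right)^{r + x}}$$

where ${}_2F_1$ is the Gauss hypergeometric function, $\delta_{x > 0}$ equals 1 if $x > 0$ and 0 otherwise, $r$ and $\alpha$ are the Gamma parameters of the transaction rate, and $a$ and $b$ are the Beta parameters of the dropout probability.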
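To make these two quantities concrete, here is a minimal pure-Python sketch, assuming the closed-form BG / NBD expressions published by Fader, Hardie and Lee (repeated in the docstrings). The values of r, α, a and b are hypothetical (in practice they are fitted by maximum likelihood), time is measured in days, and the function names are illustrative. The hypergeometric function is approximated by its power series, which converges here because its argument t / (α + T + t) is below 1.

```python
def p_alive(x, t_x, T, r, alpha, a, b):
    """P(alive) = 1 / (1 + a/(b+x-1) * ((alpha+T)/(alpha+t_x))**(r+x)) for x > 0.

    A user with no transactions cannot have dropped out yet, so x == 0 gives 1.
    """
    if x == 0:
        return 1.0
    ratio = (alpha + T) / (alpha + t_x)
    return 1.0 / (1.0 + a / (b + x - 1) * ratio ** (r + x))


def hyp2f1(a, b, c, z, tol=1e-12, max_terms=10_000):
    """Gauss hypergeometric function 2F1 via its power series (requires |z| < 1)."""
    term, total = 1.0, 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) < tol * abs(total):
            break
    return total


def expected_transactions(t, x, t_x, T, r, alpha, a, b):
    """Expected number of transactions in the next t time units (requires a != 1)."""
    z = t / (alpha + T + t)
    shrink = ((alpha + T) / (alpha + T + t)) ** (r + x)
    f = hyp2f1(r + x, b + x, a + b + x - 1, z)
    numerator = (a + b + x - 1) / (a - 1) * (1 - shrink * f)
    dropout = 0.0
    if x > 0:
        dropout = a / (b + x - 1) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return numerator / (1 + dropout)


# Hypothetical parameters; taxi example: 4 trips, last one on day 10 of 30.
params = dict(r=0.25, alpha=4.5, a=0.8, b=2.5)
prob = p_alive(x=4, t_x=10, T=30, **params)
next_week = expected_transactions(t=7, x=4, t_x=10, T=30, **params)
print(prob, next_week)
```

With these made-up parameters the probability of being active is low, because 20 days of silence is long for a user who made 4 trips in the first 10 days.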

### Gamma-gamma model for assessing LTV

Up to this point, we have used only the frequency and recency of customer purchases. In addition, we can use the monetary component of the transactions. Let us add new data to our example: the user paid 10, 12, 8 and 15 for those 4 trips. The Gamma-gamma model helps predict the most likely value of a future transaction.
Putting it all together, we now have all the elements needed to determine the client's LTV:

LTV = expected number of transactions × expected transaction value × margin

where the first factor comes from the BG / NBD model, the second from the Gamma-gamma model, and the margin is set by the business.

Mathematical notation for the Gamma-gamma model:
A user made $x$ transactions with values $z_1, z_2, \dots, z_x$, and $m_x = \sum_{i=1}^{x} z_i / x$ is the observed average transaction value.
$E(M)$ is the hidden (unobserved) mean transaction value, and what we are interested in is $E(M \mid m_x, x)$: the expected monetary value of the user given their purchasing behavior.

Properties of the Gamma-gamma model:

1. The monetary value of a user's transactions varies randomly around their average transaction value.

1. The average transaction value varies among users, but does not vary over time for a particular user.

1. The average transaction value has a Gamma distribution among users.

The papers describe the derivation of the formula in detail, through several more Gamma distributions. The result is:

$$E(M \mid p, q, \gamma; m_x, x) = \frac{(\gamma + m_x x)\,p}{p x + q - 1} = \frac{q - 1}{p x + q - 1} \cdot \frac{\gamma p}{q - 1} + \frac{p x}{p x + q - 1}\, m_x$$

where $p$ is the shape parameter and $\nu$ the scale parameter of the Gamma distribution of transaction values $z_i$, and $q$ is the shape parameter and $\gamma$ the scale parameter of the Gamma distribution of $\nu$ (the model assumes that $p$ is constant, that is, the individual-level coefficient of variation is the same for all users). The model parameters can be estimated by maximum likelihood.
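The following sketch applies this result to the four trips from the example, assuming the closed-form conditional expectation from Fader and Hardie's Gamma-gamma note, E(M | m_x, x) = (γ + m_x·x)·p / (p·x + q − 1). The values of p, q and γ, the expected number of trips, and the margin are all hypothetical, and the function name is illustrative.

```python
def expected_transaction_value(values, p, q, gamma):
    """Gamma-gamma conditional expectation of a user's mean transaction value.

    Equivalent to a weighted average of the population mean gamma*p/(q-1)
    and the user's observed mean m_x (requires q > 1).
    """
    x = len(values)
    m_x = sum(values) / x
    return (gamma + m_x * x) * p / (p * x + q - 1)


trip_prices = [10, 12, 8, 15]          # from the example in the text
p, q, gamma = 6.0, 4.0, 15.0           # hypothetical fitted parameters

expected_value = expected_transaction_value(trip_prices, p, q, gamma)

# LTV = expected number of transactions x expected transaction value x margin
expected_trips = 0.5                   # hypothetical output of the BG / NBD model
margin = 0.2                           # hypothetical, set by the business
ltv = expected_trips * expected_value * margin
print(expected_value, ltv)
```

With only 4 observed trips, the estimate is pulled from the user's own average (11.25) toward the population mean implied by the hypothetical parameters (30); the more transactions a user has, the more weight their own average receives.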

We are done with the math, and now we can estimate users' LTV. But how accurate is this model?

### Model Accuracy Assessment

The traditional approach is to split the data into two groups: one for training and one for testing. In their papers, the authors show that their approach works well. I also tried these models on real data and got similar results.

The graph shows the distributions of actual and predicted transactions for the test group: the error here is 2.8%.

### How to apply

As I said at the beginning, all of these models are already implemented. For example, the Python library “lifetimes” contains all the functions and metrics needed to estimate LTV. Its detailed documentation contains many examples and explanations, as well as examples of SQL queries for getting data into the required format. So you can get to work in just a few minutes.

### Conclusion

In this post, I showed in detail how users' LTV can be estimated from only a few features.

I want to note that it is sometimes worth stepping away from the commonly used gradient boosted trees and trying other approaches with a comparable level of accuracy. Statistical learning is still useful in practice and can help a business understand its customers better.