
Personalization in E-Commerce
Hello, Habr!
Today we are starting a series of articles on how we build the Retail Rocket service. Over nearly three years of work we have assembled a solid technology stack, become disillusioned with quite a few “trendy” technologies, and built a rather complex system.
In short, Retail Rocket is a Big Data platform for multi-channel personalization of an online store. Our service analyzes the behavior of the store’s visitors, identifies their needs, and at the right moment shows them offers they will find interesting on the site and in email and display campaigns, increasing the store’s revenue through higher conversion, a larger average order value, and more frequent repeat purchases.
With this article we are opening the Retail Rocket engineering blog (we have been running a marketing blog for almost two years) with a story about the approaches we use in data analysis and a short list of the technologies behind them. We arrived at everything described here iteratively, and in the following articles we will try to cover our path in each of these areas in detail.
A few numbers that briefly describe our service:
- More than 70 processing servers (mainly at Hetzner).
- About 100 million unique users (unique cookies) per month.
- 360,000 external requests per minute on average.
- 35 man-years invested in development.
- 10 engineers (developers, analysts, system administrators).
Data Analysis Approaches
The essence of Retail Rocket is identifying the needs of a store’s visitors by analyzing their behavior and the store’s product matrix (its assortment). To generate personal recommendations we needed, from the very start, a mathematical foundation that could scale easily. Here is an almost complete list of the approaches we use today (a small sketch follows the list):
- Content filtering.
- Collaborative filtering.
- Predictive models based on machine learning and Markov chains.
- Bayesian statistics.
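
To make the collaborative filtering item a little more concrete, here is a minimal, self-contained Scala sketch of item-to-item similarity over a toy interaction log. It only illustrates the general technique; our production pipeline runs on Spark and uses far more signals than plain co-views.

```scala
object ItemToItemCF {
  // Toy interaction log: (userId, itemId) pairs. In production this comes
  // from the clickstream, not a hard-coded list.
  val views: Seq[(String, String)] = Seq(
    ("u1", "tv"), ("u1", "hdmi"), ("u2", "tv"),
    ("u2", "hdmi"), ("u2", "soundbar"), ("u3", "tv")
  )

  def main(args: Array[String]): Unit = {
    // The set of users that interacted with each item.
    val usersByItem: Map[String, Set[String]] =
      views.groupBy(_._2).map { case (item, ps) => item -> ps.map(_._1).toSet }

    // Cosine similarity between two items over their user sets.
    def cosine(a: Set[String], b: Set[String]): Double =
      a.intersect(b).size / (math.sqrt(a.size) * math.sqrt(b.size))

    // For every item, rank all other items by similarity.
    val items = usersByItem.keys.toSeq
    for (i <- items) {
      val recs = items.filter(_ != i)
        .map(j => j -> cosine(usersByItem(i), usersByItem(j)))
        .filter(_._2 > 0)
        .sortBy(-_._2)
      println(s"$i -> ${recs.mkString(", ")}")
    }
  }
}
```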
You could write a series of articles, or even a book, on each of these topics :) I am sure that someday we will describe in detail how we implemented the subsystem that computes personal (user-item) recommendations in real time, but for now here is a brief overview of the technologies we use for it.
Analytics platform

For machine learning we use Spark on the Hadoop YARN platform, a cluster computing system that fits our current tasks best.
By now we have migrated almost the entire data analysis system to Spark, writing in the functional programming language Scala. Before that we wrote a lot in Pig, Hive, Python, and Java. Of the native Hadoop ecosystem components, we use Apache Flume for data delivery, the Mahout distributed machine learning library, and the Oozie task scheduler.
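
To give a feel for what these Spark jobs look like, below is a minimal Scala sketch that counts how often pairs of products occur in the same session, the kind of building block behind “people who viewed this also viewed” logic. The tab-separated input format and the command-line paths are assumptions for the example, not our actual log schema.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A minimal sketch of a batch job: counting how often pairs of products
// appear within the same user session.
object SessionCooccurrence {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("cooccurrence"))

    // Each input line: sessionId \t itemId (assumed format for this example).
    val events = sc.textFile(args(0))
      .map(_.split("\t"))
      .collect { case Array(session, item) => (session, item) }

    // Group items by session, then emit all ordered pairs within a session.
    val pairCounts = events.groupByKey()
      .flatMap { case (_, items) =>
        val distinct = items.toSet.toSeq
        for (a <- distinct; b <- distinct if a != b) yield ((a, b), 1L)
      }
      .reduceByKey(_ + _)

    pairCounts.saveAsTextFile(args(1))
    sc.stop()
  }
}
```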
We chose Jenkins as the centralized tool for launching the periodic recommendation calculation jobs (a little under 100 of them at the time of writing). Although this is a rather unusual use of the tool, it has served us well over a year of operation.
By the way, we have a repository on GitHub where our team maintains several projects:
- An engine for A/B tests in JavaScript.
- The Spark MultiTool library in Scala.
- Scripts for deploying a Hadoop cluster with Puppet.
Frontend

Almost everything the user sees is handled on Windows machines running the IIS web server; the code is written in C# with ASP.NET MVC.
All data is stored and distributed across three DBMSs: Redis, MongoDB, and PostgreSQL.
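
As an illustration of the serving side, here is a hypothetical Scala sketch that reads precomputed recommendations from Redis using the Jedis client. The key scheme (`recs:<itemId>`) and the use of a Redis list are assumptions made for the example, not a description of our actual data layout.

```scala
import redis.clients.jedis.Jedis
import scala.collection.JavaConverters._

// Hypothetical sketch: serving precomputed recommendations from Redis.
// The key scheme and the choice of a Redis list are illustrative only.
object RecommendationStore {
  def recommendationsFor(itemId: String, limit: Int): Seq[String] = {
    val jedis = new Jedis("localhost", 6379)
    try {
      // LRANGE returns up to `limit` item ids precomputed by the batch jobs.
      jedis.lrange(s"recs:$itemId", 0, limit - 1).asScala.toSeq
    } finally {
      jedis.close()
    }
  }

  def main(args: Array[String]): Unit =
    println(recommendationsFor("tv", 5))
}
```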
When we need distributed components to interact, for example when computing a user segment from the User-Agent to profile the audience, we use Thrift. And for the various subsystems to receive the event stream from the online stores for this, we rely on the Flume transport mentioned above.
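
For a sense of what such a Thrift interaction might look like, here is a hypothetical Scala sketch of the client side. `UserSegmentService` and its `segmentFor` method stand in for the stubs Thrift generates from an IDL file, so this snippet will not compile without them; only the transport and protocol classes are real Apache Thrift APIs.

```scala
import org.apache.thrift.protocol.TBinaryProtocol
import org.apache.thrift.transport.{TFramedTransport, TSocket}

// Hypothetical sketch of a Thrift RPC call between two subsystems.
// `UserSegmentService.Client` stands in for a Thrift-generated stub;
// the real IDL, service, and host names differ.
object SegmentClient {
  def main(args: Array[String]): Unit = {
    val transport = new TFramedTransport(new TSocket("segment-service", 9090))
    transport.open()
    try {
      val protocol = new TBinaryProtocol(transport)
      val client = new UserSegmentService.Client(protocol) // generated stub
      // Ask the remote service to classify a visitor by User-Agent.
      val segment = client.segmentFor("Mozilla/5.0 (iPhone; ...)")
      println(s"segment: $segment")
    } finally {
      transport.close()
    }
  }
}
```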
Development process

In development our team follows continuous delivery of new functionality to customers (today more than 500 stores are connected to us). For this we use the Git + GitLab + TeamCity chain with unit tests, acceptance tests, and code review. This approach is the minimum production standard that allows us to maintain the promised quality of the product and deploy to production with zero downtime.
What we will write about
In this article we have tried to give you a small glimpse into the technology kitchen of Retail Rocket. We have a short list of topics we want to cover on this blog, topics we believe will help the community with engineering problems that each cost us more than a day of work.
We will also be happy to hear from Habr’s readers which personalization topics interest you most. We will be sure to take your wishes into account in the following articles!