
How we tracked down a data leak to SimilarWeb
Good day.
It all started half a year ago. We are a small team working on a project that has been live and running successfully for several months. At some point the conversation turned to visit statistics, referral sources, and the like, and the managers sent me a link to the SimilarWeb page for our site. What I saw puzzled me. Among other things, the page lists the subdomains that SimilarWeb has discovered. Imagine my surprise when the top 5 included internal subdomains that are used only by employees and are not reachable from outside (such as jira.mycomp.org, ci.mycomp.org, git.mycomp.org).
Only one explanation came to mind: someone on the team had some kind of malware that was leaking the URLs they visited. Part of the team works remotely, and everyone uses different operating systems and browsers. I talked to each person individually, asked them to scan their systems with antivirus software, and requested a list of the browser extensions they used.
A Google search turned up several articles about SimilarWeb's purchase of the Stylish extension. I installed the extension myself and confirmed that it really does leak data. How it works: when you install the extension, you agree to the data collection terms (at the time of writing the extension is still in the store and does not hide the fact that data is collected for SimilarWeb). After that, on every page you visit (even over https), the extension sends data in the background to the URL h___s://userstylesapi.com/tic/stats. It looks like this:

The e parameter in FormData carries the payload wrapped in Base64 twice. The raw value comes first, followed by the decoded string:
ZG0xMFBUTW1iR0YyUFRJeEpuZDJQVEVtWjNJOU1pNHdMamttY0hobFBURm5aamhwTjJnNU5qVTVOekZ4ZERob05tTTVhamc0T0hCME5DWnpiblU5Sm1kd1BXaDBkSEJ6SlROQkpUSkdKVEpHZFhObGNuTjBlV3hsY3k1dmNtY2xNa1p6ZEhsc1pYTWxNa1ppY205M2MyVWxNa1p1WlhkbGMzUXRjM1I1YkdWekptTm9QVGttWkdrOVlUTmxNMlV5WVRneA==
vmt=3&lav=21&wv=1&gr=2.0.9&pxe=1gf8i7h965971qt8h6c9j888pt4&snu=&gp=https%3A%2F%2Fuserstyles.org%2Fstyles%2Fbrowse%2Fnewest-styles&ch=9&di=a3e3e2a81
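To reproduce the decoding yourself, a minimal Python sketch along these lines should work. It assumes only what is described above: two layers of Base64 over a URL-encoded query string. The function names and the sample value are hypothetical and made up for illustration; they are not captured traffic.

```python
import base64
from urllib.parse import parse_qs


def decode_payload(e_value: str) -> dict:
    """Undo two layers of Base64, then parse the result as a query string."""
    inner = base64.b64decode(e_value)                # first layer: still Base64 text
    query = base64.b64decode(inner).decode("utf-8")  # second layer: key=value&...
    # keep_blank_values preserves empty fields such as "snu="
    return parse_qs(query, keep_blank_values=True)


def encode_payload(query: str) -> str:
    """Build a test value the same way: Base64 applied twice."""
    once = base64.b64encode(query.encode("utf-8"))
    return base64.b64encode(once).decode("ascii")


# Hypothetical sample for illustration only.
sample = encode_payload("gp=https%3A%2F%2Fjira.mycomp.org%2Fbrowse&snu=&ch=9")
fields = decode_payload(sample)
print(fields["gp"][0])  # the visited URL, percent-decoded by parse_qs
```

Passing the captured value of e to decode_payload should give the same fields as the decoded string shown above, with the page address in gp.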
Thus, every click sends information about the visited URL.
We cleaned work and home computers, removed the extension from everyone who had it, and added a note to our internal guidelines for the future. All that remained was to wait: SimilarWeb refreshes its data within a month.
However, two months passed and nothing changed. The domains were still on the list, so not every machine had been cleaned. We decided to identify the source of the leak another way. For each team member we created a dedicated URL of the form coder-124.mycomp.ru, coder-523.mycomp.ru, and so on. Everyone was asked to open their URL daily and click around a little, and we monitored this so that nobody forgot. After a month of pestering the developers, it finally paid off: one of the URLs appeared at the very bottom of the list. The target was found; what remained was to understand how the data was leaking.
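The per-person URL trick can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names, the team member names, and the three-digit numbering are hypothetical, and the observed list stands in for whatever subdomains SimilarWeb shows for your site.

```python
import secrets


def make_canaries(members: list, domain: str) -> dict:
    """Assign each team member a unique canary hostname (host -> member)."""
    canaries = {}
    for member in members:
        while True:
            # Three random digits, as in coder-124; fine for small teams only.
            host = f"coder-{secrets.randbelow(900) + 100}.{domain}"
            if host not in canaries:  # never hand out the same host twice
                canaries[host] = member
                break
    return canaries


def find_leaks(canaries: dict, observed: list) -> list:
    """Return members whose canary hostname appears among observed subdomains."""
    seen = {host.strip().lower() for host in observed}
    return sorted({who for host, who in canaries.items() if host in seen})


# Hypothetical mapping and hypothetical list of published subdomains.
canaries = {"coder-124.mycomp.ru": "alice", "coder-523.mycomp.ru": "bob"}
listed = ["jira.mycomp.org", "git.mycomp.org", "coder-523.mycomp.ru"]
print(find_leaks(canaries, listed))  # ['bob']
```

The important property is that each hostname is visited by exactly one person, so a hostname that surfaces publicly points to a single machine.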
The result was a surprise: a Chrome extension was leaking the data, but it was not Stylish. It turned out to be the friGate extension. When installed, the extension displays the following message:

Fair enough... Next, we looked at how it transmits this data:


On every page visit, the following data is sent to two URLs (I wonder why two):

The e parameter in FormData carries the payload wrapped in Base64 twice. The raw value comes first, followed by the decoded string:
Y3oweE9ERTBKbTFrUFRJeEpuQnBaRDFzWW5keE1FeHBTVW8xZFhFeWFEY21jMlZ6Y3owMU56TXpNVFl6TWpVeU1EazJOemd3TURBbWMzVmlQV05vY205dFpTWnhQV2gwZEhCekpUTkJMeTltY21rdFoyRjBaUzV2Y21jdmNuVXZKbWh5WldabGNtVnlQV2gwZEhCekpUTkJMeTkzZDNjdVoyOXZaMnhsTG5KMUx5WndjbVYyUFdoMGRIQnpKVE5CTHk5bWNta3RaMkYwWlM1dmNtY3ZjblV2Sm5SdGRqMDBNREUxSm5SdFpqMHhMakU9
s=1814&md=21&pid=lbwq0LiIJ5uq2h7&sess=573316325209678000&sub=chrome&q=https%3A//fri-gate.org/ru/&hreferer=https%3A//www.google.ru/&prev=https%3A//fri-gate.org/ru/&tmv=4015&tmf=1.1
I doubt that all of this data is needed to choose a proxy server. And the mechanism is very similar to the one in Stylish.
By the way, the friGate Light extension has no such functionality...
Instead of a conclusion.
I assume that since a second extension has turned up, there will be a third and a fourth. Most likely this kind of cooperation between SimilarWeb and browser extension developers will keep growing. I urge you to check your extensions (Chrome or Firefox, it doesn't matter) and, if you find something similar, write in the comments. It would be interesting to know how deep the problem goes.
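One possible way to check is to export the browser's network log as a HAR file and look for form fields that hide URLs under two layers of Base64, as both extensions above do. The sketch below is a heuristic under that assumption, not a complete detector: the function names and the collector.example endpoint are hypothetical, and an extension that encodes its data differently will not be caught.

```python
import base64
import binascii
from typing import Optional
from urllib.parse import parse_qs, unquote


def try_double_b64(value: str) -> Optional[str]:
    """Return the text under two Base64 layers, or None if it does not decode."""
    try:
        inner = base64.b64decode(value, validate=True)
        return base64.b64decode(inner, validate=True).decode("utf-8")
    except (binascii.Error, ValueError):
        return None


def suspicious_requests(har: dict) -> list:
    """List (endpoint, leaked URLs) for form fields that decode to URLs."""
    hits = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        for param in request.get("postData", {}).get("params", []):
            # Values may still be percent-encoded in the export; unquote is harmless otherwise.
            decoded = try_double_b64(unquote(param.get("value", "")))
            if decoded is None:
                continue
            urls = [v for values in parse_qs(decoded).values()
                    for v in values if v.startswith(("http://", "https://"))]
            if urls:
                hits.append((request.get("url", ""), urls))
    return hits


def wrap(query: str) -> str:
    """Encode a query string with Base64 twice (for the demo below)."""
    return base64.b64encode(base64.b64encode(query.encode())).decode()


# Hypothetical HAR fragment with one tracking request and one ordinary form post.
demo_har = {"log": {"entries": [
    {"request": {"url": "https://collector.example/stats", "postData": {"params": [
        {"name": "e", "value": wrap("q=https%3A//git.mycomp.org/repo&sub=chrome")}]}}},
    {"request": {"url": "https://mycomp.org/login", "postData": {"params": [
        {"name": "user", "value": "hello"}]}}},
]}}
print(suspicious_requests(demo_har))
```

For a real capture, load the exported file with json.load and pass the resulting dictionary to suspicious_requests. Background requests made by an extension may not appear in a page's network panel, so you may need to inspect the extension's own background page to record them.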
And remember, big brother is always watching you :)
All the best.