Googlebot's Math.random() is completely deterministic

Original author: Tom Anthony
  • Translation
I ran some experiments on how Googlebot parses and renders JavaScript and stumbled upon a few interesting things. The first is that Math.random() in Googlebot produces a completely deterministic series of numbers. I wrote a small script that uses this behavior to reliably identify Googlebot:


Source

The first call to Math.random() in Googlebot always returns 0.14881141134537756, and the second call always returns 0.19426893815398216. The script at the link above simply uses this information to identify Googlebot, although it obfuscates the check slightly so that it does not look too arbitrary.
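The check described above can be sketched roughly as follows. This is not the author's script (which is obfuscated); `isLikelyGooglebot` is a hypothetical helper name, and the sketch assumes the two constants quoted above still hold and that nothing else on the page has called Math.random() first.

```javascript
// Values reported in the article for the first two Math.random() calls
// made by Googlebot. They may not hold for current crawler versions.
const GOOGLEBOT_FIRST = 0.14881141134537756;
const GOOGLEBOT_SECOND = 0.19426893815398216;

// Hypothetical helper, not the author's script.
// `random` is injectable so the check can be exercised outside Googlebot.
// It must run before any other code on the page consumes random numbers.
function isLikelyGooglebot(random = Math.random) {
  const first = random();
  const second = random();
  return first === GOOGLEBOT_FIRST && second === GOOGLEBOT_SECOND;
}
```

In a regular browser the chance of hitting both values is negligible, so the function returns false for ordinary visitors.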

Google Crawling


Imagine the amount of work Google has to do to crawl the web while still running every script. That is impossible without heavy optimization, and I believe deterministic random numbers were implemented for the following reasons:

  1. Speed.
  2. Better security.
  3. Predictability: Googlebot can be sure that a page renders the same way on every visit.

Speeding up time ...


Googlebot also runs JavaScript with an accelerated clock, which makes sense: why actually wait 5 seconds if you are a bot? So Google runs timers at a much faster pace. If you create a simple script with a ticker and run it through Fetch & Render in Google Search Console, the script finishes almost instantly, but the result looks like this:



The second date is in the future! Marty McFly would be proud.
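A ticker of the kind described here might look like the sketch below. The original test script is not shown, so `measureTimer` and its parameters are hypothetical; the idea is only to record the date before and after a timer and compare them.

```javascript
// Hypothetical helper: records the clock before and after a timer fires.
// `now` and `schedule` default to the real clock and timer, but can be
// replaced to simulate an environment with an accelerated clock.
function measureTimer(delayMs, onDone, now = Date.now, schedule = setTimeout) {
  const start = now();
  schedule(() => {
    const end = now();
    onDone({
      start: new Date(start),
      end: new Date(end),
      elapsedMs: end - start, // elapsed time as seen by the script
    });
  }, delayMs);
}

// Example usage (prints two dates roughly 5 seconds apart):
// measureTimer(5000, (r) =>
//   console.log(r.start.toISOString(), r.end.toISOString()));
```

In an ordinary browser the two dates are about 5 seconds apart and that much real time passes. Under an accelerated clock, the script still reports a 5-second gap even though almost no real time has elapsed, which is how the second date ends up in the future.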

When did it start?


I wondered whether Google ever changes its random number generator, but a search for the number 0.14881141134537756 returned more than 18,000 results, so the constant seems fairly stable. After discovering this, I googled a little more and found an old Hacker News comment from the user KMag:

At some point, some SEO found out that random() always returned 0.5. I'm not sure if anyone worked out that JavaScript always saw a certain date in the summer of 2006, but I suppose that has changed.

It seems this behavior has been around for a long time, except that random() used to always return 0.5 and now produces a deterministic series of numbers. The date does appear to be set accurately at the start, but it can then run ahead into the future. KMag went on to say:

Hopefully they now seed the random number generator and the date using a cryptographic hash of all loaded scripts and page text, so that the result is deterministic but difficult to manipulate.

That does not seem to have happened. Still, I am not sure this lets you do much that you could not already do with the user agent and IP address. But it might let you do something with plausible deniability!
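For illustration, KMag's idea of seeding the generator from the page content could look something like the sketch below. Nothing here reflects how Googlebot actually works. To keep the example short and synchronous, a non-cryptographic FNV-1a hash stands in for the cryptographic hash KMag mentions, and mulberry32 is used as the generator; `fnv1a32` and `seededRandom` are names made up for this example.

```javascript
// 32-bit FNV-1a hash over the UTF-16 code units of a string.
// NOT cryptographic: a real implementation of KMag's idea would use
// something like SHA-256 so the seed is hard to manipulate.
function fnv1a32(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// mulberry32: a small deterministic PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical: build a Math.random() replacement whose sequence depends
// on the page content, so it is repeatable per page but differs across pages.
function seededRandom(pageContent) {
  return mulberry32(fnv1a32(pageContent));
}
```

With this approach the same page always yields the same sequence, so rendering stays repeatable, while a script can no longer recognize the crawler from one hard-coded pair of numbers.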
