
Bulletproof JavaScript tests
(Translation)
Writing JS speed tests is not as easy as it sounds. Without even touching on cross-browser compatibility issues, you can fall into many traps.
That is why I created jsPerf: a simple web interface that lets anyone create, share, and run tests comparing the performance of snippets of code. You don't have to worry about anything: just enter the code whose performance you want to measure, and jsPerf creates a new test case for you, which you can then run on different devices and in different browsers.
Behind the scenes, jsPerf originally used a modified version of the JSLitmus library, which I named Benchmark.js. Over time it gained new features, and recently John-David Dalton rewrote it from scratch.
This article sheds light on the various tricky situations you can run into when writing JS performance tests.
Test patterns
There are several ways to measure the performance of a piece of JS code. The most common option is pattern A:
var totalTime,
    start = new Date,
    iterations = 6;
while (iterations--) {
  // the code snippet goes here
}
// totalTime → the number of milliseconds it took to run the code six times
totalTime = new Date - start;
The test code is placed in a loop that runs a fixed number of times (here, 6), and the start time is then subtracted from the end time. This pattern is used by test suites such as SlickSpeed, Taskspeed, SunSpider, and Kraken.
Problems
As devices and browsers keep getting faster, tests that use a fixed number of iterations increasingly report 0 ms as their result, which tells us nothing.
Pattern B
The second approach is to count how many operations complete within a fixed amount of time. The advantage: you don't have to pick the number of iterations yourself.
var hz,
    period,
    totalTime,
    startTime = new Date,
    runs = 0;
do {
  // the code snippet goes here
  runs++;
  totalTime = new Date - startTime;
} while (totalTime < 1000);
// convert ms to seconds
totalTime /= 1000;
// period → how long one operation takes
period = totalTime / runs;
// hz → the number of operations per second
hz = 1 / period;
// or, shorter:
// hz = (runs * 1000) / totalTime;
This runs the code for roughly one second, i.e. until totalTime exceeds 1000 ms.
Pattern B is used by Dromaeo and the V8 Benchmark Suite.
Problems
Due to garbage collection, engine optimizations, and other background processes, the execution time of the same code varies from run to run. It is therefore advisable to run each test many times and average the results. The V8 suite runs each test only once; Dromaeo runs each test five times, but sometimes even that is not enough. One option is to reduce the minimum test execution time from 1000 to 50 ms, leaving more time for repeated runs.
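For illustration, a minimal sketch of that idea (this is not Dromaeo's actual code): run pattern B with a 50 ms window several times and average the ops/sec samples. The test() function here is assumed to wrap the code under test.
// Sketch: run pattern B in short 50 ms windows and average the samples.
// test() is assumed to wrap the code snippet being measured.
function sampleHz(minTime) {
  var runs = 0,
      totalTime,
      startTime = new Date;
  do {
    test();
    runs++;
    totalTime = new Date - startTime;
  } while (totalTime < minTime);
  return (runs * 1000) / totalTime; // operations per second for this sample
}

var samples = [],
    sum = 0,
    i;
for (i = 0; i < 20; i++) {
  samples.push(sampleHz(50));
}
for (i = 0; i < samples.length; i++) {
  sum += samples[i];
}
var meanHz = sum / samples.length; // result averaged over 20 short samples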
Pattern C
JSLitmus combines the two patterns. It uses pattern A to run the test in a loop n times, but adapts n at runtime, increasing it until the test reaches a minimum total runtime, as in pattern B.
Problems
JSLitmus avoids the problems of pattern A, but not those of pattern B. For calibration it takes the three fastest runs of an empty test and subtracts that time from the other results. Unfortunately, "best of three" is not a statistically sound method. Even if you run the tests many times and subtract the average calibration time from the average result, the increased error of the combined result eats up whatever the calibration was supposed to gain.
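A rough illustration of why that happens (the numbers are made up): when one measurement is subtracted from another, their uncertainties combine, so the error of the difference is larger than either error alone.
// Rough error propagation: subtracting a calibration measurement from a
// test measurement combines their uncertainties (roughly in quadrature),
// so the resulting error is larger than either one on its own.
var testMoe = 5,        // ± ms on the raw test result (made-up value)
    calibrationMoe = 5, // ± ms on the calibration result (made-up value)
    combinedMoe = Math.sqrt(testMoe * testMoe + calibrationMoe * calibrationMoe);
// combinedMoe ≈ 7.07 ms — larger than the 5 ms you started with, which can
// easily swallow the few milliseconds the calibration was meant to remove.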
Pattern D
The problems of the previous patterns can be avoided by compiling test functions and unrolling loops.
function test() {
  x == y;
}
while (iterations--) {
  test();
}
// …unrolled, this compiles into →
var hz,
    startTime = new Date;
x == y;
x == y;
x == y;
x == y;
x == y;
// …
hz = (runs * 1000) / (new Date - startTime);
Problems
But this approach has drawbacks of its own. Compiling functions increases memory usage and slows things down: when a test is repeated several million times, you build a very long string and compile a gigantic function.
Another problem with loop unrolling is that a test may exit early via a return statement near the start of its body. It makes no sense to compile a million lines if the function returns on the third one. Such cases have to be detected, falling back to pattern A when they occur.
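For example, a (hypothetical) test body like the one below exits on its first check in older browsers; unrolled a million times into one compiled function, the first return would end the whole function, so the remaining copies would never run:
// Hypothetical test with an early exit. Inside a single unrolled function,
// the first `return` ends the whole function, making the other copies dead.
function test() {
  if (!document.querySelectorAll) {
    return; // bail out immediately when the API is unsupported
  }
  document.querySelectorAll('#foo');
}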
Function body extraction
Benchmark.js uses a different technique that combines the best aspects of all these patterns. Loops are not unrolled, to save memory. To reduce factors that affect accuracy and to let tests use local methods and variables, the body of each test function is extracted. For example:
var x = 1,
    y = '1';
function test() {
  x == y;
}
while (iterations--) {
  test();
}
// …compiles into →
var x = 1,
    y = '1';
while (iterations--) {
  x == y;
}
After that, the extracted code is run in a while loop (pattern A), repeated until a specified minimum time has elapsed (pattern B), and the whole process is then repeated enough times to get statistically significant results.
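A simplified sketch of how these pieces fit together (this is an illustration, not the actual Benchmark.js implementation; x and y are assumed to be globals so the compiled body can see them):
// Simplified illustration of the combined approach; not the real Benchmark.js code.
var x = 1,
    y = '1';

// Compile the extracted body into a function that times one sample of
// `count` iterations (pattern A inside, without unrolling).
function compileSample(body) {
  return Function(
    'count',
    'var start = new Date;' +
    'while (count--) {' + body + '}' +
    'return new Date - start;'
  );
}

var sample = compileSample('x == y;'),
    samples = [],
    count = 1000,
    elapsed,
    i;

// Pattern B: grow the iteration count until one sample takes at least 50 ms.
while (sample(count) < 50) {
  count *= 2;
}

// Collect several samples so the results can be analysed statistically.
for (i = 0; i < 10; i++) {
  elapsed = sample(count);
  samples.push((count * 1000) / elapsed); // ops/sec for this sample
}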
What you need to pay attention to
Inaccurate timers
Depending on the combination of OS and browser, timers may be inaccurate for various reasons. For example, on Windows XP the timer interrupt interval is usually 10-15 ms, i.e. the OS receives an interrupt from the system timer only every 10-15 ms. Some older browsers (IE, Firefox 2) rely on this OS timer directly: a call such as (new Date).getTime() takes its value straight from the OS. If the timer is only updated every 10-15 ms, measurement inaccuracies quickly accumulate.
However, this can be worked around. In JS you can measure the smallest measurable unit of time (the timer resolution). From it you can calculate how long a test has to run so that the measurement error stays below 1%. The uncertainty is half of that smallest unit. For example, in IE6 on Windows XP the smallest unit is 15 ms, so the uncertainty is 15 ms / 2 = 7.5 ms. For this to be no more than 1% of the measured time, divide it by 0.01: 7.5 / 0.01 = 750 ms.
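As a rough sketch (the helper names are hypothetical, not part of Benchmark.js's public API), the resolution can be measured and the minimum run time derived like this:
// Measure the smallest observable timer increment (the timer resolution)
// by busy-waiting until the clock value changes, and keeping the minimum.
function getTimerResolution() {
  var resolution = Infinity,
      attempts = 30,
      begin,
      now;
  while (attempts--) {
    begin = +new Date;
    now = begin;
    while (now === begin) {
      now = +new Date;
    }
    resolution = Math.min(resolution, now - begin);
  }
  return resolution; // e.g. 15 on IE6 / Windows XP, 1 in most modern browsers
}

// Minimum time a test has to run for the timer error to stay within 1%.
function getMinTestTime() {
  var uncertainty = getTimerResolution() / 2; // e.g. 15 / 2 = 7.5 ms
  return uncertainty / 0.01;                  // e.g. 7.5 / 0.01 = 750 ms
}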
Other timers
When launched with the --enable-benchmarking flag, Chrome and Chromium expose chrome.Interval, which gives access to a high-resolution timer with microsecond precision. While working on Benchmark.js, John-David Dalton also came across Java's nanosecond timer and exposed it to JS through a small Java applet.
With a high-resolution timer you can use shorter test runs, which leads to smaller errors in the results.
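A sketch of how such a timer might be used when available (this assumes chrome.Interval exposes start(), stop(), and microseconds(), and that the browser was started with --enable-benchmarking; otherwise it falls back to Date):
// Prefer the microsecond timer when Chrome exposes it; fall back to Date.
// Assumes chrome.Interval has start(), stop() and microseconds(), which is
// only the case when the browser runs with --enable-benchmarking.
var hasInterval = typeof chrome == 'object' && chrome.Interval,
    iterations = 100000,
    elapsedUs,
    timer,
    start;

if (hasInterval) {
  timer = new chrome.Interval;
  timer.start();
  while (iterations--) {
    // the code snippet goes here
  }
  timer.stop();
  elapsedUs = timer.microseconds();
} else {
  start = new Date;
  while (iterations--) {
    // the code snippet goes here
  }
  // Date only has millisecond resolution at best.
  elapsedUs = (new Date - start) * 1000;
}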
Firebug disables JIT in Firefox
With the Firebug add-on enabled, just-in-time compilation is disabled, so all tests run in the interpreter, where they execute much more slowly than usual. Remember to disable Firebug before running tests.
The same, although to a lesser extent, applies to Web Inspector and Opera's Dragonfly. Close them before running tests so that they do not affect the results.
Browser quirks and bugs
Tests that use loops are susceptible to various browser quirks and bugs; one example showed up in IE9 with its dead-code elimination. Bugs in Mozilla's TraceMonkey engine, or the caching of querySelectorAll results in Opera 11, can also prevent you from getting correct results. Keep these in mind.
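For illustration (a generic sketch, not an example from the article): a loop body with no observable side effects may be eliminated entirely, so the benchmark ends up timing an empty loop; keeping and later using the result prevents that.
var iterations = 1000000,
    sum = 0;

// Vulnerable: an engine that proves the loop body has no side effects
// may remove it entirely, so you end up timing an empty loop.
while (iterations--) {
  Math.sqrt(2);
}

// More robust: accumulate and later use the result, so the computation
// stays observable and cannot be eliminated as dead code.
iterations = 1000000;
while (iterations--) {
  sum += Math.sqrt(2);
}
// Reference `sum` so the engine cannot prove it is unused.
if (sum < 0) {
  alert(sum);
}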
Statistical significance
An article by John Resig describes why most tests fail to produce statistically significant results. In short: always estimate the magnitude of the error in each result and reduce it in every way you can.
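As a rough illustration of what estimating the error means (samples here is assumed to be an array of ops/sec measurements), the margin of error can be derived from the standard error of the mean:
// Sketch: estimate the relative margin of error of a set of samples.
// `samples` is assumed to be an array of ops/sec measurements.
function marginOfError(samples) {
  var n = samples.length,
      mean = 0,
      variance = 0,
      i;
  for (i = 0; i < n; i++) {
    mean += samples[i];
  }
  mean /= n;
  for (i = 0; i < n; i++) {
    variance += Math.pow(samples[i] - mean, 2);
  }
  variance /= n - 1;                    // sample variance
  var sem = Math.sqrt(variance / n),    // standard error of the mean
      moe = sem * 1.96;                 // ~95% confidence (normal approximation)
  return {
    mean: mean,
    moe: moe,
    relativeMoe: (moe / mean) * 100     // as a percentage of the mean
  };
}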
Cross-browser testing
Test scripts in real, separate browser versions; do not rely on IE's compatibility modes, for example. Also, up to version 8, IE limited a script to 5 million executed instructions. On a fast machine a script can hit that limit in about half a second, and the browser then shows its long-running-script warning. You then have to raise the allowed number of operations in the registry, or use a utility that lifts the restriction. Fortunately, this limit was removed in IE9.
Conclusion
Whether you are running a couple of tests, writing your own test suite, or even a benchmarking library, there are many hidden pitfalls in JS performance testing. Benchmark.js and jsPerf get weekly updates with bug fixes and new features that keep improving the accuracy of the tests.