MG Siegler at TechCrunch wrote the following article about Google’s efforts on performance benchmarking:
In a post yesterday on their Chromium Blog, it’s pretty clear that Google feels their V8 benchmark suite is the best. In fact, they directly call out their rivals’ suites, noting bugs and saying that they must evolve. And then they go one step further: providing links to versions of the rivals’ suites supposedly perfected by Google!
Wow. In the extremely nerdy (and fairly incestuous) browser world, this is hardcore.
Specifically, Google says that SunSpider, first developed by Apple in 2007, contains tests that are “less relevant to the web in 2011”. Here’s the best part:
To fix this issue, Google made its own modified version of SunSpider which essentially runs tests 50 times consecutively to better gauge speed. When the tests are run this way, Google says that “the results begin to reflect Chrome’s true performance.” Naturally. According to Google, Chrome is more than 30 percent faster (in the test results) when measured this way.
Meanwhile, they say that Kraken, the new benchmark suite Mozilla just created, is “in better shape” — but buggy. “As a result, the benchmark is less useful and has even (mis)led us to spend time making some irrelevant optimizations in Chrome,” they note.
To get around this, Google is now hosting a new version of Kraken “built directly from Mozilla’s source code repository”.
Are these claims about rivals’ suites legit? It’s hard to say for sure, but I have a feeling that the rivals themselves would dispute them. It is a bit odd that Google is reworking the suites, and that the end result is Chrome performing much better in the tests.
Of course, my eyes don’t lie. I’m a Chrome guy all the way because in daily usage I find it to be much faster than either Safari or Firefox (on a Mac, at least). Until that changes, I’m trusting Google on this one.
The duration of individual test runs is an important issue, and IMHO it should be calibrated to the speed of the hardware on which the tests are run.
Testing MultiBench on MIPS systems should take into account two requirements (see the sketch after this list):
- Run each test long enough that the time measurements are precise.
- Avoid running tests for too long, otherwise the whole benchmark becomes painfully slow.
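A minimal sketch of how such a calibration could work, assuming a hypothetical run_test_once() callable that executes one pass of a single workload (MultiBench itself is configured through its own harness; this only illustrates the timing logic):

```python
import time

# Assumed thresholds; tune for the actual platform.
MIN_RUNTIME = 1.0    # seconds: below this, timer resolution dominates the measurement
MAX_RUNTIME = 30.0   # seconds: above this, the whole suite becomes painfully slow

def calibrate_iterations(run_test_once, min_runtime=MIN_RUNTIME, max_runtime=MAX_RUNTIME):
    """Find an iteration count so that one timed run of the test lasts at least
    min_runtime seconds, while staying within roughly max_runtime seconds."""
    iterations = 1
    while True:
        start = time.perf_counter()
        for _ in range(iterations):
            run_test_once()
        elapsed = time.perf_counter() - start
        if elapsed >= min_runtime:
            break
        # Double until the run is long enough for precise measurements.
        iterations *= 2
    # Cap the count so a slow configuration does not drag out the whole suite.
    per_iteration = elapsed / iterations
    return min(iterations, max(1, int(max_runtime / per_iteration)))
```

Doubling the iteration count keeps the calibration itself cheap, and the cap keeps a slow configuration from stretching the overall benchmark run.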
In addition, when varying parameters of the system (the number of cores, cache sizes, clock speed, memory configuration, etc.), one should satisfy requirement 1 (i.e. running long enough) on the presumably fastest configuration. And the number of iterations (repetitions of each test) is best kept identical across all benchmark runs in a given experiment.
Therefore, the best solution would be to run a “calibration execution” on the fastest configuration and save the calibrated iteration counts into, say, an “iterations.txt” file, which is then used for the subsequent benchmark runs.
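A sketch of that workflow under the same assumptions; the file name “iterations.txt” comes from the text above, while the one-count-per-line format is just an illustrative choice:

```python
def save_iterations(counts, path="iterations.txt"):
    """Write 'test_name iterations' lines, one per workload,
    after calibrating on the fastest configuration."""
    with open(path, "w") as f:
        for name, n in counts.items():
            f.write(f"{name} {n}\n")

def load_iterations(path="iterations.txt"):
    """Read the calibrated counts back on every other configuration,
    so the iteration counts stay identical across the whole experiment."""
    counts = {}
    with open(path) as f:
        for line in f:
            name, n = line.split()
            counts[name] = int(n)
    return counts
```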
Coming back to the MultiBench results: when the default number of iterations is left in place, the results can be 3-5% worse than those obtained with a better-tuned number of iterations.