- MIPS Android
By Eyal Barzilay, Applications and Benchmarking Manager
MIPS’ MT technology boosts performance 43 percent; combined MT & MC boosts performance 150 percent
The use of multi-core technology to deliver more CPU horsepower is one of the increasingly common ways of providing higher system performance in hardware. This is true even for high-volume consumer applications, where cost and power can be very important. However, upgrading to a multi-core system doesn’t automatically guarantee performance improvements or an enhanced user experience. It’s not just a hardware problem: software must be written in a way that can make use of parallel hardware resources. But software is adapting. Systems are getting much more complex, with multiple processes and threads often running simultaneously, and applications are being written to take better advantage of multiprocessing hardware trends.
With that in mind, we recently used the BrowsingBench™ benchmark from EEMBC to evaluate the performance benefits of MIPS’ multi-core (MC) and multi-threading (MT) technologies. Our objective was to find out how these technologies enhance the user experience of a very popular and very real consumer application – web browsing on the Android™ software platform.
BrowsingBench is a credible and widely used tool that is trusted and cited by leading technology companies. It measures web page loading and rendering time for a large set of web pages with diverse content, and it does this in a reliable way that leads to repeatable and meaningful results. It will run on any connected device with a web browser. And rather than performing a synthetic test, BrowsingBench performs the same operations that a human would perform on the device. We’ve used several other benchmarks in the past that were suitable for evaluating MC/MT system performance; however, none represented the real-world user experience on connected devices as well as BrowsingBench.
We ran BrowsingBench on a system based on the MIPS32® 1004K™ Coherent Processing System (CPS). In its maximum implementation, the 1004K CPS can support up to four cores and two hardware threads (also known as Virtual Processing Elements or VPEs) per core. To keep things simple for this test however, the configuration we used was dual core with two VPEs per core, for a total of 4 VPEs. VPEs are essentially logical CPUs that share one physical pipeline in each 1004K core, based on MIPS’ multi-threading technology.
To evaluate the benefits of multiple cores and VPEs on web browser performance, we ran BrowsingBench using the four different configurations listed in the table below. In all cases, the tests were executed on the same dual-core 1004K system; however, we used the operating system to enable and disable cores and VPEs.
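On a Linux-based system, one common way to enable and disable logical CPUs from the operating system is the kernel's CPU-hotplug sysfs interface. The sketch below illustrates that general mechanism; it is not necessarily the exact procedure used for these tests. Toggling a CPU requires root, and CPU 0 typically cannot be taken offline.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")

def online_cpus() -> str:
    """Return the kernel's range string of currently online logical CPUs, e.g. "0-3"."""
    return (CPU_ROOT / "online").read_text().strip()

def set_cpu_online(cpu: int, enable: bool) -> None:
    """Bring one logical CPU (a core or a VPE, as the kernel sees it) online or offline.

    Writing these files requires root, and cpu0 usually cannot be offlined.
    """
    (CPU_ROOT / f"cpu{cpu}" / "online").write_text("1" if enable else "0")

print("online CPUs:", online_cpus())
```

Disabling CPUs this way lets the same board stand in for a smaller configuration without rebuilding the kernel or reflashing the system.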
The big question we wanted to answer was whether Android would be able to take advantage of these multiple processing resources to load and render web pages faster, thereby enhancing the user experience. To do that, Android would have to use parallel processes and threads while executing the browsing workload.
The results, which are shown in the table and chart below, leave no doubt: Android based web browsing performance is greatly enhanced by MC and MT technologies.
The main observation is that browsing performance improves more than 2.5x when comparing the full configuration to the basic configuration. With a great deal of parallel execution under Android, the browser can truly benefit from the combination of MT and MC. A closer look at what’s happening under the hood in the Android system indeed shows that a lot of processes are running in parallel. The two main processes in the system are the Android Browser itself and another process called “system server,” which manages many components of Android including the display system, and is kept very busy during the BrowsingBench run.
Even if we limit the system to a single core, the MIPS MT technology gives us a BrowsingBench performance boost of 43 percent. One of the primary attributes of MT is to improve performance efficiency of a core, which it does by increasing the pipeline utilization of that core when multiple processes and/or threads are running. So for systems where silicon real estate is at a premium, choosing a multithreaded core can be a great way to boost system performance.
When multi-core and multi-threading systems were first introduced into the marketplace, most existing software was not optimized to make good use of these technologies. Today this is changing. Android is a complex software platform, and a perfect example of a high-volume consumer platform that is quickly evolving and being tuned for the best possible user experience in a Web-connected world.
At MIPS we are very pleased with the benchmark results, because they demonstrate that our MC and MT technologies deliver much higher performance than the standard hardware used only a few years ago, and make a significant impact on end users of many connected devices, from smartphones and tablets to connected DTVs.
MG Siegler at TechCrunch wrote the following article about Google’s efforts on performance benchmarking:
In a post yesterday on their Chromium blog, it’s pretty clear that Google feels their V8 benchmark suite is the best. In fact, they directly call out their rivals’ suites, noting bugs and saying that they must evolve. And then they go one step further: providing links to versions of the rivals’ suites supposedly perfected by Google!
Wow. In the extremely nerdy (and fairly incestuous) browser world, this is hardcore.
Specifically, Google says that SunSpider, first developed by Apple in 2007, contains tests that are “less relevant to the web in 2011”. Here’s the best part:
To fix this issue, Google made its own modified version of SunSpider which essentially runs tests 50 times consecutively to better gauge speed. When the tests are run this way, Google says that “the results begin to reflect Chrome’s true performance.” Naturally. According to Google, Chrome is more than 30 percent faster (in the test results) when measured this way.
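Google's rationale for running each test many times consecutively is that very short runs are dominated by timer granularity and one-off startup costs. A minimal Python sketch of the idea (the toy workload and numbers are illustrative, not SunSpider's actual tests):

```python
import time

def time_workload(workload, repeats: int = 1) -> float:
    """Run `workload` back-to-back `repeats` times; return average seconds per run.

    Longer total runs amortize timer granularity and startup noise, which is
    the rationale for repeating each test many times consecutively.
    """
    start = time.perf_counter()
    for _ in range(repeats):
        workload()
    return (time.perf_counter() - start) / repeats

def toy_test():
    # Stand-in for a short benchmark kernel.
    sum(i * i for i in range(10_000))

single = time_workload(toy_test, repeats=1)     # one short, noisy run
averaged = time_workload(toy_test, repeats=50)  # amortized over 50 runs
```

The catch, as the next paragraphs note, is that changing how a benchmark is run also changes which engine it favors.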
Meanwhile, they say that Kraken, the new benchmark suite Mozilla just created, is “in better shape” — but buggy. “As a result, the benchmark is less useful and has even (mis)led us to spend time making some irrelevant optimizations in Chrome,” they note.
To get around this, Google is now hosting a new version of Kraken “built directly from Mozilla’s source code repository”.
Are these claims about rivals’ suites legit? It’s hard to say for sure, but I have a feeling that the rivals themselves would dispute that. It is a bit odd that Google is reworking the suites, and that the end result is Chrome performing much better in the tests.
Of course, my eyes don’t lie. I’m a Chrome guy all the way because in daily usage I find it to be much faster than either Safari or Firefox (on a Mac, at least). Until that changes, I’m trusting Google on this one.
The issue with the duration of particular test runs is important, and IMHO the run length should be calibrated depending on the speed of the hardware on which the tests are run.
Testing MultiBench on MIPS systems should take two requirements into account:
- Run each test long enough that the time measurements are precise.
- Avoid running tests too long, otherwise the whole benchmark will be painfully slow.
In addition, when varying parameters of the system (the number of cores, cache sizes, clock speed, memory configuration, etc.), one should satisfy requirement 1 (i.e., running long enough) on the seemingly fastest configuration. And the number of iterations (repetitions of the tests) should be kept identical across all benchmark runs in a given experiment.
Therefore, the best solution is to run a “calibration execution” on the fastest configuration and save the calibrated iteration counts into, say, an “iterations.txt” file, which can then be used for the subsequent benchmark runs.
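As a sketch of that calibration step (the file format and target duration here are my assumptions for illustration, not MultiBench's actual mechanism), one could double each test's iteration count until a timed run is long enough, then persist the counts so every configuration in the experiment reuses the same numbers:

```python
import time

TARGET_SECONDS = 0.5  # "long enough" threshold; an assumption for illustration

def calibrate(workload, target: float = TARGET_SECONDS) -> int:
    """Double the iteration count until one timed run takes at least `target` seconds."""
    iterations = 1
    while True:
        start = time.perf_counter()
        for _ in range(iterations):
            workload()
        if time.perf_counter() - start >= target:
            return iterations
        iterations *= 2

def save_iterations(counts: dict, path: str = "iterations.txt") -> None:
    """Write one "name count" line per test, for reuse on slower configurations."""
    with open(path, "w") as f:
        for name, n in counts.items():
            f.write(f"{name} {n}\n")

def load_iterations(path: str = "iterations.txt") -> dict:
    """Read the calibrated counts back before a benchmark run."""
    with open(path) as f:
        return {name: int(n) for name, n in (line.split() for line in f)}
```

Calibrating on the fastest configuration guarantees requirement 1 everywhere, since every slower configuration will only run longer.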
Coming back to the MultiBench results: when the default number of iterations is left in place, the results can be 3-5% worse than those obtained with a better-tuned number of iterations.
Recently we were benchmarking a platform and got unusually low scores. The platform was running the benchmarks under Linux. After some investigation we found that it had run out of memory and, as would be expected from a well-behaved operating system, was swapping to the hard drive. Running the Linux command “top” showed us immediately that the system was out of memory. By freeing up some memory we solved the benchmark score problem.
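Besides watching “top” interactively, the same check can be scripted so it runs before every benchmark. A sketch for Linux, reading the standard `/proc/meminfo` fields (the thresholds you would alarm on are up to the test setup):

```python
def meminfo() -> dict:
    """Parse /proc/meminfo (Linux) into a dict of values in kB."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.split()[0])  # first token is the value in kB
    return info

def memory_pressure() -> dict:
    """Summarize available memory and swap in use, in MB."""
    info = meminfo()
    swap_used_kb = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "available_mb": info.get("MemAvailable", 0) // 1024,
        "swap_used_mb": swap_used_kb // 1024,
    }

print(memory_pressure())  # growing swap_used_mb during a run is the warning sign
```

Logging this alongside each score makes it easy to spot after the fact that a low result coincided with swapping rather than with the code under test.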