Just how much trust can you put in benchmarks? Is Samsung tricking us?
Benchmarks are important. The quoted figures for any piece of hardware are all well and good, but potential buyers need to know how a hard drive, processor, computer, tablet or smartphone really performs. After all, two processors with a clock speed of 3GHz do not necessarily perform equally well, and it is only through testing that it is possible to determine which one comes out on top. Few people have the means to go out and compare two similar pieces of hardware, so this is where benchmarks prove useful.
People use smartphones and tablets for different things. One person might be happy being able to take notes and make phone calls, while someone else might be looking for a 60fps hi-def gaming experience. Here benchmarks matter. It is important to be able to accurately compare devices using reliable figures. If you want to know how quickly phone A shifts pixels around the screen compared to phone B, it is important that the tests are performed in the same way, and are carried out fairly.
This might seem like stating the obvious. It makes sense that benchmarks are only really meaningful if direct comparisons can be drawn between two sets of results. We need to know that each device has been tested in the same way.
Back in July, many people felt suspicious about the benchmark performance of Samsung's Galaxy S4.
It looked very much like code had been written in such a way that performance was increased when benchmarking software was running. These weren't just paranoid or unfounded claims from Samsung detractors; the company went as far as explaining the anomaly by saying that the GPU frequency of the S4 was increased from 480MHz to 533MHz when "running apps that are usually used in full-screen mode, such... certain benchmarking apps". Samsung insisted that the changes were "not intended to improve certain benchmark results".
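To put that clock change in perspective, here is a quick calculation using only the two frequencies quoted above. It is a rough sketch: a higher GPU clock does not translate one-to-one into a higher benchmark score, so the result shows the size of the frequency bump and nothing more.

```python
def percent_increase(base, boosted):
    """Return the relative increase from base to boosted, in percent."""
    return (boosted - base) / base * 100


base_mhz = 480     # GPU frequency quoted for normal use
boosted_mhz = 533  # GPU frequency quoted for "full-screen" apps

gpu_uplift = percent_increase(base_mhz, boosted_mhz)
print(f"GPU clock increase: {gpu_uplift:.1f}%")  # prints "GPU clock increase: 11.0%"
```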
This may be the case, but it does render benchmarks all but meaningless. The point of a benchmark should be to try to replicate real world scenarios so users can know what to expect in terms of performance -- knowing how well hardware can run benchmarking software is, in itself, useless information.
But it looks as though this may not have been a one-off. And it is Samsung that is in line for questioning again. This time around it is the Samsung Galaxy Note 3 which has been put through its paces, by Ars Technica.
Both the Note 3 and LG's G2 feature the 2.3GHz Snapdragon 800, but Ars Technica's benchmarks found that the Note 3 scored higher -- and not just a bit higher... up to 20 percent higher -- than the G2. Ars Technica did a little digging and seems to have discovered that when certain benchmarking tools are running, Note 3 performance is increased. Perhaps more importantly, the team found a way to disable the alleged optimization so it was possible to see what difference it made. This essentially involved running a renamed version of a benchmarking tool to trick the Note 3 and prevent it from "realizing" benchmarking software was running. This is how the apparent 20 percent difference was found.
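The comparison behind that figure is simple to sketch: run the same benchmark under its normal name and under a renamed copy, then see how far apart the scores are. The scores below are made-up placeholders for illustration, not Ars Technica's measurements, and the function name is hypothetical.

```python
from statistics import median


def boost_percent(stock_runs, renamed_runs):
    """Percentage by which the recognized benchmark outscores its renamed copy.

    The median of several runs is used so that a single odd result
    does not skew the comparison.
    """
    stock = median(stock_runs)
    renamed = median(renamed_runs)
    return (stock - renamed) / renamed * 100


# Hypothetical scores, for illustration only.
stock_runs = [1190, 1200, 1210]   # benchmark running under its usual name
renamed_runs = [990, 1000, 1010]  # same benchmark, renamed so it is not recognized

boost = boost_percent(stock_runs, renamed_runs)
print(f"Score inflation: {boost:.1f}%")  # prints "Score inflation: 20.0%"
```

If a device treats every app equally, the two sets of scores should differ by no more than normal run-to-run variation.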
You might well wonder what the fuss is about. Surely it makes sense to have a device running at full pelt whenever possible. To a certain extent that is true, but it is not representative of real usage. All apps should be treated equally. Sure, games may make use of different CPU modes than, say, a simple email app, but there are few people who would want a benchmarking app to be treated differently to any other app.
It is understandable that a phone or tablet manufacturer would not want their device to be running flat out all of the time -- it would lead to massive battery usage and heat generation -- but there is a big difference between power conservation and fiddling the figures.
Consumers need to be able to put their trust in benchmarks. If performance is adjusted specifically when testing software is running, there is no level playing field and it becomes impossible to compare devices accurately. As in so many walks of life, it is transparency that is needed. If figures are being adjusted in some way, it matters far less so long as we know about it. If we know about it, we can make further adjustments for ourselves and compare in a meaningful manner.
How do you feel about this? How much attention do you pay to benchmark results? Do they sway your buying decisions?