Inconsistent Measurement Context

A measurement context is inconsistent when an experiment compares two systems but measures each under different conditions, such as different hardware, software configuration, or system load. The differing contexts may produce incomparable results for the two systems. Unfortunately, the more disparate the objects of comparison, the more difficult it is to ensure consistent measurement contexts. Even a benign-looking difference in contexts can introduce bias and make measurement results incomparable. For this reason, it may be important to randomize experimental parameters that cannot be held constant (e.g., memory layout in experiments that measure performance): randomization turns a systematic bias into variance that repeated runs expose and average out.
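
One rough sketch of such randomization: run the benchmark repeatedly while varying the size of an unused environment variable, which shifts the initial stack address and thereby the memory layout between runs. This is only an illustration; the variable name BENCH_PAD and the binary ./bench are placeholders, not part of any established harness.

    import os
    import random
    import statistics
    import subprocess
    import time

    def run_once(binary, pad_len):
        # Varying the environment size shifts the initial stack address,
        # perturbing memory layout -- a known source of measurement bias.
        env = dict(os.environ)
        env["BENCH_PAD"] = "x" * pad_len
        start = time.perf_counter()
        subprocess.run([binary], env=env, check=True, stdout=subprocess.DEVNULL)
        return time.perf_counter() - start

    def benchmark(binary, runs=30):
        times = [run_once(binary, random.randint(0, 4096)) for _ in range(runs)]
        return statistics.mean(times), statistics.stdev(times)

    mean, stdev = benchmark("./bench")
    print(f"mean {mean:.3f}s  stdev {stdev:.3f}s")

Reporting the spread across such randomized runs, rather than a single number, makes layout-induced bias visible as variance.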

Example:
You claim that the optimizations that you implemented in the most recent version of your multi-threaded mobile phone application led to a 20% reduction in response time. Your evaluation compared the response time with optimizations on a dual-core phone against the response time without optimizations on a single-core phone. Any difference you observe conflates the effect of the optimizations with the effect of the second core, so the 20% figure says little about the optimizations themselves.
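
A fairer protocol measures both versions in the same context: the same device and the same load, with runs of the two treatments interleaved in random order so that background activity affects each about equally. A minimal sketch, assuming the two versions can be launched as commands (./app-v1 and ./app-v2 are placeholders):

    import random
    import statistics
    import subprocess
    import time

    def time_cmd(cmd):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        return time.perf_counter() - start

    def compare(baseline_cmd, optimized_cmd, runs=30):
        # Interleave the treatments in random order on the SAME machine,
        # so background load and thermal state affect both about equally.
        samples = {"baseline": [], "optimized": []}
        trials = ["baseline", "optimized"] * runs
        random.shuffle(trials)
        for label in trials:
            cmd = baseline_cmd if label == "baseline" else optimized_cmd
            samples[label].append(time_cmd(cmd))
        base = statistics.mean(samples["baseline"])
        opt = statistics.mean(samples["optimized"])
        print(f"baseline {base:.3f}s  optimized {opt:.3f}s  "
              f"reduction {100 * (base - opt) / base:.1f}%")

    compare(["./app-v1"], ["./app-v2"])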

Real-World Examples

  • Compare your 32-bit results with 64-bit results on other systems. We all know that it is hard to obtain impressive performance using 64-bit floating point arithmetic. Some research systems do not even have 64-bit hardware. Thus [...] compare your 32-bit results with 64-bit results on other systems.
    -- Point #1 in David H. Bailey, "Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers", Supercomputing Review, August 1991
  • Measure parallel run times on a dedicated system, but measure conventional run times in a busy environment. There are a number of ways to further boost the performance of your parallel code relative to the conventional code. One way is [...]. Another is to time your parallel computer code on a dedicated system and time your conventional code in a normal loaded environment. After all, your conventional supercomputer is very busy, and it is hard to arrange dedicated time. If anyone in the audience asks why the parallel system is freely available for dedicated runs, but the conventional system isn’t, change the subject.
    -- Point #11 in David H. Bailey, "Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers", Supercomputing Review, August 1991
  • When direct run time comparisons are required, compare with an old code on an obsolete system. Direct run time comparisons can be quite embarrassing, especially if your parallel code runs significantly slower than an implementation on a conventional system. If you are challenged to provide such figures, compare your results with the performance of an obsolete code running on obsolete hardware with an obsolete compiler. For example, you can state that your parallel performance is “100 times faster than a VAX 11/780”. A related technique is to compare your results with results on another less capable parallel system or minisupercomputer. Keep in mind the bumper sticker “We may be slow, but we’re ahead of you.”
    -- Point #7 in David H. Bailey, "Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers", Supercomputing Review, August 1991
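
The common thread in these points is an undocumented, asymmetric measurement context. A minimal guard is to record the context alongside every measurement so that mismatches are at least visible afterwards. The sketch below assumes a POSIX system (for os.getloadavg) and uses ./bench as a placeholder for the system under test.

    import json
    import os
    import platform
    import subprocess
    import time

    def capture_context():
        # Snapshot the measurement context so mismatched contexts
        # (hardware, OS, concurrent load) show up in the recorded results.
        return {
            "machine": platform.machine(),
            "system": f"{platform.system()} {platform.release()}",
            "cpu_count": os.cpu_count(),
            "load_avg_1min": os.getloadavg()[0],  # POSIX only
            "timestamp": time.time(),
        }

    def timed_run(cmd):
        record = capture_context()
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        record["elapsed_s"] = time.perf_counter() - start
        return record

    print(json.dumps(timed_run(["./bench"]), indent=2))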

...