Inappropriate Metrics

A metric is inappropriate when it is flawed or when it does not support the intended claims of the experiment. A common case is the surrogate metric: one that is easy to measure but may not correlate with the desired metric it replaces.

Example: You claim that your new compiler optimization speeds up programs. Your evaluation uses instructions per cycle (IPC) to measure performance with and without the optimization. Because the optimization may change the number of instructions executed, IPC is inappropriate: execution time depends on the instruction count as well as on IPC. For instance, the optimization might insert extra no-op instructions that execute quickly, which raises IPC without improving overall performance.
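The arithmetic behind this example can be sketched in a few lines. This is a minimal illustration, not a measurement: the instruction counts, cycle counts, and the 1 GHz clock are hypothetical values chosen for the example, and the helper names `ipc` and `run_time_seconds` are made up here.

```python
# Hypothetical hardware-counter readings; none of these numbers come from a real run.
CLOCK_HZ = 1_000_000_000  # assumed fixed 1 GHz clock


def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle: the surrogate metric."""
    return instructions / cycles


def run_time_seconds(cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Execution time at a fixed clock rate: the metric the claim is about."""
    return cycles / clock_hz


# Baseline program: 1,000,000 instructions retired in 2,000,000 cycles.
base_instructions, base_cycles = 1_000_000, 2_000_000

# "Optimized" program: the same work plus 1,000,000 no-ops that retire
# four per cycle, costing 250,000 extra cycles.
padded_instructions = base_instructions + 1_000_000
padded_cycles = base_cycles + 250_000

base_ipc = ipc(base_instructions, base_cycles)
padded_ipc = ipc(padded_instructions, padded_cycles)
base_time = run_time_seconds(base_cycles)
padded_time = run_time_seconds(padded_cycles)

print(f"IPC:      {base_ipc:.3f} -> {padded_ipc:.3f}")
print(f"Run time: {base_time * 1e3:.3f} ms -> {padded_time * 1e3:.3f} ms")
```

Under these assumed numbers IPC rises while the run time gets longer, so a report based on IPC alone would point in the wrong direction. Reporting execution time, or the instruction count alongside IPC, avoids the problem.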

Real-World Examples

If MFLOPS rates must be quoted, base the operation count on the parallel implementation, not on the best sequential implementation. We know that MFLOPS rates of a parallel codes are often not very impressive. Fortunately, there are some tricks that can make these figures more respectable. The most effective scheme is to compute the operation count based on an inflated parallel implementation. Parallel implementations often perform far more floating point operations than the best sequential implementation. Often millions of operations are masked out or merely repeated in each processor. Millions more can be included simply by inserting a few dummy loops that do nothing. Including these operations in the count will greatly increase the resulting MFLOPS rate and make your code look like a real winner.
-- Point #8 in David H. Bailey. Twelve ways to fool the masses when giving performance results on parallel computers in Supercomputing Review, August 1991

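Bailey's advice is satirical, and the distortion it describes is easy to reproduce on paper. The sketch below assumes hypothetical operation counts and a hypothetical run time; `mflops` is an illustrative helper, not part of any library.

```python
def mflops(flop_count: float, seconds: float) -> float:
    """Millions of floating-point operations per second."""
    return flop_count / seconds / 1e6


# Hypothetical numbers, chosen only to illustrate the effect.
best_sequential_flops = 2.0e9  # operations the best sequential algorithm needs
parallel_flops = 5.0e9         # includes redundant, masked-out, and dummy operations
parallel_seconds = 10.0        # measured run time of the parallel code

inflated_rate = mflops(parallel_flops, parallel_seconds)
useful_rate = mflops(best_sequential_flops, parallel_seconds)

print(f"Quoted rate (parallel operation count):   {inflated_rate:.1f} MFLOPS")
print(f"Useful rate (sequential operation count): {useful_rate:.1f} MFLOPS")
```

The run time is identical in both lines; only the numerator changes. With these assumed counts the quoted rate is 2.5 times the useful rate, even though the problem is not solved any faster.
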
Quote performance in terms of processor utilization, parallel speedups or MFLOPS per dollar. As mentioned above, run time or even MFLOPS comparisons of codes on parallel systems with equivalent codes on conventional supercomputers are often not favorable. Thus whenever possible, use other performance measures. One of the best is “processor utilization” figures. It sounds great when you can claim that all processors are busy nearly 100% of the time, even if what they are actually busy with is synchronization and communication overhead. Another useful statistic is “parallel speedup” --- you can claim “fully linear” speedup simply by making sure that the single processor version runs sufficiently slowly. For example, make sure that the single processor version includes synchronization and communication overhead, even though this code is not necessary when running on only one processor. A third statistic that many in the field have found useful is “MFLOPS per dollar”. Be sure not to use “sustained MFLOPS per dollar”, i.e. actual delivered computational throughput per dollar, since these figures are often not favorable to new computer systems.
-- Point #9 in David H. Bailey. Twelve ways to fool the masses when giving performance results on parallel computers in Supercomputing Review, August 1991
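
The "fully linear" speedup trick in this second quote comes down to the choice of baseline. The following sketch uses hypothetical timings for a 16-processor run; `speedup` is an illustrative helper and none of the numbers are measurements.

```python
def speedup(baseline_seconds: float, parallel_seconds: float) -> float:
    """Ratio of baseline run time to parallel run time."""
    return baseline_seconds / parallel_seconds


# Hypothetical timings, chosen only to illustrate the effect.
processors = 16
best_serial_seconds = 100.0         # best sequential implementation
handicapped_serial_seconds = 160.0  # one-processor run that still pays sync/communication overhead
parallel_seconds = 10.0             # run time on 16 processors

claimed = speedup(handicapped_serial_seconds, parallel_seconds)
honest = speedup(best_serial_seconds, parallel_seconds)
efficiency = honest / processors    # fraction of ideal linear speedup

print(f"Claimed speedup: {claimed:.1f}x on {processors} processors")
print(f"Speedup against the best serial code: {honest:.1f}x (efficiency {efficiency:.1%})")
```

With the handicapped baseline the speedup equals the processor count and looks perfectly linear. Measured against the best serial code, the same parallel run reaches well under that, which is why the baseline should always be stated.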

...

Experimental Evaluation of Software and Systems in Computer Science
