Inappropriate Data Analysis


A common goal of data analysis is to draw conclusions about a population from a small subset of it (a sample). For this generalization to be valid, the analysis must use methods and statistics that suit both the data and the claim being made.

Example: You claim that your sales-promotion feature does not lead to perceptibly longer response times. Your evaluation took a sample of 50 measurements and reported only the minimum response time. The minimum is not an appropriate statistic for this claim: it may be a low-probability outlier, and it says little about the response times users typically experience.
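
To make the example concrete, here is a minimal sketch of how far the minimum can sit from more representative statistics. The data is hypothetical: 50 synthetic response times drawn from a log-normal distribution, chosen only because response times are often right-skewed. The function name `summarize` and all parameters are illustrative assumptions, not part of any real evaluation.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical data: 50 response times in milliseconds.
# The log-normal shape and its parameters are assumptions for illustration.
samples = [random.lognormvariate(5.0, 0.5) for _ in range(50)]


def summarize(times):
    """Return the minimum alongside statistics that describe typical behavior."""
    ordered = sorted(times)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(ordered, n=20)[-1]
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p95": p95,
        "max": ordered[-1],
    }


summary = summarize(samples)
for name, value in summary.items():
    print(f"{name:>6}: {value:8.1f} ms")
```

On skewed data like this, the minimum sits well below the median and the 95th percentile, so a claim about perceptible response times would be better supported by the median or a high percentile than by the minimum.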

Real-World Examples

  • Scale up the problem size with the number of processors, but omit any mention of this fact. Graphs of performance rates versus the number of processors have a nasty habit of trailing off. This problem can easily be remedied by plotting the performance rates for problems whose sizes scale up with the number of processors. The important point is to omit any mention of this scaling in your plots and tables. Clearly disclosing this fact might raise questions about the efficiency of your implementation.
    -- Point #4 in David H. Bailey, "Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers," Supercomputing Review, August 1991.
  • Quote performance results projected to a full system. Few labs can afford a full-scale parallel computer --- such systems cost millions of dollars. Unfortunately, the performance of a code on a scaled down system is often not very impressive. There is a straightforward solution to this dilemma --- project your performance results linearly to a full system, and quote the projected results, without justifying the linear scaling. Be very careful not to mention this projection, however, since it could seriously undermine your performance claims for the audience to realize that you did not actually obtain your results on real full-scale hardware.
    -- Point #5 in David H. Bailey, "Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers," Supercomputing Review, August 1991.
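
The linear projection in Point #5 can be illustrated with a small sketch. It assumes, purely for illustration, that the code follows Amdahl's law with a 5% serial fraction. Neither Amdahl's law nor that figure comes from Bailey's text, and the processor counts and function names are hypothetical.

```python
def amdahl_speedup(processors, serial_fraction):
    """Speedup predicted by Amdahl's law for a fixed problem size."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def linear_projection(measured_speedup, measured_processors, target_processors):
    """Naive projection: assume speedup grows in proportion to processor count."""
    return measured_speedup * target_processors / measured_processors


SERIAL_FRACTION = 0.05   # assumed: 5% of the work cannot be parallelized
SMALL_SYSTEM = 16        # hypothetical scaled-down system actually measured
FULL_SYSTEM = 1024       # hypothetical full-scale system

measured = amdahl_speedup(SMALL_SYSTEM, SERIAL_FRACTION)
projected = linear_projection(measured, SMALL_SYSTEM, FULL_SYSTEM)
modeled = amdahl_speedup(FULL_SYSTEM, SERIAL_FRACTION)

print(f"measured speedup on {SMALL_SYSTEM} processors:  {measured:.1f}")
print(f"linear projection to {FULL_SYSTEM} processors: {projected:.1f}")
print(f"Amdahl model on {FULL_SYSTEM} processors:      {modeled:.1f}")
```

Under these assumptions the linear projection exceeds the modeled speedup by more than an order of magnitude, because the model caps speedup at 1 / SERIAL_FRACTION no matter how many processors are added. This is why a projected result needs both disclosure and a justification of the scaling model.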

...