Evaluate Collaboratory Technical Report #1: Can you trust your experimental results?

Date

February 15, 2012

Authors

Stephen M. Blackburn, Amer Diwan, Matthias Hauswirth, Peter F. Sweeney, José Nelson Amaral, Vlastimil Babka, Walter Binder, Tim Brecht, Lubomír Bulej, Lieven Eeckhout, Sebastian Fischmeister, Daniel Frampton, Robin Garner, Andy Georges, Laurie J. Hendren, Michael Hind, Antony L. Hosking, Richard Jones, Tomas Kalibera, Philippe Moret, Nathaniel Nystrom, Victor Pankratius, Petr Tuma

Abstract

Many contributions in computer science rely on quantitative experiments to validate their efficacy. Well-designed experiments provide useful insights, while poorly designed experiments can mislead. Unfortunately, experiments are difficult to design, and even seasoned experimenters can make mistakes. This paper presents a framework that enables us to discuss and reason about experimental evaluation. As such, we hope it will help our community avoid and recognize mistakes in our experiments. This paper is the outcome of the Evaluate 2011 workshop, whose goal was to improve experimental methodology in computer science.

Download

EvaluateCollaboratoryTR1.pdf