Evaluate Collaboratory Technical Report #1: Can you trust your experimental results?


February 15, 2012


Stephen M. Blackburn, Amer Diwan, Matthias Hauswirth, Peter F. Sweeney, José Nelson Amaral, Vlastimil Babka, Walter Binder, Tim Brecht, Lubomír Bulej, Lieven Eeckhout, Sebastian Fischmeister, Daniel Frampton, Robin Garner, Andy Georges, Laurie J. Hendren, Michael Hind, Antony L. Hosking, Richard Jones, Tomas Kalibera, Philippe Moret, Nathaniel Nystrom, Victor Pankratius, Petr Tuma


Many contributions in computer science rely on quantitative experiments to validate their efficacy. Well-designed experiments provide useful insights, while poorly designed experiments can mislead. Unfortunately, experiments are difficult to design, and even seasoned experimenters can make mistakes. This paper presents a framework that enables us to talk and reason about experimental evaluation. We hope it will help our community avoid and recognize mistakes in our experiments. This paper is the outcome of the Evaluate 2011 workshop, whose goal was to improve experimental methodology in computer science.