Benchmarks

Tichy's "Should Computer Scientists Experiment More?" defines a benchmark as follows:

"A benchmark is a task domain sample executed by a computer or by a human and computer. During execution, the human or computer records well-defined performance measurements."

Benchmarks (or, more generally, workloads) are an important component of an experiment (see Evaluation Anti-Patterns). A workload can be inappropriate, ignored, inconsistent, or irreproducible.
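
As a rough illustration of the definition above, the following minimal Python sketch runs a fixed task sample and records well-defined measurements. The names make_workload and run_benchmark, the seed, and the choice of wall-clock time as the metric are assumptions made for this example; they are not taken from Tichy or from any particular benchmark suite.

```python
import random
import statistics
import time


def make_workload(seed, size):
    """Draw a reproducible task sample: the same seed always yields the same input."""
    rng = random.Random(seed)
    return [rng.randint(0, 1_000_000) for _ in range(size)]


def run_benchmark(task, workload, repetitions=5):
    """Run `task` on a copy of `workload` several times and record wall-clock times."""
    timings = []
    for _ in range(repetitions):
        data = list(workload)  # copy so every repetition sees identical input
        start = time.perf_counter()
        task(data)
        timings.append(time.perf_counter() - start)
    return {
        "repetitions": repetitions,
        "median_seconds": statistics.median(timings),
        "min_seconds": min(timings),
        "max_seconds": max(timings),
    }


if __name__ == "__main__":
    # Hypothetical task: sorting 10,000 pseudo-random integers.
    workload = make_workload(seed=42, size=10_000)
    print(run_benchmark(sorted, workload))
```

Fixing the seed and copying the input before each repetition addresses only the inconsistent and irreproducible failure modes. Whether sorting random integers is an appropriate workload for a given claim is a separate question that code alone cannot answer.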

Prior work on benchmarks

Blackburn et al. 2006 The DaCapo Benchmarks: Java Benchmarking Development and Analysis
Tempero et al. 2010 Qualitas Corpus: A Curated Collection of Java Code for Empirical Studies
Tichy 1998 Should Computer Scientists Experiment More?
- "an effective way to simplify repeated experiments is by benchmarking"
- "the most subjective and therefore weakest part of a benchmark test is the benchmark's composition"
- "constructing a benchmark is usually intensive work"
- "it is necessary to evolve benchmarks to prevent overfitting"
- "benchmarks cause an area to blossom suddenly because they make it easy to identify promising approaches and to discard poor ones"

Check out all papers in the bibliography classified under "benchmarks".

Experimental Evaluation of Software and Systems in Computer Science
