In quantitative empirical studies, metrics are necessary to quantify certain properties of systems. Specifically, metrics identify the properties that the experiment will measure and how those properties will be measured. Metrics can range from the end‐to‐end execution time of a system, to the size of a data structure, to the precision of static analysis results.
A metric can be inappropriate, ignored, inconsistent, or irreproducible:
Inappropriate Metrics
A metric is inappropriate when it is flawed or when it does not support the intended claims of the experiment.
Ignored Metrics
A metric is ignored when it is excluded despite being necessary to confirm an evaluation's claims.
Inconsistent Metrics
To demonstrate the superiority of an innovation, many evaluations compare that innovation to an existing baseline. If one metric is used for the baseline and an even slightly different metric is used for the innovation, the metrics are inconsistent and the measurement results are not necessarily comparable.
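As a minimal sketch of how a slight difference in a metric can undermine a comparison, consider the precision of static analysis results computed in two ways: over every reported warning, or over distinct warning locations. The `precision` helper, the warning tuples, and the set of true bugs are hypothetical and serve only as illustration.

```python
def precision(reports, true_bugs):
    """Fraction of the given reports that are true bugs."""
    reports = list(reports)
    return sum(1 for r in reports if r in true_bugs) / len(reports)

# Hypothetical warnings as (file, line); repeated entries model duplicate reports.
reports = [("a.c", 10), ("a.c", 10), ("a.c", 10), ("b.c", 5), ("c.c", 7)]
true_bugs = {("a.c", 10)}

# Variant 1: every reported warning counts (3 of 5 are true).
per_report = precision(reports, true_bugs)

# Variant 2: duplicates are merged first (1 of 3 distinct locations is true).
per_location = precision(set(reports), true_bugs)

print(f"per report: {per_report:.2f}, per location: {per_location:.2f}")
```

Both variants could plausibly be reported as "precision", yet they differ for the same analysis output. If the baseline were scored with one variant and the innovation with the other, the gap between the two numbers would say nothing about which tool is better.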
Irreproducible Metrics
A metric's name may seem to define its meaning unambiguously, but the actual implementation of the metric often leaves considerable flexibility. A study needs to define precisely how the metric is measured; otherwise, future studies cannot produce comparable measurements.
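The flexibility hidden behind a metric's name can be illustrated with "execution time". The sketch below measures the same hypothetical workload (the `workload` and `measure` names are assumptions made for this example) with two standard Python clocks: `time.perf_counter`, which reports elapsed wall-clock time, and `time.process_time`, which reports CPU time of the current process and excludes time spent sleeping.

```python
import time

def workload():
    # Hypothetical workload: some computation followed by waiting
    # (standing in for I/O or blocking on another process).
    total = sum(i * i for i in range(200_000))
    time.sleep(0.2)
    return total

def measure(clock):
    """Run the workload once and return the time elapsed on the given clock."""
    start = clock()
    workload()
    return clock() - start

wall = measure(time.perf_counter)  # wall-clock time, includes the sleep
cpu = measure(time.process_time)   # CPU time of this process, excludes the sleep

print(f"wall-clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```

Both numbers are legitimately an "execution time", but they answer different questions. A study that reports only the name of the metric, without stating the clock, what is included in the timed region, and how runs are aggregated, leaves later studies unable to reproduce a comparable measurement.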