To demonstrate the superiority of an innovation, many evaluations compare it against an existing baseline. If one metric is used to measure the baseline and a different metric, even a slightly different one, is used to measure the innovation, the metrics are inconsistent, and the two results are not necessarily comparable.
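Example: You claim that your optimization reduces L2 cache misses. Your measurement of the unoptimized program counts both demand-load misses and prefetch-request misses, but your measurement of the optimized version counts only demand-load misses. Part or all of the apparent reduction may therefore come from the prefetch misses that were dropped from the count, not from the optimization.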
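One way to guard against this pitfall is to make the metric an explicit, named part of every comparison, so that numbers computed with different metrics cannot be silently combined. The Python sketch below is illustrative only. The class `L2MissCounts`, the metric names `demand_load_only` and `demand_load_and_prefetch`, and the miss counts are all hypothetical and do not correspond to any particular profiler or hardware event names. The "optimized" run is deliberately identical to the baseline, so any reported reduction is spurious.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class L2MissCounts:
    """Raw L2 miss counts for one run, split by request type (hypothetical)."""
    demand_load: int
    prefetch: int


# Each metric is a named rule for turning raw counts into a single number.
METRICS: Dict[str, Callable[[L2MissCounts], int]] = {
    "demand_load_only": lambda c: c.demand_load,
    "demand_load_and_prefetch": lambda c: c.demand_load + c.prefetch,
}


def reduction(baseline: L2MissCounts, innovation: L2MissCounts,
              baseline_metric: str, innovation_metric: str) -> float:
    """Relative miss reduction of `innovation` over `baseline`.

    Refuses to compare numbers that were computed with different metrics.
    """
    if baseline_metric != innovation_metric:
        raise ValueError(
            f"inconsistent metrics: {baseline_metric!r} vs {innovation_metric!r}")
    metric = METRICS[baseline_metric]
    base = metric(baseline)
    return (base - metric(innovation)) / base


# Hypothetical counts: the "optimized" run is identical to the baseline.
run = L2MissCounts(demand_load=800, prefetch=200)

# Consistent metric: no reduction, as expected for identical runs.
print(reduction(run, run, "demand_load_only", "demand_load_only"))  # 0.0

# Unguarded mixing: baseline counts demand + prefetch, "optimized" counts
# demand only, which reports a reduction even though nothing changed.
spurious = 1 - METRICS["demand_load_only"](run) / METRICS["demand_load_and_prefetch"](run)
print(f"{spurious:.0%}")  # 20%

# The guarded comparison rejects the same mix of metrics.
try:
    reduction(run, run, "demand_load_and_prefetch", "demand_load_only")
except ValueError as err:
    print(err)
```

With identical runs, the consistent comparison reports no change, while mixing the two metrics reports a 20% "reduction" that comes entirely from the dropped prefetch misses.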
...