D. Parikh, K. Skadron, Y. Zhang, and M. Stan.
In IEEE Transactions on Computers.
This paper uses Wattch and the SPEC 2000 integer and floating-point benchmarks to explore the role of branch predictor organization in power/energy/performance tradeoffs for processor design. Even though the direction predictor by itself represents less than 1% of the processor's total power dissipation, prediction accuracy is nevertheless a powerful lever on processor behavior and program execution time. A thorough study of branch predictor organizations shows that, as a general rule, to reduce overall energy consumption in the processor it is worthwhile to spend more power in the branch predictor if this results in more accurate predictions that improve running time. This not only improves performance, but can also improve the energy-delay product by up to 20%. Three techniques, however, can reduce power dissipation without harming accuracy. Banking reduces the portion of the branch predictor that is active at any one time. A new on-chip structure, the prediction probe detector (PPD), uses pre-decode bits to entirely eliminate unnecessary predictor and branch target buffer (BTB) accesses. Despite the extra power that must be spent accessing it, the PPD reduces local predictor power and energy dissipation by about 31%, and overall processor power and energy dissipation by 3%. These savings can be further improved by using profiling to annotate branches, identifying those that are highly biased and do not require dynamic prediction. Finally, the paper explores the effectiveness of a previously proposed technique, pipeline gating, and finds that even with adaptive control based on recent predictor accuracy, pipeline gating yields little or no energy savings.
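The PPD's gating idea can be illustrated with a small sketch. This is not the paper's hardware implementation, only a hypothetical software model of the decision it makes: pre-decode bits classify each fetched instruction, and the PPD uses them to suppress predictor and BTB accesses that cannot matter.

```python
# Hypothetical model of a prediction probe detector (PPD).
# Names (PredecodeBits, ppd_lookup) are illustrative, not from the paper.

from dataclasses import dataclass

@dataclass
class PredecodeBits:
    is_branch: bool        # any control-transfer instruction
    is_conditional: bool   # needs a direction prediction

def ppd_lookup(bits: PredecodeBits) -> tuple[bool, bool]:
    """Return (access_direction_predictor, access_btb).

    Non-branches skip both structures, unconditional branches skip
    the direction predictor, and only conditional branches pay for
    both accesses -- saving power on the common non-branch case.
    """
    if not bits.is_branch:
        return (False, False)   # plain ALU/load/store: no lookups at all
    if not bits.is_conditional:
        return (False, True)    # unconditional jump/call: target only
    return (True, True)         # conditional branch: direction + target
```

Since most fetched instructions are not branches, the common case takes the first return path, which is where the reported predictor power savings come from.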