M. Co, D. A.B. Weikle, and K. Skadron.
In ACM Transactions on Architecture and Code Optimization, Vol. 3, No. 4, pp. 450-476.
Much research has focused on improving fetch bandwidth. Storing concatenated basic blocks to form instruction traces can significantly improve fetch performance. This work evaluates whether this storage method translates to significant energy-efficiency gains.
When access delay and area are taken into account, trace caches achieve performance and energy efficiency similar to those of instruction caches. We find that poorer implicit branch prediction from the next-trace predictor (NTP) at smaller areas limits trace cache performance.
We propose a novel ahead-pipelined NTP to address access delay concerns. Results show that a sequential trace cache (STC) fetch organization with a 3-stage, ahead-pipelined NTP can achieve 5-17% IPC and 29% energy-delay-squared (ED²) improvements over conventional, unpipelined organizations.