Microprocessor Reliability




Technology scaling, which has paved the way for multicore processors, also gives rise to a variety of silicon reliability problems. These include particle-induced soft errors and lifetime reliability phenomena, such as Negative Bias Temperature Instability (NBTI) and process variations. Such phenomena threaten to break the abstraction of a reliable computing substrate that the architecture has traditionally provided to the higher layers of the system. Existing techniques that combat these problems entail significant performance and power overheads. Reducing these overheads is imperative for effectively harnessing the computing power of future multicore processors, but doing so without seriously compromising the required level of fault protection is challenging.

The goal of this project is to develop fault tolerance techniques that protect against various silicon reliability phenomena while imposing significantly lower performance and power overheads than traditional reliability techniques. Contributions of this project include Partial Redundant Multi-Threading mechanisms, runtime Architectural Vulnerability Factor (AVF) prediction, Recovery Boosting to provide deep rejuvenation of SRAM cells from NBTI stress, and combined circuit and microarchitectural techniques to mitigate NBTI on processor functional units.

Representative Publications