Advisor: Sudhanva Gurumurthi
Attending Faculty: Mary Lou Soffa, Chair
OLSSON 236D, 15:00:00
A Master's Project Presentation
ABSTRACT
Transient faults can lead to serious errors in execution. Providing protection for the processor core against these faults requires redundant execution, which leads to a performance loss. However, not all bit flips have equal impact on the processor. The Architectural Vulnerability Factor (AVF) quantifies when a soft error is likely to alter the final output and when it has little impact due to the effects of masking. Thus, redundancy is important only during periods of high AVF. Although calculating the AVF typically requires post-execution analysis of the microarchitectural behavior of a program, recent work has shown that it can be estimated online. However, redundant execution changes the bits that flow through the processor, exposing bottlenecks that single-threaded execution may not display and slowing overall execution to an unpredictable degree. This variability complicates estimation of the single-threaded AVF during redundant execution, making it difficult to decide when protection is unnecessary due to low vulnerability.
To leverage these low-AVF periods without leaving the processor vulnerable to transient faults, we need a way to track the single-threaded AVF even when protection is enabled. We investigate the predictability of the single-threaded AVF during redundant execution and develop predictors for the underlying AVF of three processor structures. We then evaluate these predictors in a partial redundant multithreading (RMT) implementation using intelligent toggling with a sample reliability policy.