J. W. Haskins and K. Skadron
Tech Report CS-2002-19, Univ. of Virginia Dept. of Computer Science, July, 2002.
This paper explores techniques for speeding up sampled microprocessor simulations by exploiting the observation that, of the memory references that precede a sample, those nearest to the sample are the most likely to be germane during the sample itself. Accurately warming up simulated cache and branch predictor state therefore requires modeling only a subset of the memory references and control-flow instructions immediately preceding a simulation sample. Our technique measures memory reference reuse latencies (MRRLs) and uses these data to choose a point prior to each sample at which to engage cache hierarchy and branch predictor modeling. By starting cache and branch predictor modeling late in the pre-sample instruction stream, rather than modeling cache and branch predictor interactions for all pre-sample instructions, we save the time cost of modeling them. These savings reduce overall simulation running times by an average of 25%, while generating an average error in IPC of less than 0.7%.
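As a rough illustration of the idea, the following Python sketch measures reuse latencies over a reference trace and picks a warm-up start point from them. It is a minimal sketch under stated assumptions, not the report's implementation: the trace format (pairs of instruction index and address), the function names reuse_latencies and warmup_start, and the rule of choosing the warm-up distance as a coverage percentile of the measured latencies are all assumptions made for this example.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical trace format: (instruction_index, address), in program order.
Reference = Tuple[int, int]


def reuse_latencies(trace: Sequence[Reference]) -> List[int]:
    """Return, for every re-referenced address, the number of instructions
    elapsed since the previous reference to that same address."""
    last_seen = {}  # address -> instruction index of its most recent reference
    latencies = []
    for instr_index, address in trace:
        if address in last_seen:
            latencies.append(instr_index - last_seen[address])
        last_seen[address] = instr_index
    return latencies


def warmup_start(trace: Sequence[Reference], sample_start: int,
                 coverage: float = 0.99) -> int:
    """Pick the instruction index at which to begin warm-up modeling.

    Assumption for this sketch: the warm-up distance is the smallest
    latency that covers the requested fraction of all measured reuses.
    """
    latencies = sorted(reuse_latencies(trace))
    if not latencies:
        return sample_start  # nothing is reused, so no warm-up is needed
    rank = max(1, math.ceil(coverage * len(latencies)))
    distance = latencies[rank - 1]
    return max(0, sample_start - distance)


if __name__ == "__main__":
    # Toy trace: pre-sample references followed by a sample starting at 100.
    trace = [(0, 0xA), (10, 0xB), (90, 0xB), (95, 0xA), (100, 0xB), (105, 0xB)]
    print(reuse_latencies(trace))
    print(warmup_start(trace, sample_start=100, coverage=0.75))
```

Lowering coverage moves the warm-up start closer to the sample, which skips more modeling work but leaves more reuses uncovered. This is the same speed-versus-accuracy trade-off that the reported 25% time reduction and 0.7% IPC error reflect.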