The concept of machine balance has been defined in a number of studies (e.g.,) as a ratio of the number of memory operations per cpu cycle to the number of floating-point operations per cpu cycle for a particular processor.
This definition introduces a systematic bias into the results because it ignores the true cost of memory accesses in most systems, which must include cache miss penalties (and perhaps other forms of latency and contention). To reduce this bias, the definition of machine balance used here takes the number of memory operations per cpu cycle from the measured performance on long, uncached vector operands with unit stride.
\[
\text{balance} = \frac{\text{peak floating ops/cycle}}{\text{sustained memory ops/cycle}}
\]
With this new definition, the ``balance'' can be interpreted as the number of FP operations that can be performed during the time for an ``average'' memory access. Note that ``average'' here indicates an average across the elements of a cache line for the long vector operations, and may not correspond to the average cost of a memory access for any particular application code.
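As an illustration of the definition above, the following minimal Python sketch computes the balance from a peak floating-point rate and a sustained memory rate. The function names, the 8-byte word size, and the sample numbers are assumptions made for the example, not values or code from this study. Because balance is a ratio of two rates, the same value results whether both rates are expressed per cycle or per second.

```python
def machine_balance(peak_flops_rate: float, sustained_mem_ops_rate: float) -> float:
    """Return peak floating-point ops divided by sustained memory ops.

    Both rates must use the same time base (per cycle or per second),
    so that the time unit cancels in the ratio.
    """
    if sustained_mem_ops_rate <= 0:
        raise ValueError("sustained memory rate must be positive")
    return peak_flops_rate / sustained_mem_ops_rate


def balance_from_rates(peak_mflops: float,
                       sustained_mbytes_per_s: float,
                       bytes_per_word: int = 8) -> float:
    """Compute balance from peak MFLOPS and sustained bandwidth in MB/s.

    Assumption: one memory operation moves one word of `bytes_per_word`
    bytes (8 bytes here, i.e. a 64-bit floating-point operand).
    """
    sustained_mwords_per_s = sustained_mbytes_per_s / bytes_per_word
    return machine_balance(peak_mflops, sustained_mwords_per_s)


if __name__ == "__main__":
    # Hypothetical machine: 200 MFLOPS peak, 400 MB/s sustained bandwidth.
    # 400 MB/s is 50 Mwords/s, so the balance is 200 / 50 = 4.0.
    print(balance_from_rates(200.0, 400.0))
```

Under these assumptions, a balance of 4.0 would mean that the hypothetical machine can perform four FP operations in the time of one ``average'' memory access.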
Using the ``Peak MFLOPS'' results, one can then calculate the machine balance and plot it against the memory bandwidth, as shown in Fig. 7.
Figure 7: STREAM TRIAD MFLOPS and Machine Balance for a variety of recent and current computers. Each point represents one computer system, and connected points represent either multi-processor results from parallel systems, or different cpu speeds within the same cpu family for uniprocessor systems. The diagonal lines indicate Peak Performance of 0.1, 1.0, and 10.0 GFLOPS (from left to right).
The results in Fig. 7 show a remarkably clear distinction between the four memory categories: