A sampling of single-processor bandwidth results for vector machines is presented in Figure 1.
Figure 1: Single-cpu sustainable memory bandwidth for some vector processors. The grey bar indicates bandwidth for the ``copy'' operation, while the black bar indicates bandwidth for the ``triad'' operation.
The range of observed values is quite large, but it must be noted that the machines also span a wide range of ages and prices. A clear distinction is evident between machines with one or two load/store ports per cpu and those with three or four. In the former case, the ``copy'' and ``triad'' operations run at similar speeds, with a slight edge to ``copy'' (since it incurs no additional latency cost to fill the floating-point pipelines). In the latter case, the three-operand ``triad'' operation runs significantly faster. Each ``triad'' iteration moves three words (two loads and one store) while each ``copy'' iteration moves two (one load and one store), so when enough ports are available to service all three streams concurrently, the ideal bandwidth ratio is 3:2. The observed ratio is always less than 3:2 because of the additional latency cost of filling the floating-point arithmetic pipeline(s).
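The port argument can be made concrete with a toy throughput model. The sketch below is a hypothetical illustration, not the author's analysis: it assumes each port moves one 64-bit word per clock, that any port can service either a load or a store, that the pipeline completes at most one loop iteration per clock, and that ``triad'' pays a fixed extra pipeline-fill cost per vector strip. The function name, the 64-element strip length, and the 8-clock fill are arbitrary illustrative choices.

```python
def words_per_clock(words_per_iter: int, ports: int,
                    vector_length: int = 64, fill_clocks: int = 0) -> float:
    """Sustained words moved per clock over one vector strip (toy model).

    Hypothetical assumptions: each port moves one word per clock, any port
    can service a load or a store, at most one iteration completes per
    clock, and fill_clocks is a one-time startup cost per strip.
    """
    # Words moved per clock are limited both by the number of ports and by
    # the one-iteration-per-clock issue rate (words_per_iter words/clock).
    rate = min(ports, words_per_iter)
    clocks = fill_clocks + vector_length * words_per_iter / rate
    return words_per_iter * vector_length / clocks


COPY_WORDS = 2   # c(i) = a(i): one load, one store
TRIAD_WORDS = 3  # a(i) = b(i) + q*c(i): two loads, one store
TRIAD_FILL = 8   # hypothetical extra floating-point pipeline fill, in clocks

if __name__ == "__main__":
    for ports in range(1, 5):
        copy_bw = words_per_clock(COPY_WORDS, ports)
        triad_bw = words_per_clock(TRIAD_WORDS, ports, fill_clocks=TRIAD_FILL)
        print(f"{ports} port(s): copy {copy_bw:.2f}, triad {triad_bw:.2f} "
              f"words/clock, ratio {triad_bw / copy_bw:.2f}")
```

With one or two ports the model gives nearly equal rates with ``copy'' slightly ahead; with three or four ports ``triad'' pulls ahead, but the fill cost keeps the ratio below 3:2. Real vector units (with multiple pipes per cpu, dedicated load and store ports, or memory bank conflicts) will depart from this simple model.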
Most of the vector machines listed are also shared memory machines. Parallel scaling for the Cray Research line of Parallel Vector Processors is presented in Figure 2.
Figure 2: Parallel scaling of sustainable memory bandwidth on the Cray Research machines. As in the previous figure, the grey bar indicates bandwidth for the ``copy'' operation, while the black bar indicates bandwidth for the ``triad'' operation.
All of the machines show near-perfect scaling for the ``copy'' operation, but the ``triad'' operation clearly encounters bottlenecks in the 16-cpu Cray C90 results.
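One way to quantify "near-perfect scaling" is parallel efficiency: the aggregate bandwidth on p cpus divided by p times the single-cpu bandwidth. The helper below is a hypothetical sketch (the name `scaling_efficiency` is illustrative); its inputs would be aggregate bandwidths read from Figure 2, and no measured values are reproduced here.

```python
def scaling_efficiency(bandwidth_by_cpus: dict[int, float]) -> dict[int, float]:
    """Parallel efficiency E(p) = B(p) / (p * B(1)) for aggregate bandwidth B.

    Values near 1.0 indicate near-perfect scaling; values well below 1.0
    indicate that the shared memory system has become a bottleneck.
    Requires a single-cpu entry (key 1) as the baseline.
    """
    base = bandwidth_by_cpus[1]
    return {p: b / (p * base) for p, b in sorted(bandwidth_by_cpus.items())}
```

Computing this separately for ``copy'' and ``triad'' makes the comparison explicit: a ``triad'' efficiency falling well below the ``copy'' efficiency at the highest cpu count is the signature of the bottleneck described above.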