> Thanks for the clarification. I was deliberately ignoring the I/O port
> since it is not used in the floating-point kernels that I was discussing
> and since I don't understand the details well enough to talk about it!
Has to be there to sell the machine, though. I brought up the point because
your analysis was not showing the full memory bandwidth potential of CRI's
A further point to consider is that the "memory bandwidth" of a Cray-class
machine must really be specified as three distinct values:
* Maximum processor bandwidth (CPUS * PORTS/CPU * BANDWIDTH/PORT)
* Maximum bank bandwidth (BANKS * BANDWIDTH/BANK)
* Maximum arbitration mechanism bandwidth (SECTIONS * BANDWIDTH/SECTION)
In practice, the minimum of these three values constrains the performance
you'll see on a real code. As a rule of thumb, total system bandwidth in
references/second should not be less than the processing rate in
floating-point results/second; falling short of this was the embarrassing
imbalance of the early CRAY-2 machines.
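To make the "minimum of three" point concrete, here is a minimal sketch in Python. The function name and every number in the usage note are made up for illustration; they are not the specifications of any real machine.

```python
def memory_bandwidth_limits(cpus, ports_per_cpu, bw_per_port,
                            banks, bw_per_bank,
                            sections, bw_per_section):
    """Return the three bandwidth ceilings and the name of the smallest.

    All bandwidths must be in the same unit (e.g. references per clock).
    """
    limits = {
        "processor": cpus * ports_per_cpu * bw_per_port,
        "bank": banks * bw_per_bank,
        "arbitration": sections * bw_per_section,
    }
    # The ceiling with the smallest value is the one a real code will hit.
    bottleneck = min(limits, key=limits.get)
    return limits, bottleneck


if __name__ == "__main__":
    # Hypothetical machine: 4 CPUs with 3 ports each, 64 banks, 4 sections.
    limits, bottleneck = memory_bandwidth_limits(
        cpus=4, ports_per_cpu=3, bw_per_port=1.0,
        banks=64, bw_per_bank=0.25,
        sections=4, bw_per_section=2.0,
    )
    for name, value in limits.items():
        print(f"{name:12s} {value:6.2f} refs/clock")
    print("limited by:", bottleneck)
```

With these invented figures the processor side could issue 12 references per clock and the banks could absorb 16, but the arbitration mechanism passes only 8, so that is the figure to compare against the floating-point result rate.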
This view of memory system bandwidth shows why codes can get into trouble
with memory strides divisible by powers of two as small as 4 and 8. Such
strides cut down the number of sections/quadrants/octants being used, limiting
the arbitration mechanism bandwidth.
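The stride effect can be sketched in a few lines of Python. This assumes, purely for illustration, that the section is selected by the word address modulo the number of sections (low-order interleaving); real machines may map addresses differently, and the function names are hypothetical.

```python
from math import gcd


def sections_touched(stride, sections):
    """Number of distinct sections hit by a constant-stride access stream,
    assuming section = word_address % sections (an assumed mapping)."""
    return sections // gcd(stride, sections)


def arbitration_fraction(stride, sections):
    """Fraction of the arbitration bandwidth still usable at this stride."""
    return sections_touched(stride, sections) / sections


if __name__ == "__main__":
    sections = 8  # hypothetical count, e.g. a machine organised in octants
    for stride in (1, 2, 3, 4, 8, 16):
        used = sections_touched(stride, sections)
        frac = arbitration_fraction(stride, sections)
        print(f"stride {stride:2d}: {used} of {sections} sections, "
              f"{frac:.3f} of arbitration bandwidth")
```

Under this assumed mapping, odd strides still spread over all sections, while a stride of 4 uses only two of eight and a stride of 8 lands on a single section every time.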