Re: Cray Results for bandwidth test

From: Andrew Zachary (alz@grumpy.cray.com)
Date: Tue Oct 01 1991 - 18:47:06 CDT


John,

Even with your new program, I had to put each of your loops into
a separate subroutine to fool the compiler. Having done that, I
ran the resulting code with each array = 2 Mwords on sn1061, a
Y-MP8/8128 with a 6 nsec clock and 256 banks. With 1 CPU, the
results are

--------------------------------------
 Single precision appears to have 14 digits of accuracy
 Assuming 8 bytes per default REAL word
--------------------------------------
 Timing calibration ; time = 3.9378048 hundredths of a second
 Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:     2436.8420     0.0131     0.0131     0.0131
Scaling   :     2436.9222     0.0131     0.0131     0.0131
Summing   :     3643.9862     0.0137     0.0132     0.0140
SAXPYing  :     3517.9324     0.0138     0.0136     0.0140

With 8 CPUs (cf77 -Zp), the results are

--------------------------------------
 Single precision appears to have 14 digits of accuracy
 Assuming 8 bytes per default REAL word
--------------------------------------
 Timing calibration ; time = 0.7495896003093 hundredths of a second
 Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:    19291.5887     0.0019     0.0017     0.0029
Scaling   :    19294.1710     0.0019     0.0017     0.0028
Summing   :    26588.9383     0.0020     0.0018     0.0026
SAXPYing  :    26802.1963     0.0021     0.0018     0.0028
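As a rough check on how the Rate column relates to the timings, here is a sketch that recomputes MB/s from the array length and the minimum time. It assumes "2 Mwords" means 2 x 10^6 elements and 1 MB = 10^6 bytes (2^21 elements with 1 MB = 2^20 bytes gives the same ratio). The per-kernel word counts (2 for Assignment/Scaling, 3 for Summing/SAXPYing) are inferred from the printout, since Summing reports about 1.5 times the Assignment rate at nearly the same time. `rate_mb_per_s` and `WORDS_PER_ELEMENT` are hypothetical names, not part of the benchmark.

```python
# Hypothetical helper, not part of the original benchmark: rebuilds the
# MB/s column from array length, word size and the minimum time.
N = 2_000_000          # assumption: "2 Mwords" read as 2 x 10^6 elements
BYTES_PER_WORD = 8     # from "Assuming 8 bytes per default REAL word"

# Words moved per element (reads + writes), inferred from the printed rates.
WORDS_PER_ELEMENT = {
    "Assignment": 2,
    "Scaling": 2,
    "Summing": 3,
    "SAXPYing": 3,
}


def rate_mb_per_s(kernel: str, min_time_s: float, n: int = N) -> float:
    """Bandwidth in MB/s (assumed 1 MB = 10^6 bytes) for one timed pass."""
    bytes_moved = WORDS_PER_ELEMENT[kernel] * BYTES_PER_WORD * n
    return bytes_moved / min_time_s / 1e6


if __name__ == "__main__":
    # Minimum times from the 1-CPU run, as printed (rounded to 4 digits).
    for kernel, t in [("Assignment", 0.0131), ("Summing", 0.0132)]:
        print(f"{kernel:11s} {rate_mb_per_s(kernel, t):10.1f} MB/s")
```

The recomputed rates land within a fraction of a percent of the printed ones; the small gap comes from the times being rounded to four digits in the output.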

Again, I am a little surprised that I don't get 3 full words of I/O
per clock cycle (read that as memory access). I seem to get about 2.7
words per clock period. I will retry it using the BLAS level 1 routines
and see what I get.
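The words-per-clock figure follows from the rate, the 8-byte word and the 6 nsec clock. A minimal sketch, assuming 1 MB = 10^6 bytes (which reproduces the 2.7 figure from the 1-CPU Summing rate); `words_per_clock` is a hypothetical helper:

```python
# Converts a measured rate into 8-byte words moved per clock period, per CPU.
CLOCK_PERIOD_S = 6e-9   # 6 nsec Y-MP clock, from the post
BYTES_PER_WORD = 8      # default REAL word size, from the printout


def words_per_clock(rate_mb_per_s: float, cpus: int = 1) -> float:
    """Words per clock per CPU, assuming 1 MB = 10^6 bytes."""
    words_per_s = rate_mb_per_s * 1e6 / BYTES_PER_WORD
    return words_per_s * CLOCK_PERIOD_S / cpus


if __name__ == "__main__":
    print(f"1 CPU,  Summing: {words_per_clock(3643.9862):.2f} words/clock")
    print(f"8 CPUs, Summing: {words_per_clock(26588.9383, cpus=8):.2f} "
          "words/clock per CPU")
```

By the same arithmetic, the 8-CPU Summing rate works out to roughly 2.5 words per clock per CPU, so the shortfall from 3 words per clock is a bit larger with all CPUs busy.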

Hope these results help you.
Andrew Zachary
alz@grumpy.cray.com



This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:01 CDT