Okay, now I have what Alex provided you.
(a) He was running without optimization, because the optimizer
optimizes away the code.
(b) His output and mine for CM2 16K, N=80000, SAXPY differ by a
factor of 100 in the printout. I *know* the program is wrong, so his
output is actually correct. The mystery is how he gets the right
printout with the wrong program. Of course, he may have hand-corrected
the output, or the version he sent you may not be the latest one.
(c) So for a 16K CM2 he gets 20148.3135 MB/s, which *I* compute to be
5.622 bytes/PE/clock. That is faster than the theoretical rate, which
*I* think is 4 bytes/PE/clock.
(Those emphatic I's mean: feel free to have a different opinion.)
(d) I still don't know:
(1) what you posted, because you are posting MFlops, not transfer rates;
I compute 0.4685 flops/PE/clock, or 1.679 GFlops.
(2) what the Cray folks were complaining about.
(The bottom line, though, is that I still think the timer is screwed.
But I want to get the history straight before I get back to Alex.)
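
As a rough check of the arithmetic in (c) and (d)(1), the sketch below redoes the conversions from the reported 20148.3135 MB/s. The message does not state the PE count, the clock rate, or the bytes and flops counted per element, so all of these are assumptions: 512 PEs (one floating-point unit per 32 bit-serial processors on a 16K machine), a 7 MHz clock, MB meaning 10^6 bytes, and a SAXPY that moves 24 bytes and does 2 flops per element. Under those assumptions the figures quoted above come out the same to the digits given.

```python
# Sanity check of the per-PE, per-clock figures quoted in the message.
# ASSUMPTIONS (not stated in the message, chosen because they reproduce
# the quoted numbers):
#   - a "PE" is a floating-point unit, one per 32 processors -> 512 PEs
#   - clock rate is 7 MHz
#   - 1 MB = 10**6 bytes
#   - SAXPY moves 24 bytes and performs 2 flops per element

MB_PER_S = 20148.3135          # reported transfer rate (from the message)
N_PE = 16 * 1024 // 32         # assumed: 512 floating-point units
CLOCK_HZ = 7.0e6               # assumed clock rate
BYTES_PER_ELEM = 24            # assumed: 3 words of 8 bytes each
FLOPS_PER_ELEM = 2             # assumed: one multiply and one add
PEAK_BYTES_PER_PE_CLOCK = 4.0  # theoretical rate quoted in the message


def per_pe_per_clock(rate_per_s, n_pe=N_PE, clock_hz=CLOCK_HZ):
    """Convert a machine-wide rate per second into a rate per PE per clock."""
    return rate_per_s / n_pe / clock_hz


bytes_per_s = MB_PER_S * 1e6
bytes_per_pe_clock = per_pe_per_clock(bytes_per_s)

# Convert the byte rate to a flop rate using the assumed bytes/flops ratio.
flops_per_s = bytes_per_s * FLOPS_PER_ELEM / BYTES_PER_ELEM
flops_per_pe_clock = per_pe_per_clock(flops_per_s)

print(f"{bytes_per_pe_clock:.3f} bytes/PE/clock")
print(f"{flops_per_pe_clock:.4f} flops/PE/clock")
print(f"{flops_per_s / 1e9:.3f} GFlops")
print(f"{bytes_per_pe_clock / PEAK_BYTES_PER_PE_CLOCK:.2f} x the quoted peak")
```

If the real PE count, clock, or byte accounting differs, only the constants at the top need to change; the conversion itself is just rate / PEs / clock.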