(no subject)

From: Robert.Bell@mel.dit.csiro.au
Date: Tue Oct 01 1991 - 00:12:08 CDT


 John,
      I tried your benchmark on a Cray Y-MP2/216, using just a single processor.
 The cycle time is 6.0 ns (near enough).
      I needed to make several modifications. I first deleted your second
 function, and used Cray's supplied intrinsic. I then ran the program,
 and obtained sensational but erroneous results. Because your program never
 uses the array c, although it defines it several times, the compiler seemed
 to optimize the do loops away completely - not a bad feature. I inserted some
 code to force the storage of the arrays, turned on a compiler option to
 convert double precision to real (64-bit, of course, on the Cray), and
 eventually obtained the following results.

 Timing calibration ; t = 0.6722976 clicks
     
 Assignment: Rate = 2379.898425935 MB/s MFLOPS = 0.
 Scaling: Rate = 2335.76778731 MB/s MFLOPS = 145.9854867069
 Summing: Rate = 3220.624881743 MB/s MFLOPS = 134.1927034059
 SAXPYing: Rate = 3264.136362561 MB/s MFLOPS = 272.0113635467
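
The MB/s and MFLOPS columns are linked by the work done per loop iteration. The following Python sketch checks that link for the figures above. The per-iteration word and flop counts are assumptions inferred from the loop names, with 8-byte (64-bit) words; they are not taken from the benchmark source.

```python
BYTES_PER_WORD = 8  # 64-bit reals on the Cray

# (words moved, floating-point operations) per loop iteration.
# These counts are assumptions inferred from the loop names.
WORK_PER_ITERATION = {
    "Assignment": (2, 0),
    "Scaling": (2, 1),
    "Summing": (3, 1),
    "SAXPYing": (3, 2),
}

# (MB/s, MFLOPS) as reported in the results above.
REPORTED = {
    "Assignment": (2379.898425935, 0.0),
    "Scaling": (2335.76778731, 145.9854867069),
    "Summing": (3220.624881743, 134.1927034059),
    "SAXPYing": (3264.136362561, 272.0113635467),
}


def mflops_from_rate(kernel, rate_mb_per_s):
    """Derive MFLOPS from a memory rate, given the assumed work per iteration."""
    words, flops = WORK_PER_ITERATION[kernel]
    million_iterations_per_s = rate_mb_per_s / (words * BYTES_PER_WORD)
    return million_iterations_per_s * flops


for kernel, (rate, mflops) in REPORTED.items():
    derived = mflops_from_rate(kernel, rate)
    print(f"{kernel:10s} derived {derived:10.4f}  reported {mflops:10.4f}")
```

Under these assumed counts, the derived and reported MFLOPS should agree to the printed precision.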

    The theoretical peaks are 2666 Mbyte/s for the first two tests and
 4000 Mbyte/s for the last two, i.e. (2*8/6.0e-9)/1.0e6 and (3*8/6.0e-9)/1.0e6.
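
The peak arithmetic can be reproduced with a short Python sketch, which also compares each measured rate with its peak. The function and variable names are illustrative; the cycle time, word size, words per cycle and measured rates are the ones quoted in this message.

```python
CYCLE_TIME_S = 6.0e-9  # Y-MP clock period quoted above
BYTES_PER_WORD = 8     # 64-bit words


def peak_mbyte_per_s(words_per_cycle):
    """Peak transfer rate if words_per_cycle words move every clock period."""
    return words_per_cycle * BYTES_PER_WORD / CYCLE_TIME_S / 1.0e6


# kernel: (words per cycle at peak, measured MB/s from the results above)
MEASURED = {
    "Assignment": (2, 2379.898425935),
    "Scaling": (2, 2335.76778731),
    "Summing": (3, 3220.624881743),
    "SAXPYing": (3, 3264.136362561),
}

for kernel, (words, rate) in MEASURED.items():
    peak = peak_mbyte_per_s(words)
    print(f"{kernel:10s} peak {peak:7.1f}  measured {rate:7.1f}  ratio {rate / peak:.2f}")
```

With these figures, each measured rate comes out at roughly 80% to 90% of its theoretical peak.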

    I have been interested in memory transfer times for several years now.
 I have run tests similar to yours, but also looked at the relative memory
 locations of the arrays, and at scatter-gathers with various strides. On some
 machines, the relative locations of the arrays (whether they start in the same
 bank or not, etc) can have a substantial effect.
     Incidentally, I have some results from the Cray C-90 (=Y-MP16). The best
 result for a loop like your assignment loop is 6937 Mbyte/s.
 
     Here are some speeds for various machines, expressed in Mbyte/s. The first
 group is for a(i) = b(i+k-1), for k = 1 to 128, with a and b forced to start
 in the same bank. The two figures for each machine are the slowest and
 fastest speeds seen as k varied.
     The second group is for loops like a(i) = b(1 + k*(i-1)) for various values
 of k.
     I will be happy to provide more information - I would eventually like to
 publish the improved version of my benchmark code, which I am working on.
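
The two loop forms described above differ in which elements of b they read. The Python sketch below generates the 1-based indices for each form and counts how many memory banks they touch under a deliberately simple model in which consecutive words fall in consecutive banks. The model and the bank count of 16 are assumptions made for illustration only, not the configuration of any machine listed here.

```python
NUM_BANKS = 16  # hypothetical bank count, for illustration only


def offset_indices(n, k):
    """1-based indices of b read by a(i) = b(i+k-1), i = 1..n."""
    return [i + k - 1 for i in range(1, n + 1)]


def strided_indices(n, k):
    """1-based indices of b read by a(i) = b(1 + k*(i-1)), i = 1..n."""
    return [1 + k * (i - 1) for i in range(1, n + 1)]


def banks_touched(indices, num_banks=NUM_BANKS):
    """Number of distinct banks hit, assuming word j lives in bank (j-1) mod num_banks."""
    return len({(j - 1) % num_banks for j in indices})


n = 64
for k in (1, 2, 4, 8, 16):
    print(
        f"k={k:2d}  offset loop: {banks_touched(offset_indices(n, k)):2d} banks  "
        f"strided loop: {banks_touched(strided_indices(n, k)):2d} banks"
    )
```

In this model the offset loop always cycles through every bank, whereas a stride that shares a factor with the bank count concentrates the reads on fewer banks. That is one plausible reason for the much wider slowest-to-fastest spread in the strided figures.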

 a(i) = b(i+k-1) - offset (Mbyte/s)

 Machine      Slowest   Fastest
 C-90           6511.     6937.
 Y-MP           2340.     2423.
 X-MP            891.     1686.
 205            1447.     1587.
 C220            137.      150.
 ETA 10P*       1400.     1506.
 ETA 10P         619.     1305.
 SG 280           17.7      19.3

 scatter-gather - stride (Mbyte/s)

 Machine      Slowest   Fastest
 C-90            533.     6936.
 Y-MP            511.     2412.
 X-MP            232.     1506.
 205             185.      699.
 C220             31.      162.
 ETA 10P*         73.      565.
 ETA 10P          25.      474.
 SG 280            5.2      16.4

 Robert Bell.

/ Robert C. Bell | CSIRO Supercomputing Support Manager \
| Division of Information Technology | 'phone: (03) 282 2620 +61 3 282 2620 |
| 723 Swanston Street | fax: (03) 282 2600 +61 3 282 2600 |
\ Carlton VIC 3053 Australia | email: csrcb@mel.dit.csiro.au /



This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:01 CDT