Re: CM5 Stream_d results

From: rsw@hydra.maths.unsw.EDU.AU
Date: Wed May 05 1993 - 17:50:19 CDT


John

The optimized results for the stream benchmark on a CM5 do look funny.
I changed the constant 3.0D0 in the SAXPY operation to DBLE(k),
where k is the loop index, and this inhibited whatever optimization
was going on, producing the results below. I also noticed that the
unoptimized code could handle much larger problems (200,000 elements per VU),
while the optimized code ran out of memory well before then (an extra temporary).
I will get an assembly listing of just the SAXPY loop and forward it, if you are
interested. Have you got any other info on how stream runs on a CM5?
For large enough vectors I would expect it to scale well to larger machines.

Rob
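
As a rough check on the memory remark above, the sketch below estimates the
array storage per VU with and without one extra temporary. It assumes only
the three STREAM arrays (a, b, c) matter, that the temporary is a full-size
array, and that 1 MB = 10**6 bytes; the function name and these assumptions
are illustrative and not taken from the message.

```python
# Back-of-envelope array storage per vector unit (VU).
BYTES_PER_WORD = 8   # the benchmark output reports 8 bytes per word
N_ARRAYS = 3         # a, b and c (assumption: no other large arrays)

def mb_per_vu(elements_per_vu, n_temporaries=0):
    """MB (10**6 bytes) per VU, assuming each temporary is a full-size array."""
    return (N_ARRAYS + n_temporaries) * elements_per_vu * BYTES_PER_WORD / 1e6

for n in (150_000, 200_000):
    print(f"{n} elements/VU: {mb_per_vu(n):.1f} MB, "
          f"{mb_per_vu(n, n_temporaries=1):.1f} MB with one temporary")
```

Under these assumptions one temporary raises the footprint by a third, which
would be consistent with the optimized code running out of memory at a
smaller problem size.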

=============================================================================

 NO Optimization: cmf stream_d.fcm

 SAXPY with constant from loop index
          s = DBLE(k)
          CALL CM_timer_clear(4)
          CALL CM_timer_start(4)
          c = a + s * b
          CALL CM_timer_stop(4)
          t = CM_timer_read_elapsed(4)
          times(4,k) = t

 STREAM: Measure memory transfer rates in MB/s
 for simple computational kernels in Fortran

 CM5 with partition of 16 processors ( 64 vector units )

 --------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
 --------------------------------------

 Array length = 9600000
 Elements per VU = 150000
 Timing calibration: Time = 5.455848484848485 hundredths of a second
 Increase the size of the arrays if this is < 30
 and your clock precision is =< 1/100 second

 ---------------------------------------------------------
 Function   : Rate (MB/s)   RMS time   Min time   Max time
 Assignment : 4887.33331    0.03143    0.03143    0.03145
 Scaling    : 4893.41038    0.03153    0.03139    0.03276
 Summing    : 5145.05312    0.04500    0.04478    0.04615
 SAXPYing   : 5143.42075    0.04501    0.04480    0.04616
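
The rates in this table appear to follow from the minimum times. The sketch
below recomputes them, assuming the usual STREAM byte counting (two arrays
moved per element for Assignment and Scaling, three for Summing and
SAXPYing) and 1 MB = 10**6 bytes. Those counts and all names in the code are
assumptions; the numbers are copied from the table.

```python
# Recompute the reported MB/s from the minimum times in the table.
BYTES_PER_WORD = 8        # "Assuming 8 bytes per DOUBLEPRECISION word"
ARRAY_LENGTH = 9_600_000  # "Array length = 9600000"

# kernel: (arrays moved per element, min time in s, reported MB/s)
# The arrays-moved counts are an assumption (usual STREAM convention).
TABLE = {
    "Assignment": (2, 0.03143, 4887.33331),
    "Scaling":    (2, 0.03139, 4893.41038),
    "Summing":    (3, 0.04478, 5145.05312),
    "SAXPYing":   (3, 0.04480, 5143.42075),
}

def rate_mb_per_s(arrays_moved, seconds):
    """Bytes moved divided by elapsed time, in 10**6 bytes per second."""
    return arrays_moved * BYTES_PER_WORD * ARRAY_LENGTH / seconds / 1e6

def relative_errors(table):
    """Relative gap between recomputed and reported rate, per kernel."""
    return {name: abs(rate_mb_per_s(n, t) - reported) / reported
            for name, (n, t, reported) in table.items()}

for name, err in relative_errors(TABLE).items():
    print(f"{name:10s} recomputed within {err:.3%} of the reported rate")
```

The remaining gaps are no larger than what rounding the times to five
decimal places would explain.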

 =======================================================================

 Optimization ON: cmf -O stream_d.fcm

 STREAM: Measure memory transfer rates in MB/s
 for simple computational kernels in Fortran

 CM5 with partition of 16 processors ( 64 vector units )

 --------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
 --------------------------------------

 Array length = 9600000
 Elements per VU = 150000
 Timing calibration: Time = 5.468930303030303 hundredths of a second
 Increase the size of the arrays if this is < 30
 and your clock precision is =< 1/100 second

 ---------------------------------------------------------
 Function   : Rate (MB/s)   RMS time   Min time   Max time
 Assignment : 4894.00569    0.03139    0.03139    0.03142
 Scaling    : 4898.13904    0.03144    0.03136    0.03273
 Summing    : 7328.59745    0.03158    0.03144    0.03280
 SAXPYing   : 5143.05891    0.04494    0.04480    0.04615
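
Dividing the minimum times of the unoptimized run by those of the -O run
shows which kernels the optimizer actually changed. The sketch below does
only that arithmetic; the dictionary and function names are illustrative,
and the times are copied from the two tables in this message.

```python
# Min times (seconds) copied from the two result tables.
MIN_TIME_NOOPT = {"Assignment": 0.03143, "Scaling": 0.03139,
                  "Summing": 0.04478, "SAXPYing": 0.04480}
MIN_TIME_OPT = {"Assignment": 0.03139, "Scaling": 0.03136,
                "Summing": 0.03144, "SAXPYing": 0.04480}

def speedups(before, after):
    """Ratio of unoptimized to optimized min time for each kernel."""
    return {name: before[name] / after[name] for name in before}

ratios = speedups(MIN_TIME_NOOPT, MIN_TIME_OPT)
for name, r in ratios.items():
    print(f"{name:10s} cmf -O speedup: {r:.3f}x")
```

Only Summing moves noticeably (roughly 1.42x), and its optimized minimum
time (0.03144 s) lands within about 0.2% of the Assignment time (0.03139 s).
The other three kernels are unchanged to within about 0.1%.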


