stream benchmark

From: jbs@watson.ibm.com
Date: Fri Oct 04 1991 - 18:54:10 CDT


         I saw your posts about your stream benchmark. I obtained a copy
and have been playing around with it. I ran it on a 540 (256M memory,
xlf 2.2, option -O, n=3000000). I enclose the output.
         I saw your post in which you try to compute a theoretical rate
for the 550. I believe you have a slightly inaccurate picture of how the
S/6000 cache works. When a cache miss occurs on a store operation, the
appropriate line is first read into the cache from main memory and then
the store is performed. When a line is brought into the cache, it may be
necessary to write the line it replaces back to memory. The machine
keeps track of which cache lines have been altered. If a cache line has
not been altered, it may simply be overwritten. However, if the cache
miss causes an altered line to be replaced, the altered line must be
stored back to memory (since the stores to it have not yet been
reflected in memory). In practice, buffers are used so that the incoming
cache line is read in before the outgoing line is written out. Therefore
each of your tests involves one additional read into cache, namely the
read of the array being stored to. You may check this by altering the
tests to overwrite one of the inputs (the altered tests should perform
better).
                          James B. Shearer
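
The accounting described in the letter above can be sketched in a few
lines of Python. This is only an illustration under stated assumptions:
the loop bodies in the comments are guessed from the test names in the
output, and traffic_words and KERNELS are hypothetical names introduced
here, not part of the benchmark.

```python
# Words moved between memory and cache per loop iteration, assuming a
# cache that reads the target line from memory on a store miss and
# later writes the altered line back. The kernel shapes are assumptions
# inferred from the test names, not taken from the benchmark source.

def traffic_words(inputs_read, overwrites_input):
    """Return (counted, actual) words per iteration.

    counted: the loads and the one store visible in the loop body
    actual:  counted plus the read triggered by the store miss; that
             read is avoided when the stored-to array is also an input,
             because its line is already in cache from the load
    """
    counted = inputs_read + 1
    allocate_read = 0 if overwrites_input else 1
    return counted, counted + allocate_read

# name: (arrays loaded, does the store overwrite one of the inputs?)
KERNELS = {
    "Assignment a(i) = b(i)":            (1, False),
    "Scaling    a(i) = q*b(i)":          (1, False),
    "Summing    a(i) = b(i) + c(i)":     (2, False),
    "SAXPYing   a(i) = b(i) + q*c(i)":   (2, False),
    # Altered variants that overwrite an input, as the letter suggests:
    "Scaling    a(i) = q*a(i)":          (1, True),
    "Summing    a(i) = a(i) + c(i)":     (2, True),
}

for name, (inputs_read, overwrites) in KERNELS.items():
    counted, actual = traffic_words(inputs_read, overwrites)
    print(f"{name:<32} counted={counted} actual={actual}")
```

Under this model the four original tests each move one more word per
iteration than the loop body suggests, while the altered variants do
not, which is why the altered tests would be expected to run faster.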
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time = 227.000000000000057 hundredths of a second
 Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      145.4545      .3340      .3300      .3400
Scaling   :      129.7297      .3740      .3700      .3800
Summing   :      144.0000      .5080      .5000      .5100
SAXPYing  :      144.0000      .5090      .5000      .5100
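
The reported rates can be reproduced from n=3000000, 8 bytes per word
and the minimum times, if one assumes that the first two tests count two
arrays each, the last two count three, and 1 MB = 10^6 bytes. The short
Python sketch below does that arithmetic and also shows what the rate
would be if the extra read on a store miss were counted as a further
array. The array counts and the name rate_mb_per_s are assumptions made
for this example.

```python
N = 3_000_000     # array length used for this run
WORD_BYTES = 8    # bytes per DOUBLEPRECISION word, per the output

# name: (arrays counted by the benchmark [assumed], min time in seconds)
RESULTS = {
    "Assignment": (2, 0.33),
    "Scaling":    (2, 0.37),
    "Summing":    (3, 0.50),
    "SAXPYing":   (3, 0.50),
}

def rate_mb_per_s(arrays, seconds):
    """Bytes moved for `arrays` full arrays, in units of 10**6 bytes/s."""
    return arrays * WORD_BYTES * N / seconds / 1e6

for name, (arrays, tmin) in RESULTS.items():
    reported = rate_mb_per_s(arrays, tmin)          # as printed above
    with_extra = rate_mb_per_s(arrays + 1, tmin)    # plus store-miss read
    print(f"{name:<11} {reported:9.4f} {with_extra:9.4f}")
```

The first column matches the Rate (MB/s) figures in the output; the
second is the memory traffic implied if each test also reads the array
it stores to.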


