Stream results for IBM RS/6k 595

From: stefand@ferrari.lcam.u-psud.fr
Date: Fri Feb 05 1999 - 12:28:22 CST


Hi John,

I just tried to get STREAM data for our IBM RS/6k 43-260 Power3 machine,
but that didn't work very well: so far only the xlf 4.x compiler is
installed, and it doesn't yet know about the Power3 machines. Hence the
numbers are not representative (in fact, they are smaller than the ones
you list in the experimental results section).

I am planning to have the compiler upgraded and will then make another
attempt. While I was at it, though, I decided to benchmark our IBM RS/6k
595 (AIX 4.1, 2 GB RAM). The fastest results so far for this machine
came with these compiler options:

xlf -O3 -qnozerosize -qarch=pwr2 -qtune=pwr2 stream_d.f second_cpu.f \
    -bmaxdata:500000000 -qhot -qhot=arraypad -qalign=4k -o strem_595

and are in the following range:

Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:          847.0588      .4002      .3400      .4500
Scale:         900.0000      .3965      .3200      .4400
Add:           881.6327      .5402      .4900      .6600
Triad:         960.0000      .5273      .4500      .6000

(The Copy result is from a re-run of the same binary, well within the 5%
threshold you also mentioned.) The complete log is appended for your
records. These results are faster than the currently listed 595 entry
(probably still with xlf 3.x?), and also faster than what I can get out
of the Power3 machine with this compiler. Maybe you could also list the
dates of the entries so one can deduce such things?
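
For reference, the Rate column in the summary table above can be
reproduced from the Min time column. The following is a minimal Python
sketch, assuming the usual STREAM accounting (Copy and Scale move two
arrays per pass, Add and Triad move three, 1 MB = 10^6 bytes), 8-byte
words, and the array size of 18000000 reported in the log. The names
are illustrative and not part of STREAM itself.

```python
# Reproduce STREAM "Rate (MB/s)" values from the reported minimum times.
# Assumptions: 8-byte words, n = 18000000 (from the log), 1 MB = 1e6 bytes,
# Copy/Scale move 2 arrays per pass, Add/Triad move 3.
N = 18_000_000
BYTES_PER_WORD = 8
ARRAYS_MOVED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def rate_mb_per_s(kernel: str, min_time_s: float) -> float:
    """Bytes moved in one pass divided by the best time, in MB/s."""
    bytes_moved = ARRAYS_MOVED[kernel] * BYTES_PER_WORD * N
    return bytes_moved / min_time_s / 1e6


# Min times (seconds) taken from the summary table.
min_times = {"Copy": 0.34, "Scale": 0.32, "Add": 0.49, "Triad": 0.45}
rates = {kernel: rate_mb_per_s(kernel, t) for kernel, t in min_times.items()}

for kernel, rate in rates.items():
    print(f"{kernel + ':':<8}{rate:10.4f}")
```

Under these assumptions the computed values agree with the four rates
listed in the table to the printed precision.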

You will notice that I had to increase n quite a bit, since the box was
too fast. For fun I'll build an OS/2 version tonight for Pentium and P6
class machines.
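
A rough way to reason about the choice of n is sketched below in
Python. It assumes three 8-byte arrays, the 10000 microsecond timer
granularity and the "at least 20 clock ticks" guideline printed in the
log, and a target bandwidth of 900 MB/s picked purely for illustration.
The helper names are hypothetical.

```python
import math

BYTES_PER_WORD = 8   # DOUBLE PRECISION word size reported in the log
TICK_US = 10_000     # timer granularity reported in the log
MIN_TICKS = 20       # guideline printed by the benchmark


def footprint_bytes(n: int) -> int:
    """Total size of the three arrays a, b and c."""
    return 3 * BYTES_PER_WORD * n


def min_n_two_array_kernel(rate_mb_per_s: float) -> int:
    """Smallest n for which a two-array kernel (Copy, Scale) lasts MIN_TICKS."""
    # MB/s multiplied by microseconds gives bytes (1 MB = 1e6 bytes).
    bytes_needed = rate_mb_per_s * MIN_TICKS * TICK_US
    return math.ceil(bytes_needed / (2 * BYTES_PER_WORD))


n = 18_000_000
total = footprint_bytes(n)
print(total, "bytes =", total // 2**20, "MB (integer division)")
print("fits under 500000000:", total <= 500_000_000)
print("minimum n at 900 MB/s:", min_n_two_array_kernel(900.0))
```

For n = 18000000 the footprint is 432000000 bytes, which integer-divides
to the 411 MB shown in the log and stays below the 500000000 passed to
-bmaxdata (assuming that value is a byte limit on the data segment).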

                        Cheers, Stefan

Anyway, here is the complete log:

Concord: (IBM 595 2 GB RAM) xlf 4.x
xlf -O3 -qnozerosize -qarch=pwr2 -qtune=pwr2 \
    stream_d.f second_cpu.f -bmaxdata:500000000 -qhot \
    -o strem_595

timex ./strem_595
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 18000000
 Offset = 0
 The total memory requirement is 411 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10000 microseconds
 The tests below will each take a time on the order
 of 220000 microseconds
    (= 22 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:          738.4615      .4071      .3900      .4200
Scale:         738.4615      .3981      .3900      .4100
Add:           830.7692      .5381      .5200      .5600
Triad:         847.0588      .5333      .5100      .5600
 Sum of a is = 0.207594140609274962E+20
 Sum of b is = 0.415188281269478605E+19
 Sum of c is = 0.553584374975187046E+19

real 46.64
user 20.25
sys 2.65

xlf -O3 -qnozerosize -qarch=pwr2 -qtune=pwr2 stream_d.f second_cpu.f \
    -bmaxdata:500000000 -qhot -qhot=arraypad -o strem_595
concord pts/4 [169] /lscrtch/sad/stream> timex ./strem_595
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 18000000
 Offset = 0
 The total memory requirement is 411 MB
 You are running each test 10 times
 The *best* time for each test is used
 --------------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:          757.8947      .3991      .3800      .4100
Scale:         757.8947      .4002      .3800      .4300
Add:           864.0000      .5382      .5000      .5600
Triad:         864.0000      .5212      .5000      .5400
 Sum of a is = 0.207594140609274962E+20
 Sum of b is = 0.415188281269478605E+19
 Sum of c is = 0.553584374975187046E+19

real 46.46
user 19.93
sys 2.89

xlf -O3 -qnozerosize -qarch=pwr2 -qtune=pwr2 stream_d.f second_cpu.f \
    -bmaxdata:500000000 -qhot -qhot=arraypad -qalign=4k -o strem_595

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 18000000
 Offset = 0
 The total memory requirement is 411 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10000 microseconds
 The tests below will each take a time on the order
 of 220000 microseconds
    (= 22 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:          738.4615      .4092      .3900      .4300
Scale:         757.8947      .3991      .3800      .4100
Add:           864.0000      .5262      .5000      .5500
Triad:         881.6327      .5253      .4900      .5500
 Sum of a is = 0.207594140609274962E+20
 Sum of b is = 0.415188281269478605E+19
 Sum of c is = 0.553584374975187046E+19

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 18000000
 Offset = 0
 The total memory requirement is 411 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10000 microseconds
 The tests below will each take a time on the order
 of 220000 microseconds
    (= 22 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:          757.8947      .3971      .3800      .4100
Scale:         757.8947      .3911      .3800      .4000
Add:           847.0588      .5372      .5100      .5500
Triad:         847.0588      .5241      .5100      .5500
 Sum of a is = 0.207594140609274962E+20
 Sum of b is = 0.415188281269478605E+19
 Sum of c is = 0.553584374975187046E+19

real 46.24
user 19.90
sys 2.74

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 18000000
 Offset = 0
 The total memory requirement is 411 MB
 You are running each test 80 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10000 microseconds
 The tests below will each take a time on the order
 of 260000 microseconds
    (= 26 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:          800.0000      .4022      .3600      .4500
Scale:         900.0000      .3965      .3200      .4400
Add:           881.6327      .5402      .4900      .6600
Triad:         960.0000      .5273      .4500      .6000
 Sum of a is = 0.440152552974613180E+102
 Sum of b is = 0.880305106010380985E+101
 Sum of c is = 0.117374014130075336E+102

real 187.16
user 150.50
sys 2.99

* Another run with the same binary resulted in:

Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:          847.0588      .4002      .3400      .4500
Scale:         778.3784      .3996      .3700      .5200
Add:           919.1489      .5340      .4700      .5800
Triad:         900.0000      .5306      .4800      .5700
 Sum of a is = 0.440152552974613180E+102
 Sum of b is = 0.880305106010380985E+101
 Sum of c is = 0.117374014130075336E+102

real 167.85
user 150.43
sys 3.25
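
A note on reading the logs above: with a timer granularity of 10000
microseconds, the Min time values are whole multiples of 0.01 s, so the
Rate column can only take a small set of discrete values. This would
explain why figures such as 738.4615 and 757.8947 recur across runs. The
Python sketch below lists the Copy rates reachable for 34 to 39 ticks,
under the same assumptions as before (n = 18000000, 8-byte words, two
arrays moved per Copy pass, 1 MB = 10^6 bytes); names are illustrative.

```python
# Which Copy rates can a 10 ms timer report at all?
# Assumptions: n = 18000000, 8-byte words, Copy moves 2 arrays per pass.
N = 18_000_000
BYTES_PER_WORD = 8
TICK_US = 10_000  # timer granularity reported in the log


def rate_for_ticks(arrays_moved: int, ticks: int) -> float:
    """Rate in MB/s if the best pass took exactly `ticks` timer ticks."""
    bytes_moved = arrays_moved * BYTES_PER_WORD * N
    return bytes_moved / (ticks * TICK_US)  # bytes per microsecond == MB/s


copy_rates = {ticks: rate_for_ticks(2, ticks) for ticks in range(34, 40)}
for ticks, rate in copy_rates.items():
    print(f"{ticks * TICK_US / 1e6:.2f} s -> {rate:9.4f} MB/s")

# Relative change in rate when the best time drops from 38 to 37 ticks.
step = copy_rates[37] / copy_rates[38] - 1
print(f"one tick near 0.38 s changes the rate by {step:.1%}")
```

A single tick near 0.38 s corresponds to a step of roughly 2.7% in the
reported rate, which is worth keeping in mind when comparing runs.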

-- 
=========================================================================
Dr. Stefan A. Deutscher                   | (+33-(0)1)   voice      fax
Laboratoire des Collisions Atomiques et   | LCAM :  6915-7699  6915-7671
Mol\'{e}culaires (LCAM), B\^{a}timent 351 | home :  5624-0992  call first
Universit\'{e} de Paris-Sud               | email:  sad@utk.edu 
91405 Orsay Cedex, France (Europe)        |         (forwarded to France)
=========================================================================
 Do you know what they call a quarter-pounder with cheese in Paris?


