STREAM (base) on POWER3smp

From: Frank Johnston (fjohn@us.ibm.com)
Date: Tue Oct 12 1999 - 16:49:16 CDT


From: Frank Johnston on 10/12/99 04:49 PM
To: John D Mccalpin/Austin/IBM@IBMUS

John,

Here are STREAM results without the compiler directive for a
Nighthawk1 (IBM RS/6000 SP 222 MHz POWER3smp).

Would it be possible to publish both these and the results
obtained with the directive in the table of "Standard Results",
using slightly different machine IDs? I still believe that using
a simple Fortran compiler directive is much closer in spirit
to the "Standard Results" than to the "Experimental Results"
obtained with hand-coded assembler.

(rates in MB/s)
Machine ID                npus     COPY    SCALE      ADD    TRIAD
IBM-SP_222MHz-POWER3smp      1    413.0    421.2    587.4    614.2
IBM-SP_222MHz-POWER3smp      2    820.7    800.4   1182.1   1165.4
IBM-SP_222MHz-POWER3smp      4   1603.5   1535.9   2218.3   2187.5
IBM-SP_222MHz-POWER3smp      8   2954.1   2821.0   3889.0   3872.2
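
To read the scaling off the table, the short Python sketch below divides each multi-CPU rate by the 1-CPU rate for the same kernel. The rates are copied from the table; the names `results`, `speedup`, and `efficiency` are illustrative and not part of STREAM, and "efficiency" here simply means speedup divided by CPU count.

```python
# Rates in MB/s, copied from the table above; outer keys are CPU counts.
results = {
    1: {"COPY": 413.0, "SCALE": 421.2, "ADD": 587.4, "TRIAD": 614.2},
    2: {"COPY": 820.7, "SCALE": 800.4, "ADD": 1182.1, "TRIAD": 1165.4},
    4: {"COPY": 1603.5, "SCALE": 1535.9, "ADD": 2218.3, "TRIAD": 2187.5},
    8: {"COPY": 2954.1, "SCALE": 2821.0, "ADD": 3889.0, "TRIAD": 3872.2},
}

KERNELS = ("COPY", "SCALE", "ADD", "TRIAD")


def speedup(kernel, ncpus):
    """Bandwidth on ncpus CPUs relative to the 1-CPU run (hypothetical helper)."""
    return results[ncpus][kernel] / results[1][kernel]


def efficiency(kernel, ncpus):
    """Speedup divided by CPU count; 1.0 would be perfect linear scaling."""
    return speedup(kernel, ncpus) / ncpus


for n in sorted(results):
    cells = "  ".join(
        f"{k}: {speedup(k, n):5.2f}x ({efficiency(k, n):4.0%})" for k in KERNELS
    )
    print(f"{n} CPU(s)  {cells}")
```

On these numbers the four kernels stay close to linear through 4 CPUs and fall off at 8 CPUs.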

Frank Johnston

output for 1 CPU:
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 524300
 Offset = 260
 The total memory requirement is 12 MB
 You are running each test 1000 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 5 microseconds
 The tests below will each take a time on the order
 of 8319 microseconds
    (= 1664 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            412.9788      .0204      .0203      .0213
Scale:           421.1776      .0200      .0199      .0210
Add:             587.4248      .0216      .0214      .0226
Triad:           614.2449      .0205      .0205      .0216
 Sum of a is = 1048600.00000001118
 Sum of b is = 434344.341503158561
 Sum of c is = 1482944.34150735638
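
As a cross-check on the 1-CPU output above, the reported rates can be approximately reproduced from the array size and the minimum times. The sketch below assumes the usual STREAM accounting (Copy and Scale move two words per array element, Add and Triad move three) and 1 MB = 10^6 bytes; those counting rules are assumptions here, not something the output itself states. Because the printed times are rounded to four decimals, agreement is only expected to within a fraction of a percent.

```python
ARRAY_SIZE = 524300        # "Array size" from the 1-CPU run
BYTES_PER_WORD = 8         # "Assuming 8 bytes per DOUBLE PRECISION word"

# Assumed words moved per array element for each kernel (reads plus writes).
WORDS_MOVED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column (seconds) and reported "Rate (MB/s)" from the 1-CPU run.
MIN_TIME = {"Copy": 0.0203, "Scale": 0.0199, "Add": 0.0214, "Triad": 0.0205}
REPORTED = {"Copy": 412.9788, "Scale": 421.1776, "Add": 587.4248, "Triad": 614.2449}


def rate_mb_s(kernel):
    """Bytes moved divided by the best time, in units of 10^6 bytes/s."""
    bytes_moved = WORDS_MOVED[kernel] * BYTES_PER_WORD * ARRAY_SIZE
    return 1.0e-6 * bytes_moved / MIN_TIME[kernel]


for kernel in WORDS_MOVED:
    est = rate_mb_s(kernel)
    rel = est / REPORTED[kernel] - 1.0
    print(f"{kernel:6s} estimated {est:9.2f}  reported {REPORTED[kernel]:9.4f}  "
          f"difference {rel:+.2%}")
```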

output for 2 CPUs:
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 1048600
 Offset = 360
 The total memory requirement is 24 MB
 You are running each test 750 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 5 microseconds
 The tests below will each take a time on the order
 of 8704 microseconds
    (= 1741 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            820.6599      .0205      .0204      .0217
Scale:           800.4499      .0210      .0210      .0222
Add:            1182.0659      .0214      .0213      .0226
Triad:          1165.4323      .0217      .0216      .0229
 Sum of a is = 2097200.00000000373
 Sum of b is = 868688.683004021295
 Sum of c is = 2965888.68302347790

output for 4 CPUs:
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2097200
 Offset = 272
 The total memory requirement is 48 MB
 You are running each test 1000 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 5 microseconds
 The tests below will each take a time on the order
 of 11072 microseconds
    (= 2214 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           1603.5355      .0219      .0209      .0232
Scale:          1535.8922      .0225      .0218      .0235
Add:            2218.2522      .0235      .0227      .0249
Triad:          2187.5434      .0236      .0230      .0247
 Sum of a is = 4194400.00000001118
 Sum of b is = 1737377.36602994637
 Sum of c is = 5931777.36605572235

output for 8 CPUs:
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 4194400
 Offset = 224
 The total memory requirement is 96 MB
 You are running each test 1000 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 5 microseconds
 The tests below will each take a time on the order
 of 13120 microseconds
    (= 2624 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           2954.0690      .0240      .0227      .0347
Scale:          2820.9498      .0247      .0238      .0724
Add:            3889.0287      .0271      .0259      .0456
Triad:          3872.2120      .0270      .0260      .0612
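
The header lines of the four runs can be sanity-checked the same way. The sketch below recomputes the "clock ticks" figure as the estimated test time divided by the 5-microsecond granularity, rounded to the nearest integer, and the "total memory requirement" as three arrays of 8-byte words divided by 2^20 and truncated. The rounding, the truncation, and the 2^20 divisor are assumptions chosen because they reproduce the printed values; the output does not state them.

```python
GRANULARITY_US = 5     # "clock granularity/precision appears to be 5 microseconds"
BYTES_PER_WORD = 8
N_ARRAYS = 3           # arrays a, b and c, whose sums appear in the output

# CPU count -> (array size, estimated test time in microseconds,
#               reported clock ticks, reported memory in MB)
RUNS = {
    1: (524300, 8319, 1664, 12),
    2: (1048600, 8704, 1741, 24),
    4: (2097200, 11072, 2214, 48),
    8: (4194400, 13120, 2624, 96),
}


def clock_ticks(est_us):
    """Estimated test time in units of the timer granularity (assumed rounding)."""
    return round(est_us / GRANULARITY_US)


def memory_mb(array_size):
    """Total size of the three arrays in units of 2^20 bytes, truncated (assumed)."""
    return (N_ARRAYS * BYTES_PER_WORD * array_size) // (1024 * 1024)


for ncpus, (size, est_us, ticks, mem) in sorted(RUNS.items()):
    print(f"{ncpus} CPU(s): ticks {clock_ticks(est_us)} (reported {ticks}), "
          f"memory {memory_mb(size)} MB (reported {mem} MB)")
```

Every run is far above the 20-tick minimum that the output asks for.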


