STREAM performance on 256 pe O2k.

From: Yoshio Ashizawa (ashizawa@nsg.sgi.com)
Date: Tue Mar 02 1999 - 00:17:50 CST


Hi McCalpin-san,
Thank you for your reply and for suggesting that I double-check
the numbers.

I checked the source again and found that I had made a
mistake in the timing function.
Now I get reasonable numbers, as expected: about 42 GB/s.
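The post does not say what was wrong with the timing function. For reference, STREAM is meant to be timed with a wall-clock timer (the C version uses a `mysecond()` function built on `gettimeofday()`); a CPU-time clock, which sums time over all threads of the process, would distort the rates of a 256-thread run. The following is a minimal Python sketch of best-of-N timing with a selectable clock, not STREAM's own code; `wall_seconds`, `cpu_seconds` and `best_time` are hypothetical names.

```python
import time
from typing import Callable


def wall_seconds() -> float:
    """Elapsed (wall-clock) time in seconds: the kind of clock STREAM expects."""
    return time.perf_counter()


def cpu_seconds() -> float:
    """CPU time used by this process, summed over all of its threads."""
    return time.process_time()


def best_time(kernel: Callable[[], None], ntimes: int = 10,
              clock: Callable[[], float] = wall_seconds) -> float:
    """Run `kernel` ntimes and return the shortest elapsed time, as STREAM reports."""
    best = float("inf")
    for _ in range(ntimes):
        start = clock()
        kernel()
        best = min(best, clock() - start)
    return best


if __name__ == "__main__":
    # A sleeping "kernel" takes wall time but almost no CPU time,
    # which shows how far apart the two clocks can be.
    nap = lambda: time.sleep(0.05)
    print(f"wall: {best_time(nap, 3):.3f} s")
    print(f"cpu:  {best_time(nap, 3, clock=cpu_seconds):.3f} s")
```

The clock is a parameter so the same loop can be run with either timer and the results compared directly.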

With these results we can set a reasonable criterion for the procurement.

I have also submitted bug #676604 for the problem I
experienced with STREAM.

Thank you again,
        ashizawa
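The runs below start a `go` script whose contents appear only as a shell trace (the `+ export` lines). As a hypothetical equivalent, here is a small Python launcher that applies the same five settings before starting `./stream`; `STREAM_ENV`, `build_env` and `run_stream` are illustrative names. OMP_NUM_THREADS and OMP_DYNAMIC are standard OpenMP variables; MPC_GANG, _DSM_PPM and _DSM_MUSTRUN are SGI IRIX runtime settings, copied here verbatim rather than explained.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

# Values copied verbatim from the trace of the "go" script.
STREAM_ENV: Dict[str, str] = {
    "OMP_NUM_THREADS": "256",  # one OpenMP thread per processor of the 256-PE O2k
    "OMP_DYNAMIC": "FALSE",    # do not let the runtime change the thread count
    "MPC_GANG": "OFF",         # SGI IRIX setting, as in the trace
    "_DSM_PPM": "2",           # SGI IRIX setting, as in the trace
    "_DSM_MUSTRUN": "1",       # SGI IRIX setting, as in the trace
}


def build_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: the current environment) and apply STREAM_ENV."""
    env = dict(os.environ if base is None else base)
    env.update(STREAM_ENV)
    return env


def run_stream(binary: str = "./stream") -> int:
    """Run the benchmark with the settings above and return its exit status."""
    return subprocess.run([binary], env=build_env()).returncode
```

Calling `run_stream()` from the directory that holds the binary corresponds to typing `./go` at the prompt.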

typhoon 37% ./go
+ export OMP_NUM_THREADS=256
+ export OMP_DYNAMIC=FALSE
+ export MPC_GANG=OFF
+ export _DSM_PPM=2
+ export _DSM_MUSTRUN=1
+ ./stream
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 160000000
 Offset = 0
 The total memory requirement is 3662 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 4 microseconds
 The tests below will each take a time on the order
 of 34356 microseconds
    (= 8589 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function   Rate (MB/s)  RMS time  Min time  Max time
Copy:       42825.1375    0.0618    0.0598    0.0721
Scale:      45299.3503    0.0622    0.0565    0.0739
Add:        51802.3491    0.0982    0.0741    0.2121
Triad:      52977.9809    0.0738    0.0725    0.0765
 Sum of a is = 1.8452812500173852E+20
 Sum of b is = 3.6905625000360264E+19
 Sum of c is = 4.9207500000800375E+19
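The header lines and the rate table can be cross-checked by hand. The sketch below is an illustration, not STREAM's source; the helper names are hypothetical, and it assumes STREAM's usual byte counting (2 arrays of 8-byte words per element for Copy and Scale, 3 for Add and Triad, with no allowance for extra cache traffic). Under that assumption the numbers fit only if the memory line uses 2^20-byte megabytes while the rates use 10^6 bytes per MB.

```python
BYTES_PER_WORD = 8  # "Assuming 8 bytes per DOUBLE PRECISION word"
NARRAYS = 3         # STREAM's arrays a, b and c

# Words moved per array element by each kernel (reads + writes):
#   Copy: c = a            Scale: b = scalar*c
#   Add:  c = a + b        Triad: a = b + scalar*c
WORDS_MOVED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def memory_mb(n: int) -> int:
    """Total size of the three arrays, in units of 2**20 bytes."""
    return NARRAYS * n * BYTES_PER_WORD // 2**20


def clock_ticks(test_time_us: float, granularity_us: float) -> int:
    """Expected test time expressed in timer ticks, rounded to nearest."""
    return round(test_time_us / granularity_us)


def rate_mb_s(kernel: str, n: int, seconds: float) -> float:
    """Bandwidth in MB/s (10**6 bytes per second) for one kernel."""
    return 1e-6 * WORDS_MOVED[kernel] * BYTES_PER_WORD * n / seconds


def implied_seconds(kernel: str, n: int, mb_s: float) -> float:
    """Time that would give the reported rate (inverse of rate_mb_s)."""
    return WORDS_MOVED[kernel] * BYTES_PER_WORD * n / (mb_s * 1e6)


if __name__ == "__main__":
    n = 160_000_000
    print(memory_mb(n), clock_ticks(34356, 4))
    reported = {"Copy": 42825.1375, "Scale": 45299.3503,
                "Add": 51802.3491, "Triad": 52977.9809}
    for kernel, mb_s in reported.items():
        print(f"{kernel:6s} {implied_seconds(kernel, n, mb_s):.4f} s")
```

Running it prints 3662 and 8589, and the four implied times round to the Min time column (0.0598, 0.0565, 0.0741 and 0.0725 s). The same functions fit the large case below: 9155 MB and 22745 ticks, the latter only if the tick count is rounded rather than truncated (90979 / 4 = 22744.75).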

Large Case:

typhoon 21% ./go
+ export OMP_NUM_THREADS=256
+ export OMP_DYNAMIC=FALSE
+ export MPC_GANG=OFF
+ export _DSM_PPM=2
+ export _DSM_MUSTRUN=1
+ ./stream
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 400000000
 Offset = 0
 The total memory requirement is 9155 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 4 microseconds
 The tests below will each take a time on the order
 of 90979 microseconds
    (= 22745 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function   Rate (MB/s)  RMS time  Min time  Max time
Copy:       42824.2493    0.1505    0.1494    0.1525
Scale:      43213.4826    0.1638    0.1481    0.2423
Add:        48285.8374    0.2003    0.1988    0.2025
Triad:      49275.5196    0.1982    0.1948    0.2171
 Sum of a is = 4.6132031251661586E+20
 Sum of b is = 9.2264062499237265E+19
 Sum of c is = 1.2301875000272059E+20
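The three sums printed at the end of each run are STREAM's correctness check. Every element of a, b and c evolves identically, so the expected sums can be predicted and compared with the printed ones. The initial values used here (a = 1, b = 2, c = 0, with a doubled once before the timed loop) and scalar = 3 are assumptions not stated in the post, but they reproduce both runs' sums to about 1e-11 relative error; `expected_sums` and `sums_ok` are hypothetical names.

```python
from typing import Sequence, Tuple


def expected_sums(n: int, ntimes: int = 10,
                  scalar: float = 3.0) -> Tuple[float, float, float]:
    """Predicted sums of a, b and c after ntimes passes of the four kernels."""
    # Assumed initial values; a is doubled once before the timed loop.
    a, b, c = 1.0, 2.0, 0.0
    a = 2.0 * a
    for _ in range(ntimes):
        c = a                # Copy
        b = scalar * c       # Scale
        c = a + b            # Add
        a = b + scalar * c   # Triad
    return a * n, b * n, c * n


def sums_ok(reported: Sequence[float], expected: Sequence[float],
            rel_tol: float = 1e-8) -> bool:
    """True if every reported sum matches its prediction within rel_tol."""
    return all(abs(r - e) <= rel_tol * abs(e) for r, e in zip(reported, expected))


if __name__ == "__main__":
    runs = {
        160_000_000: (1.8452812500173852e20, 3.6905625000360264e19,
                      4.9207500000800375e19),
        400_000_000: (4.6132031251661586e20, 9.2264062499237265e19,
                      1.2301875000272059e20),
    }
    for n, reported in runs.items():
        print(n, sums_ok(reported, expected_sums(n)))
```

Both array sizes print True. After ten passes each element of a equals 2 * 15**10, so the sums grow linearly with the array size, which is why the large case's sums are exactly 2.5 times the small case's up to rounding.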

-- 
------------------------------------------------------
Yoshio Ashizawa  
Scalable Systems Technology Center
Silicon Graphics Inc. Japan
E-mail: ashizawa@nsg.sgi.com
PHONE : +81-3-5488-1838          FAX : +81-3-5420-2397


