Stream Results - Sun E10000

From: Brian Whitney (Brian.Whitney@West.Sun.COM)
Date: Fri Jun 11 1999 - 17:50:18 CDT


Dear Dr. McCalpin,

Please find attached STREAM runs on the Sun Enterprise 10000.

I modified the code to flush the caches between kernels. Without
this modification, the reported rates were higher than the
interconnect can sustain, because data was being served from cache.
The Copy results are faster than the other three kernels because a
compiler flag lets the copy bypass the cache, which gives faster
bcopy performance.
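
A rough way to see why caching can inflate the numbers is to compare each CPU's share of one array with its L2 cache. The sketch below assumes each array is split evenly across the CPUs (an assumption; the message does not describe the partitioning) and uses the array size printed in the STREAM output (50,000,000 8-byte words) and the 8 MB L2 cache per CPU. All names are illustrative.

```python
# Rough estimate of each CPU's share of one STREAM array versus its L2 cache.
# Assumption: arrays are divided evenly among CPUs (not stated in the message).

ARRAY_ELEMENTS = 50_000_000      # "Array size" from the STREAM output
BYTES_PER_WORD = 8               # "8 bytes per DOUBLE PRECISION word"
L2_BYTES = 8 * 2**20             # 8 MB L2 cache per CPU, taken here as 8 MiB


def slice_bytes(cpus: int) -> float:
    """Bytes of a single array handled by one CPU under an even split."""
    return ARRAY_ELEMENTS * BYTES_PER_WORD / cpus


def fits_in_l2(cpus: int) -> bool:
    """True if one CPU's slice of one array is no larger than its L2 cache."""
    return slice_bytes(cpus) <= L2_BYTES


if __name__ == "__main__":
    for cpus in (1, 16, 32, 48, 64):
        mib = slice_bytes(cpus) / 2**20
        print(f"{cpus:2d} CPUs: {mib:8.2f} MiB per slice, fits in L2: {fits_in_l2(cpus)}")
```

Under these assumptions a slice fits in L2 at 48 and 64 CPUs, so data written by one kernel could still be cache-resident when the next kernel reads it unless the cache is flushed in between.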

I made three sets of runs: one on a 16-board system, one on a
12-board system, and one on an 8-board system. Results are reported
for 1, 16, 32, 48, and 64 CPUs (as appropriate for each system).
The file names are

  mod.stream.NNonB
  
where

  NN is the number of CPUs used
  B is the number of boards in the system.
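
For anyone post-processing the attachments, the naming scheme can be parsed mechanically. This is a minimal sketch, assuming NN and B are plain decimal integers (the message does not say whether NN is zero-padded); `parse_run_name` is a hypothetical helper, not part of STREAM.

```python
import re
from typing import Tuple

# Pattern for names of the form mod.stream.NNonB described above.
# Assumption: NN and B are decimal integers; leading zeros are tolerated.
_NAME_RE = re.compile(r"^mod\.stream\.(\d+)on(\d+)$")


def parse_run_name(name: str) -> Tuple[int, int]:
    """Return (cpus, boards) for a run file name, or raise ValueError."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a mod.stream.NNonB name: {name!r}")
    return int(match.group(1)), int(match.group(2))
```

For example, a name built from the scheme such as `mod.stream.64on16` would parse to `(64, 16)`.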
  
Each board has 4 GB of memory and four 400 MHz UltraSPARC II CPUs,
each with an 8 MB L2 cache.
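
The board description also explains the "(as appropriate)" qualifier: a system cannot run more threads than it has CPUs. A small sketch of that arithmetic, using only the per-board figures above (the function names are illustrative):

```python
CPUS_PER_BOARD = 4            # four UltraSPARC II CPUs per board
MEMORY_GB_PER_BOARD = 4       # 4 GB of memory per board
REPORTED_CPU_COUNTS = (1, 16, 32, 48, 64)


def system_totals(boards: int):
    """Return (total CPUs, total memory in GB) for a system with `boards` boards."""
    return boards * CPUS_PER_BOARD, boards * MEMORY_GB_PER_BOARD


def runs_for(boards: int):
    """CPU counts from the reported list that fit on a system of this size."""
    max_cpus, _ = system_totals(boards)
    return [n for n in REPORTED_CPU_COUNTS if n <= max_cpus]


if __name__ == "__main__":
    for boards in (16, 12, 8):
        cpus, mem = system_totals(boards)
        print(f"{boards:2d} boards: {cpus} CPUs, {mem} GB, runs at {runs_for(boards)}")
```

This gives five runs on 16 boards, four on 12, and three on 8, which matches the twelve output blocks in this message.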

Thanks,
Brian

Brian Whitney                        Sun Microsystems, Inc.
Strategic Applications Engineering   8300 SW Creekside Pl.
office: (503) 520-7616               Beaverton, OR 97008
fax: (503) 520-7724                  e-mail: Brian.Whitney@west.sun.com
    

shmget status: Error 0
shmat status: Error 0
shmctl to destroy the block: Error 0
Value of shared_memory_key: 31602
Got memory at 9e800000
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 50000000
 Offset = 0
 The total memory requirement is 1144 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be 1 microseconds
 The tests below will each take a time on the order
 of 3453153 microseconds
    (= 3453153 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            364.1918     2.2020     2.1966     2.2072
Scale:           215.3819     3.7150     3.7143     3.7160
Add:             287.4328     4.1755     4.1749     4.1763
Triad:           296.0569     4.0539     4.0533     4.0557
 Sum of a is = 5.7665039122923D+19
 Sum of b is = 1.1533007808505D+19
 Sum of c is = 1.5377343764968D+19
 Note: Nonstandard floating-point mode enabled
 See the Numerical Computation Guide, ieee_sun(3M)
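
The reported figures can be cross-checked from the printed parameters. STREAM counts 16 bytes per iteration for Copy and Scale (one read, one write) and 24 bytes for Add and Triad (two reads, one write), and divides by the best (minimum) time. The sketch below assumes MB means 10^6 bytes for rates and 2^20 bytes for the memory requirement, which appears consistent with the numbers above; the function names are illustrative.

```python
ARRAY_ELEMENTS = 50_000_000   # "Array size" from the output
BYTES_PER_WORD = 8            # "8 bytes per DOUBLE PRECISION word"

# 8-byte words moved per loop iteration by each kernel (reads + writes).
WORDS_PER_ITERATION = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def rate_mb_per_s(kernel: str, min_time_s: float) -> float:
    """Bandwidth in MB/s (10^6 bytes per second) from the best time."""
    bytes_moved = WORDS_PER_ITERATION[kernel] * BYTES_PER_WORD * ARRAY_ELEMENTS
    return bytes_moved / min_time_s / 1e6


def total_memory_mb() -> float:
    """Memory for the three arrays a, b, c in units of 2^20 bytes."""
    return 3 * ARRAY_ELEMENTS * BYTES_PER_WORD / 2**20


if __name__ == "__main__":
    best_times = {"Copy": 2.1966, "Scale": 3.7143, "Add": 4.1749, "Triad": 4.0533}
    for kernel, t in best_times.items():
        print(f"{kernel:6s} {rate_mb_per_s(kernel, t):10.4f} MB/s")
    print(f"total memory: {total_memory_mb():.1f} MB")
```

The recomputed rates differ slightly from the printed ones because the times are shown to only four decimal places.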

shmget status: Error 0
shmat status: Error 0
shmctl to destroy the block: Error 0
Value of shared_memory_key: 24002
Got memory at 9e800000
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 50000000
 Offset = 0
 The total memory requirement is 1144 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be 1 microseconds
 The tests below will each take a time on the order
 of 234538 microseconds
    (= 234538 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           4925.4124     0.1629     0.1624     0.1648
Scale:          3030.9344     0.2643     0.2639     0.2646
Add:            3703.3682     0.3244     0.3240     0.3247
Triad:          3661.7034     0.3279     0.3277     0.3283
 Sum of a is = 5.7665039122923D+19
 Sum of b is = 1.1533007808505D+19
 Sum of c is = 1.5377343764968D+19
 Note: Nonstandard floating-point mode enabled
 See the Numerical Computation Guide, ieee_sun(3M)

shmget status: Error 0
shmat status: Error 0
shmctl to destroy the block: Error 0
Value of shared_memory_key: 16102
Got memory at 9e800000
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 50000000
 Offset = 0
 The total memory requirement is 1144 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be 1 microseconds
 The tests below will each take a time on the order
 of 124946 microseconds
    (= 124946 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           9053.1679     0.0892     0.0884     0.0931
Scale:          5176.1229     0.1548     0.1546     0.1549
Add:            6260.2872     0.1919     0.1917     0.1922
Triad:          6301.5355     0.1905     0.1904     0.1906
 Sum of a is = 5.7665039122923D+19
 Sum of b is = 1.1533007808505D+19
 Sum of c is = 1.5377343764968D+19
 Note: Nonstandard floating-point mode enabled
 See the Numerical Computation Guide, ieee_sun(3M)

shmget status: Error 0
shmat status: Error 0
shmctl to destroy the block: Error 0
Value of shared_memory_key: 8002
Got memory at 9e800000
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 50000000
 Offset = 0
 The total memory requirement is 1144 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be 1 microseconds
 The tests below will each take a time on the order
 of 70413 microseconds
    (= 70413 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:          12056.9571     0.0672     0.0664     0.0731
Scale:          6356.9620     0.1262     0.1258     0.1266
Add:            7658.5054     0.1570     0.1567     0.1576
Triad:          7572.8558     0.1587     0.1585     0.1590
 Sum of a is = 5.7665039122923D+19
 Sum of b is = 1.1533007808505D+19
 Sum of c is = 1.5377343764968D+19
 Note: Nonstandard floating-point mode enabled
 See the Numerical Computation Guide, ieee_sun(3M)

shmget status: Error 0
shmat status: Error 0
shmctl to destroy the block: Error 0
Value of shared_memory_key: 102
Got memory at 9e800000
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 50000000
 Offset = 0
 The total memory requirement is 1144 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be 1 microseconds
 The tests below will each take a time on the order
 of 231511 microseconds
    (= 231511 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:          12430.3516     0.0721     0.0644     0.1157
Scale:          7222.7912     0.1223     0.1108     0.1557
Add:            8202.1488     0.1474     0.1463     0.1481
Triad:          7607.8306     0.1660     0.1577     0.2003
 Sum of a is = 5.7665039122923D+19
 Sum of b is = 1.1533007808505D+19
 Sum of c is = 1.5377343764968D+19
 Note: Nonstandard floating-point mode enabled
 See the Numerical Computation Guide, ieee_sun(3M)

shmget status: Error 0
shmat status: Error 0
shmctl to destroy the block: Error 0
Value of shared_memory_key: 23501
Got memory at 9e800000
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 50000000
 Offset = 0
 The total memory requirement is 1144 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be 1 microseconds
 The tests below will each take a time on the order
 of 3448896 microseconds
    (= 3448896 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            362.3409     2.2154     2.2079     2.2196
Scale:           215.8045     3.7080     3.7071     3.7092
Add:             288.2899     4.1637     4.1625     4.1651
Triad:           296.1738     4.0525     4.0517     4.0532
 Sum of a is = 5.7665039122923D+19
 Sum of b is = 1.1533007808505D+19
 Sum of c is = 1.5377343764968D+19
 Note: Nonstandard floating-point mode enabled
 See the Numerical Computation Guide, ieee_sun(3M)

shmget status: Error 0
shmat status: Error 0
shmctl to destroy the block: Error 0
Value of shared_memory_key: 16001
Got memory at 9e800000
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 50000000
 Offset = 0
 The total memory requirement is 1144 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be 1 microseconds
 The tests below will each take a time on the order
 of 235383 microseconds
    (= 235383 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           5181.8820     0.1547     0.1544     0.1563
Scale:          2976.4744     0.2690     0.2688     0.2693
Add:            3655.5044     0.3286     0.3283     0.3292
Triad:          3696.3705     0.3249     0.3246     0.3251
 Sum of a is = 5.7665039122923D+19
 Sum of b is = 1.1533007808505D+19
 Sum of c is = 1.5377343764968D+19
 Note: Nonstandard floating-point mode enabled
 See the Numerical Computation Guide, ieee_sun(3M)

shmget status: Error 0
shmat status: Error 0
shmctl to destroy the block: Error 0
Value of shared_memory_key: 8001
Got memory at 9e800000
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 50000000
 Offset = 0
 The total memory requirement is 1144 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be 1 microseconds
 The tests below will each take a time on the order
 of 128476 microseconds
    (= 128476 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           8474.0441     0.0947     0.0944     0.0966
Scale:          4746.2450     0.1781     0.1686     0.2459
Add:            5726.7007     0.2097     0.2095     0.2100
Triad:          5770.7797     0.2080     0.2079     0.2082
 Sum of a is = 5.7665039122923D+19
 Sum of b is = 1.1533007808505D+19
 Sum of c is = 1.5377343764968D+19
 Note: Nonstandard floating-point mode enabled
 See the Numerical Computation Guide, ieee_sun(3M)

shmget status: Error 0
shmat status: Error 0
shmctl to destroy the block: Error 0
Value of shared_memory_key: 1
Got memory at 9e800000
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 50000000
 Offset = 0
 The total memory requirement is 1144 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be 1 microseconds
 The tests below will each take a time on the order
 of 89427 microseconds
    (= 89427 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:          10536.3358     0.0770     0.0759     0.0831
Scale:          4950.2602     0.1647     0.1616     0.1847
Add:            6336.5245     0.1895     0.1894     0.1897
Triad:          5940.5107     0.2023     0.2020     0.2026
 Sum of a is = 5.7665039122923D+19
 Sum of b is = 1.1533007808505D+19
 Sum of c is = 1.5377343764968D+19
 Note: Nonstandard floating-point mode enabled
 See the Numerical Computation Guide, ieee_sun(3M)

shmget status: Error 0
shmat status: Error 0
shmctl to destroy the block: Error 0
Value of shared_memory_key: 15501
Got memory at 9e800000
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 50000000
 Offset = 0
 The total memory requirement is 1144 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be 1 microseconds
 The tests below will each take a time on the order
 of 3445439 microseconds
    (= 3445439 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            362.2980     2.2103     2.2081     2.2120
Scale:           215.9640     3.7059     3.7043     3.7067
Add:             288.7717     4.1566     4.1555     4.1575
Triad:           296.4783     4.0490     4.0475     4.0507
 Sum of a is = 5.7665039122923D+19
 Sum of b is = 1.1533007808505D+19
 Sum of c is = 1.5377343764968D+19
 Note: Nonstandard floating-point mode enabled
 See the Numerical Computation Guide, ieee_sun(3M)

shmget status: Error 0
shmat status: Error 0
shmctl to destroy the block: Error 0
Value of shared_memory_key: 8001
Got memory at 9e800000
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 50000000
 Offset = 0
 The total memory requirement is 1144 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be 1 microseconds
 The tests below will each take a time on the order
 of 245438 microseconds
    (= 245438 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           4892.7287     0.1639     0.1635     0.1652
Scale:          2885.0024     0.2775     0.2773     0.2777
Add:            3465.8932     0.3464     0.3462     0.3466
Triad:          3445.9909     0.3484     0.3482     0.3486
 Sum of a is = 5.7665039122923D+19
 Sum of b is = 1.1533007808505D+19
 Sum of c is = 1.5377343764968D+19
 Note: Nonstandard floating-point mode enabled
 See the Numerical Computation Guide, ieee_sun(3M)

shmget status: Error 0
shmat status: Error 0
shmctl to destroy the block: Error 0
Value of shared_memory_key: 1
Got memory at 9e800000
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 50000000
 Offset = 0
 The total memory requirement is 1144 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be 1 microseconds
 The tests below will each take a time on the order
 of 166871 microseconds
    (= 166871 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           6932.9629     0.1190     0.1154     0.1423
Scale:          3556.5623     0.2327     0.2249     0.2468
Add:            4312.7981     0.2915     0.2782     0.3188
Triad:          4031.9559     0.3077     0.2976     0.3217
 Sum of a is = 5.7665039122923D+19
 Sum of b is = 1.1533007808505D+19
 Sum of c is = 1.5377343764968D+19
 Note: Nonstandard floating-point mode enabled
 See the Numerical Computation Guide, ieee_sun(3M)
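
The output blocks are not labelled with their file names. As a sketch only, the code below assumes the first five blocks are the 16-board runs at 1, 16, 32, 48, and 64 CPUs, in that order (an inference from the run description, not something the message states), and computes Triad speedup and parallel efficiency relative to the single-CPU run. The names are illustrative.

```python
# Triad rates (MB/s) copied from the first five output blocks.
# Assumption: these blocks are the 16-board runs at 1, 16, 32, 48, 64 CPUs.
TRIAD_MB_S = {
    1: 296.0569,
    16: 3661.7034,
    32: 6301.5355,
    48: 7572.8558,
    64: 7607.8306,
}


def speedup(cpus: int) -> float:
    """Triad bandwidth at `cpus` CPUs relative to the single-CPU run."""
    return TRIAD_MB_S[cpus] / TRIAD_MB_S[1]


def efficiency(cpus: int) -> float:
    """Speedup divided by CPU count (1.0 would be perfect scaling)."""
    return speedup(cpus) / cpus


if __name__ == "__main__":
    for cpus in sorted(TRIAD_MB_S):
        print(f"{cpus:2d} CPUs: speedup {speedup(cpus):6.2f}, efficiency {efficiency(cpus):.2f}")
```

If the assumed ordering is right, Triad bandwidth grows by a factor of roughly 25.7 from 1 to 64 CPUs, with most of the gain already reached at 48 CPUs.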


