[stream] STREAM Results on the NEC SX-7

From: <g-kadaira@bq.jp.nec.com>
Date: Thu Oct 02 2003 - 06:38:11 CDT

Dear Dr. McCalpin,

Please find attached the STREAM runs we obtained on the NEC SX-7.
The machine used is the SX-7/32, which consists of 32 processors.

The performance figures are reported for the 1-, 2-, 4-, 8-, 16-, and
32-processor configurations of this system.
In the attached results, the number after the "/" is the number
of processors actually used for each run, e.g., SX-7/4 for
the 4-processor configuration.

We would appreciate it if you could update the STREAM Web site
with these data.

Thank you.

Best regards,

-------
1. Summary

 Array size = 81920000
        |         Copy        Scale          Add        Triad  [MB/s]
 -------+----------------------------------------------------
   1CPU |   35068.4498   34782.8940   35327.4908   35327.4908
   2CPU |   70084.3698   69856.8333   70572.5955   70572.5955
   4CPU |  138979.8930  138876.3234  140393.0574  140384.6923
   8CPU |  275304.6291  274726.8072  278324.4919  278244.6674
  16CPU |  541178.1404  539823.0694  548731.5151  548585.4982
  32CPU |  876174.6974  865144.0930  869179.1524  872259.0658
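
As a reading aid, the scaling implied by the summary table can be checked with a few lines of Python. This is only a sketch and not part of the original submission: the Triad rates are copied from the table, and the names `triad_mb_s`, `speedup`, and `efficiency` are illustrative.

```python
# Triad rates in MB/s, copied from the summary table (CPU count -> rate).
triad_mb_s = {
    1: 35327.4908,
    2: 70572.5955,
    4: 140384.6923,
    8: 278244.6674,
    16: 548585.4982,
    32: 872259.0658,
}

base = triad_mb_s[1]


def speedup(cpus):
    """Ratio of the N-CPU Triad rate to the 1-CPU Triad rate."""
    return triad_mb_s[cpus] / base


def efficiency(cpus):
    """Speedup divided by the CPU count (1.0 means perfect scaling)."""
    return speedup(cpus) / cpus


if __name__ == "__main__":
    for cpus in sorted(triad_mb_s):
        print(f"{cpus:2d} CPU  speedup {speedup(cpus):6.2f}  "
              f"efficiency {efficiency(cpus):6.1%}")
```

With these inputs the Triad rate scales almost linearly up to 16 CPUs (efficiency about 97%) and reaches roughly 24.7 times the single-CPU rate on 32 CPUs (about 77%).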

2. Details
[SX-7/1]

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 81920000
 Offset = 0
 The total memory requirement is 1875 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 30 microseconds
 The tests below will each take a time on the order
 of 37697 microseconds
    (= 1257 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:          35068.4498     0.0374     0.0374     0.0374
Scale:         34782.8940     0.0377     0.0377     0.0377
Add:           35327.4908     0.0557     0.0557     0.0557
Triad:         35327.4908     0.0557     0.0557     0.0557
 Sum of a is = 9.447839999984417D+19
 Sum of b is = 1.889567999989171D+19
 Sum of c is = 2.519424000017551D+19
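
The memory figure and the clock-tick estimate in the output above follow from simple arithmetic, sketched here in Python. The assumptions are mine and not stated in the original message: the footprint is taken to be three arrays of 8-byte words, "MB" in that line is taken to mean 2**20 bytes (which reproduces the reported 1875), and the tick count is taken to be the estimated test time divided by the timer granularity, rounded. The function names are illustrative.

```python
ARRAY_SIZE = 81_920_000   # "Array size" from the run output
BYTES_PER_WORD = 8        # "Assuming 8 bytes per DOUBLE PRECISION word"
NUM_ARRAYS = 3            # STREAM operates on three arrays: a, b, c


def total_memory_mib(n=ARRAY_SIZE):
    """Footprint of the three arrays in units of 2**20 bytes (assumed unit)."""
    return NUM_ARRAYS * BYTES_PER_WORD * n / 2**20


def clock_ticks(test_us, granularity_us):
    """Estimated test duration in timer ticks, rounded to the nearest tick."""
    return round(test_us / granularity_us)


print(total_memory_mib())       # 1875.0
print(clock_ticks(37697, 30))   # 1257
```

The same tick calculation reproduces the figures printed for the other runs, for example 18781 / 31 gives 606 for the SX-7/2 run.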

[SX-7/2]

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 81920000
 Offset = 0
 The total memory requirement is 1875 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 31 microseconds
 The tests below will each take a time on the order
 of 18781 microseconds
    (= 606 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:          70084.3698     0.0187     0.0187     0.0187
Scale:         69856.8333     0.0188     0.0188     0.0188
Add:           70572.5955     0.0279     0.0279     0.0279
Triad:         70572.5955     0.0279     0.0279     0.0279
 Sum of a is = 9.447839999985216D+19
 Sum of b is = 1.889567999991613D+19
 Sum of c is = 2.519424000000000D+19

[SX-7/4]

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 81920000
 Offset = 0
 The total memory requirement is 1875 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 31 microseconds
 The tests below will each take a time on the order
 of 9475 microseconds
    (= 306 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:         138979.8930     0.0094     0.0094     0.0095
Scale:        138876.3234     0.0095     0.0094     0.0095
Add:          140393.0574     0.0140     0.0140     0.0140
Triad:        140384.6923     0.0140     0.0140     0.0140
 Sum of a is = 9.447839999986815D+19
 Sum of b is = 1.889567999999610D+19
 Sum of c is = 2.519424000000000D+19

[SX-7/8]

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 81920000
 Offset = 0
 The total memory requirement is 1875 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 31 microseconds
 The tests below will each take a time on the order
 of 4777 microseconds
    (= 154 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:         275304.6291     0.0048     0.0048     0.0048
Scale:        274726.8072     0.0048     0.0048     0.0048
Add:          278324.4919     0.0071     0.0071     0.0071
Triad:        278244.6674     0.0071     0.0071     0.0071
 Sum of a is = 9.447839999990014D+19
 Sum of b is = 1.889568000000000D+19
 Sum of c is = 2.519424000000000D+19

[SX-7/16]

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 81920000
 Offset = 0
 The total memory requirement is 1875 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 32 microseconds
 The tests below will each take a time on the order
 of 2432 microseconds
    (= 76 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:         541178.1404     0.0024     0.0024     0.0024
Scale:        539823.0694     0.0024     0.0024     0.0024
Add:          548731.5151     0.0036     0.0036     0.0036
Triad:        548585.4982     0.0036     0.0036     0.0036
 Sum of a is = 9.447839999996410D+19
 Sum of b is = 1.889568000000000D+19
 Sum of c is = 2.519424000000000D+19

[SX-7/32]

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 81920000
 Offset = 0
 The total memory requirement is 1875 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 35 microseconds
 The tests below will each take a time on the order
 of 7392 microseconds
    (= 211 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:         876174.6974     0.0016     0.0015     0.0019
Scale:        865144.0930     0.0015     0.0015     0.0016
Add:          869179.1524     0.0023     0.0023     0.0024
Triad:        872259.0658     0.0023     0.0023     0.0024
 Sum of a is = 9.447839999999993D+19
 Sum of b is = 1.889568000000000D+19
 Sum of c is = 2.519424000000000D+19
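
The reported rates and minimum times can be cross-checked against each other. The sketch below assumes STREAM's usual byte counting (Copy and Scale touch two arrays per element, Add and Triad touch three) and that the rate column uses 1 MB = 10**6 bytes; both are assumptions on my part, though they are consistent with the numbers in this run. The names `ARRAYS_TOUCHED`, `SX7_32`, and `implied_time` are illustrative.

```python
ARRAY_SIZE = 81_920_000   # "Array size" from the run output
BYTES_PER_WORD = 8        # 8-byte DOUBLE PRECISION words

# Arrays read or written per element by each kernel (assumed STREAM counting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# (rate in MB/s, min time in seconds) copied from the SX-7/32 output.
SX7_32 = {
    "Copy": (876174.6974, 0.0015),
    "Scale": (865144.0930, 0.0015),
    "Add": (869179.1524, 0.0023),
    "Triad": (872259.0658, 0.0023),
}


def implied_time(kernel, rate_mb_s):
    """Time in seconds needed to move the kernel's bytes at the given rate."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * ARRAY_SIZE
    return bytes_moved / (rate_mb_s * 1e6)


for kernel, (rate, min_time) in SX7_32.items():
    t = implied_time(kernel, rate)
    print(f"{kernel:6s} implied {t:.4f} s, reported {min_time:.4f} s")
```

Under these assumptions each implied time agrees with the reported minimum time to the four decimal places printed.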

***********************************************************************
* Gizo Kadaira, Tel 81-3-3798-9131 (direct) *
* Sales Promotion Dept., ext:57265 *
* HPC Marketing Promotion Div., Fax 81-3-3798-9132 *
* 1st Computer Operations Unit e-mail:g-kadaira@bq.jp.nec.com *
***********************************************************************