Re: STREAMS

From: PTannenbaum@sx.nec.com
Date: Wed Jun 11 1997 - 04:46:08 CDT


John,

Here is the NEC SX-4 STREAMS output with runs from 1 to 32 CPUs.

Please let me know when and where you "formally informally" post it so
I can relay it back to NEC.

A summary table is at the top followed by the output sections.
Let me know if you see any anomalies. NEC built the summary tables
so that they include the last known T90 results.

Thanks,
--------------------------------------------------------
Philip D.Tannenbaum
Sr.Director, Marketing Group
HNSX Supercomputers (an NEC Company)
4200 Research Forest Drive, Suite 400
The Woodlands, TX 77381-4257

ph: 281-465-1610 fx: 281-465-1699
Houston Metro Wide Area ph: 281-296-0912 ext 1610
** Always Dial the Area Code in the Houston Metroplex **
--------------------------------------------------------
Representing the NEC SX-4, It's a Jolly Fast Machine!
--------------------------------------------------------

From NEC Supercomputer Marketing Division System Engineering Dpt.
10 June 1997.

  H/W: SX-4/32 8 GB main memory, 1024 banks
            (original SSRAM model; 8.0 ns clock)
  OS : SUPER-UX unix 7.1
  Compiler: f77sx Rev170

+-------------------+---------------------------------------------------+
|COPY (MB/s)        |       1      2      4      8     16     24     32 |
+-------------------+---------------------------------------------------+
|SX-4/32            |   15983  31886  63536 126083 247440 349445 434783 |
|                   |                                                   |
|Cray_T932_321024_3E|   10653  20326  40457  80929 160263 237415 310721 |
|                   |                                                   |
|Cray_T94           |   11341                                           |
+-------------------+---------------------------------------------------+

+-------------------+---------------------------------------------------+
|SCALE (MB/s)       |       1      2      4      8     16     24     32 |
+-------------------+---------------------------------------------------+
|SX-4/32            |   15983  31885  63536 126083 247343 347354 432855 |
|                   |                                                   |
|Cray_T932_321024_3E|   10221  19622  39017  78131 154880 230100 302182 |
|                   |                                                   |
|Cray_T94           |   10717                                           |
+-------------------+---------------------------------------------------+

+-------------------+---------------------------------------------------+
|ADD (MB/s)         |       1      2      4      8     16     24     32 |
+-------------------+---------------------------------------------------+
|SX-4/32            |   15989  31924  63694 126725 250262 342246 437357 |
|                   |                                                   |
|Cray_T932_321024_3E|   13014  25459  50203  98701 193335 281038 281038 |
|                   |                                                   |
|Cray_T94           |   14783                                           |
+-------------------+---------------------------------------------------+

+-------------------+---------------------------------------------------+
|TRIAD (MB/s)       |       1      2      4      8     16     24     32 |
+-------------------+---------------------------------------------------+
|SX-4/32            |   15989  31924  63692 126724 250231 345825 436954 |
|                   |                                                   |
|Cray_T932_321024_3E|   13682  26117  50718  99343 194562 282300 359270 |
|                   |                                                   |
|Cray_T94           |   13920                                           |
+-------------------+---------------------------------------------------+
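
As a rough reading aid for the summary tables, the Python sketch below
derives speedup and parallel efficiency from the SX-4/32 COPY row. The
names CPUS, SX4_COPY and scaling are illustrative only; they are not part
of the STREAM distribution or of NEC's report. The figures are copied from
the COPY table.

```python
# CPU counts and SX-4/32 COPY bandwidth (MB/s), copied from the summary table.
CPUS = [1, 2, 4, 8, 16, 24, 32]
SX4_COPY = [15983, 31886, 63536, 126083, 247440, 349445, 434783]


def scaling(cpus, rates):
    """Return (cpus, speedup, efficiency) tuples relative to the first entry.

    speedup    = rate on n CPUs / rate on 1 CPU
    efficiency = speedup / n
    """
    base = rates[0]
    return [(n, r / base, r / (base * n)) for n, r in zip(cpus, rates)]


if __name__ == "__main__":
    for n, speedup, eff in scaling(CPUS, SX4_COPY):
        print(f"{n:2d} CPUs: speedup {speedup:6.2f}, efficiency {eff:6.1%}")
```

On these figures the 32-CPU COPY run comes out at roughly 27 times the
single-CPU rate, or about 85% parallel efficiency. The same function can
be fed any other row of the tables.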

[output files]

(1)SX-4/1

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 80000000
 Offset = 0
 The total memory requirement is 1831 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 79 microseconds
 The tests below will each take a time on the order
 of 80123 microseconds
    (= 1014 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function       Rate (MB/s)   RMS time   Min time   Max time
Assignment:   15983.438486   0.075988   0.080083   0.080215
Scaling   :   15983.628829   0.075976   0.080082   0.080109
Summing   :   15989.071369   0.113948   0.120082   0.120213
SAXPYing  :   15989.071369   0.113920   0.120082   0.120083
 Sum of a is : 9.226406249984802D+19
 Sum of b is : 1.845281249988019D+19
 Sum of c is : 2.460375000016015D+19
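
The reported rates and memory requirement can be cross-checked from the
array size and the minimum times. The sketch below is an assumption about
how the numbers relate, not the benchmark's own source: it takes 8-byte
words, two arrays moved for Assignment and Scaling, three for Summing and
SAXPYing, 1 MB = 10**6 bytes for the rates, and 2**20 bytes for the memory
figure. Those choices reproduce the printed values to within the rounding
of the times. All names (N, WORD, ARRAYS_MOVED, rate_mb_s, memory_mb) are
hypothetical.

```python
N = 80_000_000   # "Array size" from the output
WORD = 8         # bytes per DOUBLE PRECISION word, as the output states

# Arrays read plus arrays written per loop iteration (assumed per kernel).
ARRAYS_MOVED = {"Assignment": 2, "Scaling": 2, "Summing": 3, "SAXPYing": 3}


def rate_mb_s(kernel, min_time_s):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) from the best (minimum) time."""
    return ARRAYS_MOVED[kernel] * WORD * N / min_time_s / 1e6


def memory_mb():
    """Footprint of the three arrays, in units of 2**20 bytes."""
    return 3 * WORD * N / 2**20


if __name__ == "__main__":
    # Minimum times taken from the SX-4/1 section.
    print(f"Assignment: {rate_mb_s('Assignment', 0.080083):.1f} MB/s")
    print(f"Summing   : {rate_mb_s('Summing', 0.120082):.1f} MB/s")
    print(f"Memory    : {int(memory_mb())} MB")
```

The small differences from the printed rates come from the times being
shown to only six decimal places.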
 

(2)SX-4/2

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 80000000
 Offset = 0
 The total memory requirement is 1831 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 98 microseconds
 The tests below will each take a time on the order
 of 40168 microseconds
    (= 410 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function       Rate (MB/s)   RMS time   Min time   Max time
Assignment:   31886.849422   0.038112   0.040142   0.040284
Scaling   :   31885.997197   0.038085   0.040143   0.040151
Summing   :   31924.993627   0.057056   0.060141   0.060147
SAXPYing  :   31924.993627   0.057070   0.060141   0.060277
 Sum of a is : 9.226406249985602D+19
 Sum of b is : 1.845281249991997D+19
 Sum of c is : 2.460375000000000D+19

(3)SX-4/4

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 80000000
 Offset = 0
 The total memory requirement is 1831 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 98 microseconds
 The tests below will each take a time on the order
 of 20172 microseconds
    (= 206 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function       Rate (MB/s)   RMS time   Min time   Max time
Assignment:   63536.522995   0.019113   0.020146   0.020154
Scaling   :   63536.147032   0.019113   0.020146   0.020150
Summing   :   63694.317781   0.028598   0.030144   0.030150
SAXPYing  :   63692.302728   0.028614   0.030145   0.030292
 Sum of a is : 9.226406249987201D+19
 Sum of b is : 1.845281249999994D+19
 Sum of c is : 2.460375000000000D+19

(4)SX-4/8

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 80000000
 Offset = 0
 The total memory requirement is 1831 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 101 microseconds
 The tests below will each take a time on the order
 of 10388 microseconds
    (= 103 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function       Rate (MB/s)   RMS time   Min time   Max time
Assignment:  126083.750073   0.009789   0.010152   0.010531
Scaling   :  126083.750073   0.009739   0.010152   0.010536
Summing   :  126725.106102   0.014535   0.015151   0.015493
SAXPYing  :  126724.109020   0.014509   0.015151   0.015355
 Sum of a is : 9.226406249990398D+19
 Sum of b is : 1.845281250000001D+19
 Sum of c is : 2.460375000000000D+19

(5)SX-4/16

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 80000000
 Offset = 0
 The total memory requirement is 1831 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 105 microseconds
 The tests below will each take a time on the order
 of 5202 microseconds
    (= 50 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function       Rate (MB/s)   RMS time   Min time   Max time
Assignment:  247440.158547   0.004909   0.005173   0.005177
Scaling   :  247343.259543   0.004910   0.005175   0.005177
Summing   :  250262.245910   0.007283   0.007672   0.007689
SAXPYing  :  250231.140527   0.007281   0.007673   0.007676
 Sum of a is : 9.226406249996795D+19
 Sum of b is : 1.845281250000001D+19
 Sum of c is : 2.460374999999999D+19

(6)SX-4/24

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 80000000
 Offset = 0
 The total memory requirement is 1831 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 107 microseconds
 The tests below will each take a time on the order
 of 3724 microseconds
    (= 35 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function       Rate (MB/s)   RMS time   Min time   Max time
Assignment:  349445.707033   0.003671   0.003663   0.004130
Scaling   :  347354.368530   0.003509   0.003685   0.003719
Summing   :  342246.650234   0.005347   0.005610   0.005677
SAXPYing  :  345825.421596   0.005487   0.005552   0.006201
 Sum of a is : 9.226406250000000D+19
 Sum of b is : 1.845281250000000D+19
 Sum of c is : 2.460375000000001D+19

(7)SX-4/32

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 80000000
 Offset = 0
 The total memory requirement is 1831 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 110 microseconds
 The tests below will each take a time on the order
 of 2986 microseconds
    (= 27 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function       Rate (MB/s)   RMS time   Min time   Max time
Assignment:  434783.699385   0.002874   0.002944   0.003609
Scaling   :  432855.689752   0.002890   0.002957   0.003656
Summing   :  437357.501765   0.004445   0.004390   0.005688
SAXPYing  :  436954.079219   0.004284   0.004394   0.004950
 Sum of a is : 9.226406249999999D+19
 Sum of b is : 1.845281249999998D+19
 Sum of c is : 2.460374999999998D+19
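
Each output section reports a clock granularity, an estimated test time,
and the resulting number of clock ticks, with a guideline of at least 20
ticks per test. The sketch below recomputes the tick counts for all seven
runs. Rounding to the nearest integer is an assumption; it happens to
reproduce the printed counts. RUNS, MIN_TICKS and clock_ticks are
hypothetical names, and the values are copied from the sections above.

```python
# (CPUs, clock granularity in microseconds, estimated test time in microseconds)
RUNS = [
    (1, 79, 80123),
    (2, 98, 40168),
    (4, 98, 20172),
    (8, 101, 10388),
    (16, 105, 5202),
    (24, 107, 3724),
    (32, 110, 2986),
]

MIN_TICKS = 20  # guideline quoted in the benchmark output


def clock_ticks(granularity_us, test_time_us):
    """Estimated test time expressed in timer ticks (nearest integer, assumed)."""
    return round(test_time_us / granularity_us)


if __name__ == "__main__":
    for cpus, gran, est in RUNS:
        ticks = clock_ticks(gran, est)
        status = "ok" if ticks >= MIN_TICKS else "too few"
        print(f"SX-4/{cpus:<2d}: {ticks:5d} ticks ({status})")
```

All seven runs clear the 20-tick guideline, but the 32-CPU run does so
with only about 27 ticks, so a larger array might give the timer more
margin at that CPU count.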


