Superdome/Dual Core Itanium Stream results

From: Kirby L. Collins <kcollins@rsn.hp.com>
Date: Thu Jul 13 2006 - 15:18:27 CST

Stream results for HP Integrity Superdome with the sx2000 chipset, and
1.6GHz/24MB Dual Core Intel(R) Itanium(R) 2 processors:

16 cells, 64 processors/128 cores, 512GB of memory (512x1GB DIMMs):
----------------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:        154504.1630     0.0814     0.0795     0.0896
Scale:       152999.3089     0.0813     0.0803     0.0842
Add:         169468.0928     0.1093     0.1088     0.1117
Triad:       170832.6902     0.1112     0.1079     0.1276

8 cells, 32 processors/64 cores, 256GB of memory (256x1GB DIMMs):
----------------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:         77274.3200     0.2271     0.1590     0.4369
Scale:        76845.2576     0.1608     0.1599     0.1631
Add:          85409.2677     0.3567     0.2158     0.8011
Triad:        85792.1817     0.3145     0.2148     0.5301

4 cells, 16 processors/32 cores, 128GB of memory (128x1GB DIMMs):
----------------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:         38614.6964     0.3187     0.3182     0.3195
Scale:        38494.8036     0.3199     0.3192     0.3208
Add:          42838.3551     0.4306     0.4303     0.4310
Triad:        42921.9548     0.4298     0.4294     0.4307
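
As a rough check on how the Triad bandwidth scales with cell count, the
rates in the tables above can be compared against ideal linear scaling.
The sketch below is not part of the original runs. It is written in
Python for convenience, the name scaling_report is made up for
illustration, and the figure of 8 cores per cell is derived from the
configuration lines (128 cores / 16 cells).

```python
# Triad rates (MB/s) copied from the summary tables, keyed by cell count.
triad_mb_s = {4: 42921.9548, 8: 85792.1817, 16: 170832.6902}

# Assumption derived from the configuration lines: 128 cores / 16 cells.
CORES_PER_CELL = 8


def scaling_report(rates, base_cells=4):
    """Return (cells, speedup, efficiency, MB/s per core) rows.

    Speedup and efficiency are measured relative to the base_cells run.
    """
    base = rates[base_cells]
    rows = []
    for cells in sorted(rates):
        speedup = rates[cells] / base
        ideal = cells / base_cells
        per_core = rates[cells] / (cells * CORES_PER_CELL)
        rows.append((cells, speedup, speedup / ideal, per_core))
    return rows


for cells, speedup, eff, per_core in scaling_report(triad_mb_s):
    print(f"{cells:2d} cells: speedup {speedup:.3f}, "
          f"efficiency {eff:.3f}, {per_core:.1f} MB/s per core")
```

With these numbers the efficiency stays above 99% from 4 to 16 cells,
and the per-core Triad rate remains in the 1330 to 1345 MB/s range.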

Note that with these processors the front side busses run at 533MT/sec.

The system was configured with half of the memory in each cell assigned to
local memory.

The runs used the v5.6 f90 version of the stream benchmark, with the
following changes:

97c97
<       PARAMETER (n=2000000,offset=0,ndim=n+offset,ntimes=10)
---
>       PARAMETER (n=768000680,offset=0,ndim=n+offset,ntimes=10)
101c101
<       INTEGER j,k,nbpw,quantum
---
>       INTEGER*8 j,k,nbpw,quantum
106c106
<       INTEGER bytes(4)
---
>       INTEGER*8 bytes(4)
124c124
< *     COMMON a,b,c
---
>       COMMON a,b,c
245c245
<  9020 FORMAT (1x,a,i4,a)
---
>  9020 FORMAT (1x,a,i6,a)
247c247
<  9040 FORMAT ('Function',5x,'Rate (MB/s)  Avg time   Min time  Max time'
---
>  9040 FORMAT ('Function',5x,'Rate (MB/s)    Avg time   Min time  Max time'
249c249
<  9050 FORMAT (a,4 (f10.4,2x))
---
>  9050 FORMAT (a,f12.4,2x,3 (f10.4,2x))
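
The arithmetic behind these changes can be sketched as follows. This is
an illustration in Python, not part of the benchmark source. The reading
that the INTEGER*8 and FORMAT changes exist to hold the larger values is
an inference from the numbers rather than something stated above, and
the variable names are made up for the example.

```python
N = 768_000_680        # array length from the modified PARAMETER line
BYTES_PER_WORD = 8     # "Assuming 8 bytes per DOUBLE PRECISION word"
INT32_MAX = 2**31 - 1  # largest value of a 4-byte signed INTEGER

# Footprint of the three arrays a, b and c, in MB of 2**20 bytes.
total_mb = 3 * BYTES_PER_WORD * N // 2**20

# Bytes moved per iteration: Copy and Scale touch two arrays,
# Add and Triad touch three.
arrays_touched = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
bytes_per_kernel = {k: v * BYTES_PER_WORD * N
                    for k, v in arrays_touched.items()}

print("total memory:", total_mb, "MB")
print("n fits in 32 bits:", N <= INT32_MAX)
for kernel, nbytes in bytes_per_kernel.items():
    print(f"{kernel:6s} {nbytes:>12d} bytes, "
          f"exceeds 32-bit range: {nbytes > INT32_MAX}")

# Field widths: the memory figure needs more than 4 digits, and the
# largest reported rate needs more than 10 characters with 4 decimals.
print("digits in total_mb:", len(str(total_mb)))
print("chars in top rate :", len(f"{170832.6902:.4f}"))
```

The array length itself still fits in a 4-byte integer, but every
per-kernel byte count is well beyond that range, and the 17578 MB total
is one digit too wide for an i4 field.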

The modified source was compiled as follows:

f90 -o stream_d.omp +Ofaster +DSitanium2 +DD64 +extend_source +autodbl4 \
    +noppu -Wl,+pd,1M +Oopenmp stream.f mysecond.o

and run with the "mpsched -T FILL" command to distribute threads across
locality domains.  By default each thread allocated memory from the local
memory in each cell.  Here are the outputs for each configuration:

16 cells, 64 processors/128 cores, 512GB of memory (512x1GB DIMMs):
-------------------------------------------------------------
mpsched -T FILL stream_d.omp
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size =  768000680
 Offset     =          0
 The total memory requirement is  17578 MB
 You are running each test  10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 Number of Threads =  128
 ----------------------------------------------
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 ----------------------------------------------------
 Your clock granularity/precision appears to be      1 microseconds
 ----------------------------------------------------
Function     Rate (MB/s)    Avg time   Min time  Max time
Copy:       154504.1630      0.0814      0.0795      0.0896  
Scale:      152999.3089      0.0813      0.0803      0.0842  
Add:        169468.0928      0.1093      0.1088      0.1117  
Triad:      170832.6902      0.1112      0.1079      0.1276  
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------

8 cells, 32 processors/64 cores, 256GB of memory (256x1GB DIMMs):
-------------------------------------------------------------
mpsched -T FILL stream_d.omp
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size =  768000680
 Offset     =          0
 The total memory requirement is  17578 MB
 You are running each test  10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 Number of Threads =  64
 ----------------------------------------------
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 ----------------------------------------------------
 Your clock granularity/precision appears to be      1 microseconds
 ----------------------------------------------------
Function     Rate (MB/s)    Avg time   Min time  Max time
Copy:        77274.3200      0.2271      0.1590      0.4369  
Scale:       76845.2576      0.1608      0.1599      0.1631  
Add:         85409.2677      0.3567      0.2158      0.8011  
Triad:       85792.1817      0.3145      0.2148      0.5301  
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------

4 cells, 16 processors/32 cores, 128GB of memory (128x1GB DIMMs):
-------------------------------------------------------------
mpsched -T FILL stream_d.omp
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size =  768000680
 Offset     =          0
 The total memory requirement is  17578 MB
 You are running each test  10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 Number of Threads =  32
 ----------------------------------------------
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 ----------------------------------------------------
 Your clock granularity/precision appears to be      1 microseconds
 ----------------------------------------------------
Function     Rate (MB/s)    Avg time   Min time  Max time
Copy:        38614.6964      0.3187      0.3182      0.3195  
Scale:       38494.8036      0.3199      0.3192      0.3208  
Add:         42838.3551      0.4306      0.4303      0.4310  
Triad:       42921.9548      0.4298      0.4294      0.4307  
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
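
The reported rates are consistent with the best (minimum) time for each
kernel, counting 1 MB as 10**6 bytes.  The sketch below checks this for
the 16-cell run by recovering the minimum time implied by each rate.  It
is an illustration in Python, not part of the benchmark, and the names
are made up for the example.

```python
N = 768_000_680      # "Array size" from the output above
BYTES_PER_WORD = 8   # "Assuming 8 bytes per DOUBLE PRECISION word"

# Arrays read or written per iteration of each kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# (rate in MB/s, min time in s) as reported for the 16-cell run.
reported_16_cells = {
    "Copy": (154504.1630, 0.0795),
    "Scale": (152999.3089, 0.0803),
    "Add": (169468.0928, 0.1088),
    "Triad": (170832.6902, 0.1079),
}


def implied_min_time(kernel, rate_mb_s):
    """Seconds needed to move the kernel's bytes at the given rate."""
    nbytes = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * N
    return nbytes / (rate_mb_s * 1e6)


for kernel, (rate, min_time) in reported_16_cells.items():
    implied = implied_min_time(kernel, rate)
    print(f"{kernel:6s} reported min {min_time:.4f} s, "
          f"implied {implied:.4f} s")
```

Each implied time agrees with the reported minimum to the four decimal
places printed, so the Avg and Max columns do not enter the rate.
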
Received on Fri Jul 14 09:44:47 2006