[stream] Stream results for HP rp8420-32

From: Kirby L. Collins <kcollins@rsn.hp.com>
Date: Tue Feb 10 2004 - 16:20:39 CST

Stream results for the HP 9000 rp8420-32 with 1000 MHz dual-processor PA-8800
modules, each with 32 MB of external cache shared by its two processors:

System             Configuration        Copy  Scale    Add  Triad  (MB/s)
HP 9000 rp8420-32  2 cells, 16 cpus     7932   7928   8408   8395
HP 9000 rp8420-32  4 cells, 16 cpus    10136  10117  10639  10700
HP 9000 rp8420-32  4 cells, 32 cpus    10094  10002  10544  10641
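To make the scaling in the table explicit, the sketch below divides one configuration's rates by another's, kernel by kernel. It is a hypothetical helper (bw_ratio and print_ratios are not part of STREAM); the rates are copied from the table above, in MB/s.

```c
#include <assert.h>
#include <stdio.h>

enum { COPY, SCALE, ADD, TRIAD, NKERNELS };
enum { NCONFIGS = 3 };

static const char *const kernel_name[NKERNELS] = {"Copy", "Scale", "Add", "Triad"};

/* Best rates in MB/s, copied from the summary table (rows in table order). */
static const double rate[NCONFIGS][NKERNELS] = {
    { 7932.0,  7928.0,  8408.0,  8395.0},  /* 2 cells, 16 cpus */
    {10136.0, 10117.0, 10639.0, 10700.0},  /* 4 cells, 16 cpus */
    {10094.0, 10002.0, 10544.0, 10641.0},  /* 4 cells, 32 cpus */
};

/* Ratio of configuration `to` over configuration `from` for one kernel. */
double bw_ratio(int from, int to, int kernel)
{
    assert(from >= 0 && from < NCONFIGS && to >= 0 && to < NCONFIGS);
    assert(kernel >= 0 && kernel < NKERNELS);
    return rate[to][kernel] / rate[from][kernel];
}

/* Print the per-kernel ratios for one pair of configurations. */
void print_ratios(int from, int to)
{
    for (int k = 0; k < NKERNELS; k++)
        printf("%-6s %.3f\n", kernel_name[k], bw_ratio(from, to, k));
}
```

On these numbers, going from 2 to 4 cells at 16 CPUs raises every kernel by a factor of roughly 1.27 (Triad: 10700/8395 is about 1.275), while going from 16 to 32 CPUs on 4 cells gives ratios slightly below 1.0. That suggests the sustained bandwidth here follows the number of cells rather than the number of processors.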

The 4-cell, 16-CPU configuration, with 2 PA-8800 modules per cell
(half-populated cells), is orderable and fully supported by HP.

The system was running HP-UX 11i TCOE (December 2003), with
all memory interleaved across the cells.

The f90 version of the STREAM benchmark (stream_d.f) was built with automatic
parallelization (+Oautopar) after the following source changes (mysecond.c is
a C routine that calls gettimeofday):

Sun Feb 8 18:50:39 CST 2004
63c63
<       PARAMETER (n=2000000,offset=0,ndim=n+offset,ntimes=10)
---
>       PARAMETER (n=448000320,offset=0,ndim=n+offset,ntimes=10)
72c72
<       INTEGER bytes(4)
---
>       INTEGER*8 bytes(4)
90c90
< *     COMMON a,b,c
---
>       COMMON a,b,c
200c200
<  9020 FORMAT (1x,a,i4,a)
---
>  9020 FORMAT (1x,a,i8,a)
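The INTEGER*8 change follows from simple arithmetic, sketched below in C under the usual STREAM counting convention (Copy and Scale touch two 8-byte arrays per element, Add and Triad touch three). With n = 448000320, the Copy byte count is 7,168,005,120, well above the 2,147,483,647 limit of a 32-bit INTEGER. The helper names are hypothetical, not taken from stream_d.f. Likewise, a five-digit value such as the 10253 MB footprint in the output cannot fit an I4 field, which plausibly explains the widened 9020 FORMAT.

```c
#include <assert.h>
#include <stdint.h>

/* Array length from the modified PARAMETER statement. */
#define STREAM_N 448000320LL

/* Arrays touched per element: Copy, Scale, Add, Triad (assumed STREAM convention). */
static const int arrays_touched[4] = {2, 2, 3, 3};

/* Bytes moved by one pass of a kernel over n 8-byte doubles, in 64-bit arithmetic. */
int64_t stream_bytes(int kernel, int64_t n)
{
    assert(kernel >= 0 && kernel < 4);
    return (int64_t)arrays_touched[kernel] * 8 * n;
}

/* Footprint of the three arrays a, b, c in units of 2^20 bytes, truncated. */
int64_t stream_footprint_mib(int64_t n)
{
    return 3 * 8 * n / (1024 * 1024);
}
```

Note that the footprint matches the reported 10253 only when "MB" is taken as 2^20 bytes; 3 x 8 x 448000320 bytes is about 10752 x 10^6 bytes.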
	rm -f *.o stream_d.mp stream_d.uni stream_c.mp stream_c.uni
	cc +DD64 +O3 -c mysecond.c
	f90 -o stream_d.mp +Ofaster -Wl,+pd,L +DD64 +Oautopar +Onoopenmp +autodbl4 +extend_source +noppu stream_d.f mysecond.o
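The post does not include mysecond.c. A minimal sketch of such a routine is shown below, assuming it simply returns gettimeofday() as seconds. As we understand HP f90, +noppu suppresses the trailing underscore on external names, so the C function can be named plainly mysecond. The clock_granularity_us helper is a hypothetical addition, similar in spirit to STREAM's granularity check, that estimates the smallest observable clock step.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds with microsecond resolution,
   called from Fortran as a DOUBLE PRECISION function. */
double mysecond(void)
{
    struct timeval tp;
    int rc = gettimeofday(&tp, NULL);
    assert(rc == 0);
    (void)rc; /* avoid an unused-variable warning when NDEBUG is set */
    return (double)tp.tv_sec + (double)tp.tv_usec * 1.0e-6;
}

/* Estimate timer granularity in microseconds: spin until the clock
   changes, repeat 20 times, and keep the smallest positive step. */
int clock_granularity_us(void)
{
    int best = 1000000;
    for (int i = 0; i < 20; i++) {
        double t0 = mysecond(), t1;
        while ((t1 = mysecond()) == t0)
            ; /* wait for the clock to tick */
        int step = (int)((t1 - t0) * 1.0e6 + 0.5);
        if (step > 0 && step < best)
            best = step;
    }
    return best;
}
```

On a system whose gettimeofday ticks every microsecond, this estimate would be expected to come out at 1, consistent with the "1 microseconds" line in the output.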
results for 2 cells, 16 processors (8 PA-8800 modules), 16GB of memory (32x512MB DIMMs):
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =  448000320
 Offset     =          0
 The total memory requirement is    10253 MB
 You are running each test  10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------------
 Your clock granularity/precision appears to be      1 microseconds
 ----------------------------------------------------
Function     Rate (MB/s)  Avg time   Min time  Max time
Copy:       7932.2799      0.9042      0.9037      0.9061  
Scale:      7928.1215      0.9043      0.9041      0.9047  
Add:        8407.6454      1.2799      1.2788      1.2808  
Triad:      8395.1602      1.2817      1.2807      1.2828  
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
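The output states that the best time is used, excluding the first and last iterations. The sketch below shows that bookkeeping and the rate formula under stated assumptions: summarize and rate_mbs are hypothetical names, and the rate uses 1 MB = 10^6 bytes. With that convention, Copy's 7,168,005,120 bytes over the 0.9037 s minimum gives about 7932 MB/s, matching the line above; the small difference from 7932.2799 comes from the printed time being rounded.

```c
#include <assert.h>

struct timing_summary {
    double avg, min, max;
};

/* Average, minimum and maximum of times[1..ntimes-2], i.e. skipping the
   first and last iterations. Requires ntimes >= 3. */
struct timing_summary summarize(const double *times, int ntimes)
{
    assert(ntimes >= 3);
    struct timing_summary s = {0.0, times[1], times[1]};
    for (int k = 1; k < ntimes - 1; k++) {
        s.avg += times[k];
        if (times[k] < s.min) s.min = times[k];
        if (times[k] > s.max) s.max = times[k];
    }
    s.avg /= (double)(ntimes - 2);
    return s;
}

/* Rate in MB/s (1 MB = 10^6 bytes) from bytes moved and the best time. */
double rate_mbs(double bytes, double min_time)
{
    assert(min_time > 0.0);
    return 1.0e-6 * bytes / min_time;
}
```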
results for 4 cells, 16 processors (8 PA-8800 modules), 32GB of memory (64x512MB DIMMs):
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =  448000320
 Offset     =          0
 The total memory requirement is    10253 MB
 You are running each test  10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------------
 Your clock granularity/precision appears to be      1 microseconds
 ----------------------------------------------------
Function     Rate (MB/s)  Avg time   Min time  Max time
Copy:      10136.3101      0.7079      0.7072      0.7091  
Scale:     10116.8578      0.7110      0.7085      0.7125  
Add:       10639.3263      1.0156      1.0106      1.0230  
Triad:     10700.2172      1.0145      1.0048      1.0194  
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
results for 4 cells, 32 processors (16 PA-8800 modules), 32GB of memory (64x512MB DIMMs):
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =  448000320
 Offset     =          0
 The total memory requirement is    10253 MB
 You are running each test  10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------------
 Your clock granularity/precision appears to be      1 microseconds
 ----------------------------------------------------
Function     Rate (MB/s)  Avg time   Min time  Max time
Copy:      10094.3305      0.7113      0.7101      0.7120  
Scale:     10001.5278      0.7179      0.7167      0.7189  
Add:       10543.5822      1.0208      1.0198      1.0227  
Triad:     10641.0534      1.0122      1.0104      1.0138  
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------

This archive was generated by hypermail 2.1.8 : Wed Feb 11 2004 - 16:09:54 CST