[stream] Stream results for HP rp7420-16

From: Kirby L. Collins <kcollins@rsn.hp.com>
Date: Tue Feb 10 2004 - 16:21:19 CST

Stream results for HP 9000 rp7420-16 with 1000MHz dual processor PA-8800
modules with 32MB external cache (shared by each processor pair):

System (rates in MB/s)                Copy   Scale     Add   Triad
HP 9000 rp7420-16 1 cell,  8 cpus     4979    4971    4910    5098
HP 9000 rp7420-16 2 cells, 8 cpus     7877    7844    8672    8689
HP 9000 rp7420-16 2 cells, 16 cpus    8030    7964    8753    8809

The 2 cell configuration with 2 PA-8800 modules per cell
(half populated) is orderable and fully supported by HP.

The system was running HP-UX 11i TCOE (December 2003), with
all memory interleaved across the cells.

The f90 version of the stream benchmark was compiled auto-parallel, with
the following changes (mysecond.c is a C routine that calls gettimeofday):

63c63
<       PARAMETER (n=2000000,offset=0,ndim=n+offset,ntimes=10)
---
>       PARAMETER (n=288000320,offset=0,ndim=n+offset,ntimes=10)
72c72
<       INTEGER bytes(4)
---
>       INTEGER*8 bytes(4)
90c90
< *     COMMON a,b,c
---
>       COMMON a,b,c
200c200
<  9020 FORMAT (1x,a,i4,a)
---
>  9020 FORMAT (1x,a,i8,a)
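
The INTEGER*8 change to bytes(4) is presumably needed because, with n=288000320, the per-kernel byte counts no longer fit in a default 32-bit INTEGER. A minimal sketch of that arithmetic in C, assuming 8-byte words as the benchmark output reports (the helper names are hypothetical and not part of stream_d.f):

```c
#include <stdint.h>

/* Hypothetical helper: bytes moved by one pass of a STREAM kernel that
   touches `arrays` arrays of `n` 8-byte elements
   (2 arrays for Copy/Scale, 3 for Add/Triad). */
int64_t stream_bytes(int64_t n, int arrays)
{
    return (int64_t)arrays * 8 * n;
}

/* Returns 1 if the byte count is too large for a signed 32-bit integer. */
int overflows_int32(int64_t bytes)
{
    return bytes > INT32_MAX;
}
```

With the original n=2000000 every count fits in 32 bits; with n=288000320 even the two-array kernels (4608005120 bytes) do not.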

	rm -f *.o stream_d.mp stream_d.uni stream_c.mp stream_c.uni
	cc +DD64 +O3 -c mysecond.c
	f90 -o stream_d.mp +Ofaster -Wl,+pd,L +DD64 +Oautopar +Onoopenmp +autodbl4 +extend_source +noppu stream_d.f mysecond.o
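
The post does not include mysecond.c itself. A minimal sketch of such a routine, assuming it does nothing more than wrap gettimeofday as described above (this is not necessarily the file used for these runs):

```c
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds as a double, built on gettimeofday().
   Assumption: the symbol has no trailing underscore, which is what a
   Fortran caller compiled with +noppu would be expected to reference. */
double mysecond(void)
{
    struct timeval tp;

    gettimeofday(&tp, NULL);
    return (double)tp.tv_sec + (double)tp.tv_usec * 1.0e-6;
}
```

A timer of this kind resolves to microseconds at best, which is consistent with the clock granularity reported in the output.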

results for 1 cell, 8 processors (4 PA-8800 modules), 16GB of memory (16x1GB DIMMs):
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =  288000320
 Offset     =          0
 The total memory requirement is     6591 MB
 You are running each test  10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------------
 Your clock granularity/precision appears to be      1 microseconds
 ----------------------------------------------------
Function     Rate (MB/s)  Avg time   Min time  Max time
Copy:       4979.2749      0.9333      0.9254      0.9857  
Scale:      4970.8471      0.9346      0.9270      0.9857  
Add:        4909.6093      1.4092      1.4079      1.4104  
Triad:      5098.1855      1.4015      1.3558      1.4086  
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
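
The reported figures are consistent with the usual STREAM accounting: each kernel's rate is the bytes it moves divided by its minimum time, in units of 10^6 bytes/s, and the memory requirement is three arrays of 8-byte words in units of 2^20 bytes. A minimal sketch of that arithmetic, with hypothetical function names and assuming 2 arrays for Copy/Scale and 3 for Add/Triad:

```c
/* Rate in MB/s (10^6 bytes per second) for a kernel that touches
   `arrays` arrays of `n` 8-byte elements in `min_time_s` seconds. */
double stream_rate_mbs(double n, int arrays, double min_time_s)
{
    return 1.0e-6 * (double)arrays * 8.0 * n / min_time_s;
}

/* Footprint of the three benchmark arrays in MB (2^20 bytes). */
double stream_memory_mb(double n)
{
    return 3.0 * 8.0 * n / (1024.0 * 1024.0);
}
```

For example, stream_rate_mbs(288000320, 2, 0.9254) gives roughly 4979 MB/s, matching the Copy row to within the rounding of the printed minimum time, and stream_memory_mb(288000320) gives about 6591.8, which the benchmark prints as 6591 MB.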
results for 2 cells, 8 processors (4 PA-8800 modules), 32GB of memory (32x1GB DIMMs):
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =  288000320
 Offset     =          0
 The total memory requirement is     6591 MB
 You are running each test  10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------------
 Your clock granularity/precision appears to be      1 microseconds
 ----------------------------------------------------
Function     Rate (MB/s)  Avg time   Min time  Max time
Copy:       7877.3744      0.5884      0.5850      0.5912  
Scale:      7844.0006      0.5908      0.5875      0.5940  
Add:        8671.5414      0.7998      0.7971      0.8047  
Triad:      8689.1576      0.8168      0.7955      0.8752  
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
results for 2 cells, 16 processors (8 PA-8800 modules), 32GB of memory (16x512MB DIMMs):
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =  288000320
 Offset     =          0
 The total memory requirement is     6591 MB
 You are running each test  10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------------
 Your clock granularity/precision appears to be      1 microseconds
 ----------------------------------------------------
Function     Rate (MB/s)  Avg time   Min time  Max time
Copy:       8030.2633      0.5755      0.5738      0.5772  
Scale:      7963.5097      0.5865      0.5786      0.6346  
Add:        8753.1337      0.7909      0.7897      0.7923  
Triad:      8808.9195      0.7905      0.7847      0.8122  
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------