[stream] STREAM results for HP Integrity rx7620

From: Kirby L. Collins <kcollins@rsn.hp.com>
Date: Mon Nov 03 2003 - 15:16:07 CST

Stream results for HP Integrity rx7620 with 1500MHz, 6MB Itanium 2 processors:

2 cells, 8 CPUs, 16 GB of memory (32x512MB DIMMs):
Function    Rate (MB/s)  Avg time (s)  Min time (s)  Max time (s)
Copy:        10107.9681        0.0510        0.0507        0.0528
Scale:       10052.3981        0.0511        0.0509        0.0512
Add:         10237.2371        0.0754        0.0750        0.0766
Triad:       10283.2991        0.0750        0.0747        0.0756

1 cell, 4 CPUs, 8 GB of memory (16x512MB DIMMs):
Function    Rate (MB/s)  Avg time (s)  Min time (s)  Max time (s)
Copy:         5066.9877        0.1012        0.1011        0.1013
Scale:        5038.7170        0.1017        0.1016        0.1019
Add:          5119.9161        0.1502        0.1500        0.1503
Triad:        5148.5113        0.1493        0.1492        0.1494
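For reference, STREAM reports each rate as the bytes moved by one pass of the kernel divided by the best (minimum) time, with 1 MB = 10^6 bytes. Copy and Scale touch two 8-byte arrays per element; Add and Triad touch three. The C sketch below reproduces that arithmetic so the tables can be checked by hand; the enum and function names are illustrative and are not taken from the benchmark source.

```c
/* STREAM kernels and how many 8-byte words each moves per element:
   Copy  c = a            read a, write c       -> 2 words
   Scale b = scalar*c     read c, write b       -> 2 words
   Add   c = a + b        read a, b, write c    -> 3 words
   Triad a = b + scalar*c read b, c, write a    -> 3 words */
enum stream_kernel { COPY, SCALE, ADD, TRIAD };

static int words_per_element(enum stream_kernel k)
{
    return (k == ADD || k == TRIAD) ? 3 : 2;
}

/* Rate in MB/s (1 MB = 1e6 bytes) for arrays of n doubles,
   computed from the minimum time over the timed iterations. */
double stream_rate_mbs(enum stream_kernel k, long n, double min_time_s)
{
    double bytes = (double)words_per_element(k) * 8.0 * (double)n;
    return 1.0e-6 * bytes / min_time_s;
}
```

With n = 32001200 and the printed Copy minimum of 0.0507 s from the 2-cell table, `stream_rate_mbs(COPY, 32001200L, 0.0507)` gives about 10099 MB/s. The small gap to the reported 10107.9681 MB/s is expected, since the minimum time is printed to only four decimal places.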

The system was booted with half of the memory in each
cell configured as local memory.

The runs used the OpenMP version of the STREAM benchmark (stream_d.f), with
the following change to the array size:

63c63
<       PARAMETER (n=2000000,offset=0,ndim=n+offset,ntimes=10)
---
>       PARAMETER (n=32001200,offset=0,ndim=n+offset,ntimes=10)
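The new array length can be checked against the memory requirement printed in the outputs. Three arrays of n double-precision words at 8 bytes each give

$$M = 3 \times n \times 8\ \text{bytes} = 3 \times 32\,001\,200 \times 8 = 768\,028\,800\ \text{bytes} \approx 732.4\ \text{MiB},$$

which matches the reported 732 MB only if MB is taken as 2^20 bytes. With the default n = 2,000,000 the same formula gives 48,000,000 bytes, roughly the combined 48 MB of cache on eight 6 MB processors, so the larger size is presumably what keeps the measurement in main memory rather than in cache.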
It was compiled as follows (mysecond_wall.o was a C routine that called gettimeofday):
f90 -o stream_d.omp +Ofaster +DSitanium2 -Wl,+pd,1m +DD64 +Oopenmp +extend_source +noppu stream_d.f mysecond_wall.o
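The post does not include the source of mysecond_wall.o. A minimal sketch of a gettimeofday-based wall-clock timer of the kind described might look like this; the name `mysecond` is an assumption (it is the timer name used in the standard STREAM sources), and so is the idea that the Fortran code calls it as a no-argument function returning seconds as a double.

```c
#include <stddef.h>
#include <sys/time.h>

/* Hypothetical wall-clock timer in the spirit of mysecond_wall.o:
   returns the current time in seconds as a double, with the
   microsecond resolution that gettimeofday provides. */
double mysecond(void)
{
    struct timeval tp;

    gettimeofday(&tp, NULL);
    return (double)tp.tv_sec + 1.0e-6 * (double)tp.tv_usec;
}
```

Because gettimeofday resolves to microseconds, a timer like this is consistent with the 1 microsecond clock granularity that STREAM reports in the outputs.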
The resulting executable was run with the default thread launch policy (FILL) and
the default local memory allocation policy (first-touch).
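Under a first-touch policy a page is placed in the memory of the cell whose CPU first writes it, so where the arrays end up depends on how they are initialized. The C/OpenMP sketch below shows the usual pattern: the initialization loop uses the same static schedule as the timed kernel, so each thread later streams mostly over pages in its own cell's local memory. It assumes the benchmark initializes its arrays in parallel in this way; it is not the stream_d.f source, and the function names are hypothetical.

```c
/* First touch: each thread writes (and therefore places) the pages of
   the index range it will later process in the timed kernel. */
void init_first_touch(double *a, double *b, double *c, long n)
{
    long j;
#pragma omp parallel for schedule(static)
    for (j = 0; j < n; j++) {
        a[j] = 1.0;
        b[j] = 2.0;
        c[j] = 0.0;
    }
}

/* Timed kernel: with the same n, thread count and static schedule,
   every thread gets the same index range it touched during init. */
void triad(double *a, const double *b, const double *c, double scalar, long n)
{
    long j;
#pragma omp parallel for schedule(static)
    for (j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}
```

Compiled without OpenMP support, the pragmas are ignored and both loops run serially with the same results; only the page placement, and hence the bandwidth on a multi-cell system, would differ.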
Here are the outputs for each configuration:
2 cells, 8 processors, 16 GB of memory (32x512MB DIMMs):
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =   32001200
 Offset     =          0
 The total memory requirement is  732 MB
 You are running each test  10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 Number of Threads =  8
 Number of Threads =  8
 Number of Threads =  8
 Number of Threads =  8
 Number of Threads =  8
 Number of Threads =  8
 Number of Threads =  8
 Number of Threads =  8
 ----------------------------------------------------
 Your clock granularity/precision appears to be      1 microseconds
 ----------------------------------------------------
Function     Rate (MB/s)  Avg time   Min time  Max time
Copy:      10107.9681      0.0510      0.0507      0.0528
Scale:     10052.3981      0.0511      0.0509      0.0512
Add:       10237.2371      0.0754      0.0750      0.0766
Triad:     10283.2991      0.0750      0.0747      0.0756
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
1 cell, 4 processors, 8GB of memory (16x512MB DIMMs):
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =   32001200
 Offset     =          0
 The total memory requirement is  732 MB
 You are running each test  10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 Number of Threads =  4
 Number of Threads =  4
 Number of Threads =  4
 Number of Threads =  4
 ----------------------------------------------------
 Your clock granularity/precision appears to be      1 microseconds
 ----------------------------------------------------
Function     Rate (MB/s)  Avg time   Min time  Max time
Copy:       5066.9877      0.1012      0.1011      0.1013
Scale:      5038.7170      0.1017      0.1016      0.1019
Add:        5119.9161      0.1502      0.1500      0.1503
Triad:      5148.5113      0.1493      0.1492      0.1494
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------