**Next message:**Frank Johnston: "STREAM (base) on POWER3smp"**Previous message:**Daniel S. Katz: "STREAM results - Mac G3-400"**Messages sorted by:**[ date ] [ thread ] [ subject ] [ author ]

To whom it may concern,

I would like to submit STREAM benchmark results for an

IBM RS/6000 SP 222 MHz POWER3smp. Please add the following

summary to the web page of standard STREAM results:

Machine ID                ncpus     COPY    SCALE      ADD    TRIAD
IBM-SP_222MHz-POWER3smp       1    790.4    794.1    951.0    953.3
IBM-SP_222MHz-POWER3smp       2   1549.2   1524.4   1843.5   1844.5
IBM-SP_222MHz-POWER3smp       4   2677.0   2630.9   3091.3   3101.2
IBM-SP_222MHz-POWER3smp       8   4332.9   4211.4   4843.3   4937.8
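As a rough reading aid for the summary above, the following sketch computes speedup and parallel efficiency relative to the 1-CPU rates. The rates are copied from the table; the helper names (`speedup`, `efficiency`) are illustrative and not part of STREAM.

```python
# Rates in MB/s, copied from the summary table in this message.
RATES = {
    1: {"COPY": 790.4, "SCALE": 794.1, "ADD": 951.0, "TRIAD": 953.3},
    2: {"COPY": 1549.2, "SCALE": 1524.4, "ADD": 1843.5, "TRIAD": 1844.5},
    4: {"COPY": 2677.0, "SCALE": 2630.9, "ADD": 3091.3, "TRIAD": 3101.2},
    8: {"COPY": 4332.9, "SCALE": 4211.4, "ADD": 4843.3, "TRIAD": 4937.8},
}


def speedup(kernel, ncpus):
    """Rate on ncpus CPUs divided by the 1-CPU rate for the same kernel."""
    return RATES[ncpus][kernel] / RATES[1][kernel]


def efficiency(kernel, ncpus):
    """Speedup divided by the CPU count (1.0 would be perfect scaling)."""
    return speedup(kernel, ncpus) / ncpus


if __name__ == "__main__":
    for ncpus in sorted(RATES):
        cells = ", ".join(
            f"{k} x{speedup(k, ncpus):.2f} ({efficiency(k, ncpus):.0%})"
            for k in ("COPY", "SCALE", "ADD", "TRIAD")
        )
        print(f"{ncpus} CPU(s): {cells}")
```

Note that the array size grows with the CPU count in these runs, so these figures compare bandwidth at a fixed per-CPU working set rather than at a fixed problem size.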

Each CPU on this machine has a private 4 MB L2 cache. The array
size at each CPU count exceeds the aggregate available L2. However,
the arrays were kept as small as possible so that the system tables
mapping virtual to real memory have a chance to reside in the L2.
When data in these tables has to come from main memory, there is
additional memory traffic that STREAM does not count.
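The sizing claim can be checked against the array sizes reported in the outputs at the end of this message. The sketch below is only an illustration: it assumes three arrays (a, b, c) of 8-byte words, reads "4 MB" as 4 MiB, and ignores the small offset padding between arrays.

```python
L2_BYTES_PER_CPU = 4 * 2**20  # "4 MB" read as 4 MiB (assumption)
BYTES_PER_WORD = 8            # from "Assuming 8 bytes per DOUBLE PRECISION word"
NUM_ARRAYS = 3                # a, b and c

# "Array size" values from the program outputs, keyed by CPU count.
ARRAY_SIZE = {1: 524300, 2: 1048600, 4: 2097200, 8: 4194400}


def footprint_bytes(n):
    """Total bytes occupied by the three STREAM arrays of n elements each."""
    return NUM_ARRAYS * BYTES_PER_WORD * n


def l2_ratio(ncpus):
    """Array footprint divided by the aggregate L2 of ncpus CPUs."""
    return footprint_bytes(ARRAY_SIZE[ncpus]) / (ncpus * L2_BYTES_PER_CPU)


if __name__ == "__main__":
    for ncpus, n in sorted(ARRAY_SIZE.items()):
        mib = footprint_bytes(n) / 2**20
        print(f"{ncpus} CPU(s): {mib:.1f} MiB of arrays, "
              f"{l2_ratio(ncpus):.2f}x aggregate L2")
```

Under these assumptions each run uses roughly three times the aggregate L2, and a single array is just barely larger than one CPU's share of it.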

In addition to changing the array size and offset, the following

changes were made to the Fortran STREAM code:

1) Increase "ntimes" to 1000 in order to reduce run-to-run fluctuations.

2) Change the "scalar" constant to sqrt(2) - 1 to prevent overflows (INFs).

3) Add compiler directives to allow the code to run in parallel on multiple
   CPUs and to allow certain arrays to be "prefetched" by the hardware.

Here is an example (the COPY loop):

!smp$ parallel do private(j), schedule(static)
      DO 30 j = 1,n
!ibm* prefetch_by_load( c(j) )
         c(j) = a(j)
   30 CONTINUE
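The choice of sqrt(2) - 1 in change 2 can be checked with a small scalar model of the four kernels. One pass of Copy, Scale, Add and Triad multiplies each element of a by 2s + s^2, which equals exactly 1 when s = sqrt(2) - 1, so the values stay bounded no matter how large "ntimes" is. The sketch below is illustrative only: the starting value a = 2.0 is inferred from "Sum of a" divided by the array size in the outputs, and the comparison value scalar = 3.0 is assumed to be the constant in the unmodified code.

```python
import math


def run_stream_kernels(scalar, ntimes, a=2.0):
    """Apply Copy, Scale, Add, Triad to a single element ntimes.

    Returns the final (a, b, c). The initial value a=2.0 is an assumption.
    """
    b = c = 0.0  # overwritten by Copy and Scale before first use
    for _ in range(ntimes):
        c = a               # Copy:  c(j) = a(j)
        b = scalar * c      # Scale: b(j) = scalar*c(j)
        c = a + b           # Add:   c(j) = a(j) + b(j)
        a = b + scalar * c  # Triad: a(j) = b(j) + scalar*c(j)
    return a, b, c


if __name__ == "__main__":
    s = math.sqrt(2.0) - 1.0
    print("growth factor per pass:", 2.0 * s + s * s)
    print("scalar = sqrt(2) - 1:", run_stream_kernels(s, 1000))
    print("scalar = 3.0 (assumed original):", run_stream_kernels(3.0, 1000))
```

With scalar = 3.0 the growth factor per pass is 15, so 1000 passes overflow double precision, which matches the INFs mentioned above.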

If you have any questions, please let me know.

Frank Johnston

(fjohn@us.ibm.com, xxx-xxx-xxxx)

P.S. Here are the actual outputs:

output for 1 CPU:

----------------------------------------------

Double precision appears to have 16 digits of accuracy

Assuming 8 bytes per DOUBLE PRECISION word

----------------------------------------------

Array size = 524300

Offset = 330

The total memory requirement is 12 MB

You are running each test 1000 times

The *best* time for each test is used

----------------------------------------------------

Your clock granularity/precision appears to be 5 microseconds

The tests below will each take a time on the order

of 8259 microseconds

(= 1652 clock ticks)

Increase the size of the arrays if this shows that

you are not getting at least 20 clock ticks per test.

----------------------------------------------------

WARNING -- The above is only a rough guideline.

For best results, please be sure you know the

precision of your system timer.

----------------------------------------------------

Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            790.4472      .0107      .0106      .0116
Scale:           794.1492      .0106      .0106      .0109
Add:             951.0364      .0133      .0132      .0144
Triad:           953.3298      .0133      .0132      .0137

Sum of a is = 1048600.00000001118

Sum of b is = 434344.341503158561

Sum of c is = 1482944.34150735638
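The reported rates can be roughly reproduced from the array size and the minimum times. STREAM counts two 8-byte words per element for Copy and Scale and three for Add and Triad, and reports MB as 10^6 bytes. The sketch below uses the 1-CPU numbers from the output above; since the printed times are rounded to four decimals, agreement is only expected to within about one percent.

```python
# Bytes moved per array element, as counted by STREAM.
BYTES_PER_ELEMENT = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}

N = 524300  # "Array size" from the 1-CPU output

# (min time in seconds, reported rate in MB/s) from the 1-CPU output.
ONE_CPU = {
    "Copy": (0.0106, 790.4472),
    "Scale": (0.0106, 794.1492),
    "Add": (0.0132, 951.0364),
    "Triad": (0.0132, 953.3298),
}


def rate_mb_per_s(kernel, n, min_time_s):
    """Bandwidth in MB/s (1 MB = 1e6 bytes) from the best time for a kernel."""
    return BYTES_PER_ELEMENT[kernel] * n / min_time_s / 1.0e6


if __name__ == "__main__":
    for kernel, (t_min, reported) in ONE_CPU.items():
        estimate = rate_mb_per_s(kernel, N, t_min)
        print(f"{kernel:6s} estimated {estimate:8.1f} MB/s, "
              f"reported {reported:8.1f} MB/s")
```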

output for 2 CPUs:

----------------------------------------------

Double precision appears to have 16 digits of accuracy

Assuming 8 bytes per DOUBLE PRECISION word

----------------------------------------------

Array size = 1048600

Offset = 404

The total memory requirement is 24 MB

You are running each test 1000 times

The *best* time for each test is used

----------------------------------------------------

Your clock granularity/precision appears to be 5 microseconds

The tests below will each take a time on the order

of 15856 microseconds

(= 3171 clock ticks)

Increase the size of the arrays if this shows that

you are not getting at least 20 clock ticks per test.

----------------------------------------------------

WARNING -- The above is only a rough guideline.

For best results, please be sure you know the

precision of your system timer.

----------------------------------------------------

Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           1549.2400      .0109      .0108      .0201
Scale:          1524.3722      .0111      .0110      .0120
Add:            1843.4750      .0137      .0137      .0147
Triad:          1844.5380      .0137      .0136      .0143

Sum of a is = 2097200.00000001118

Sum of b is = 868688.683004021645

Sum of c is = 2965888.68302347837

output for 4 CPUs:

----------------------------------------------

Double precision appears to have 16 digits of accuracy

Assuming 8 bytes per DOUBLE PRECISION word

----------------------------------------------

Array size = 2097200

Offset = 260

The total memory requirement is 48 MB

You are running each test 1000 times

The *best* time for each test is used

----------------------------------------------------

Your clock granularity/precision appears to be 5 microseconds

The tests below will each take a time on the order

of 32019 microseconds

(= 6404 clock ticks)

Increase the size of the arrays if this shows that

you are not getting at least 20 clock ticks per test.

----------------------------------------------------

WARNING -- The above is only a rough guideline.

For best results, please be sure you know the

precision of your system timer.

----------------------------------------------------

Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           2676.9512      .0133      .0125      .0329
Scale:          2630.9133      .0133      .0128      .0229
Add:            3091.3001      .0169      .0163      .0181
Triad:          3101.2452      .0168      .0162      .0180

Sum of a is = 4194400.00000001118

Sum of b is = 1737377.36602994637

Sum of c is = 5931777.36605572235

output for 8 CPUs:

----------------------------------------------

Double precision appears to have 16 digits of accuracy

Assuming 8 bytes per DOUBLE PRECISION word

----------------------------------------------

Array size = 4194400

Offset = 468

The total memory requirement is 96 MB

You are running each test 1000 times

The *best* time for each test is used

----------------------------------------------------

Your clock granularity/precision appears to be 5 microseconds

The tests below will each take a time on the order

of 64196 microseconds

(= 12839 clock ticks)

Increase the size of the arrays if this shows that

you are not getting at least 20 clock ticks per test.

----------------------------------------------------

WARNING -- The above is only a rough guideline.

For best results, please be sure you know the

precision of your system timer.

----------------------------------------------------

Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           4332.9165      .0166      .0155      .0576
Scale:          4211.3665      .0170      .0159      .0588
Add:            4843.2744      .0217      .0208      .0607
Triad:          4937.7796      .0213      .0204      .0486

Sum of a is = 8388800.00000001118

Sum of b is = 3474754.73209443502

Sum of c is = 11863554.7321202103


*This archive was generated by hypermail 2b29: Tue Apr 18 2000 - 05:23:08 CDT*