stream

From: Charles Grassl (cmg@ferrari.cray.com)
Date: Thu May 27 1993 - 18:00:42 CDT


Hello John;

Below are performance results from running the STREAM benchmark on an
8-CPU CRAY Y-MP EL98. If you are maintaining and distributing a list of
STREAM results, could you please add the EL results to it?

The CRAY Y-MP EL98 has up to 8 CPUs, which run at 33.3 MHz. Each CPU
has four floating-point functional units. Each functional unit can
produce either one floating-point add or one floating-point multiply
(not both) per clock period.
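As a back-of-the-envelope check, the figures above imply a theoretical peak floating-point rate. This is a minimal sketch, assuming every functional unit delivers one result every clock with no stalls; the variable names are illustrative only.

```python
# Theoretical peak of a CRAY Y-MP EL98, using only the figures quoted above.
CLOCK_MHZ = 33.3      # clock rate of each CPU
UNITS_PER_CPU = 4     # floating-point functional units per CPU
MAX_CPUS = 8          # CPUs in a fully configured EL98

# Assumption: each unit yields one add OR one multiply every clock (no stalls),
# so each unit contributes CLOCK_MHZ million results per second.
peak_per_cpu_mflops = UNITS_PER_CPU * CLOCK_MHZ
peak_system_mflops = MAX_CPUS * peak_per_cpu_mflops

print(f"peak per CPU : {peak_per_cpu_mflops:.1f} MFLOPS")
print(f"peak, 8 CPUs : {peak_system_mflops:.1f} MFLOPS")
```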

Pairs of EL CPUs share four memory ports. As the data show, the memory
ports can sustain up to four CPUs. With eight CPUs, the Assignment,
Scaling and Summing loops run much faster than with four CPUs because
the additional CPUs use ports C and D. For SAXPYing, the additional
CPUs can use only port D. (Strictly speaking, each pair of CPUs shares
all four ports.)
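The scaling behaviour described above can be quantified from the Rate column of the STREAM output in this message. A small sketch (the helper name `speedup_table` is hypothetical) that computes each kernel's speedup over one CPU and its parallel efficiency:

```python
# Rates (MB/s) copied from the STREAM output in this message.
RATES = {
    "Assignment": {1: 437.2382, 2: 826.6837, 4: 1564.9253, 8: 2362.8129},
    "Scaling":    {1: 436.6548, 2: 833.7715, 4: 1569.8418, 8: 2310.5194},
    "Summing":    {1: 536.1533, 2: 1048.9873, 4: 1933.8354, 8: 2373.6665},
    "SAXPYing":   {1: 476.7855, 2: 1078.1763, 4: 1955.4611, 8: 2363.7914},
}

def speedup_table(rates):
    """Return {kernel: {cpus: (speedup vs 1 CPU, parallel efficiency)}}."""
    table = {}
    for kernel, by_cpus in rates.items():
        base = by_cpus[1]  # single-CPU rate is the reference
        table[kernel] = {
            n: (r / base, r / (base * n)) for n, r in sorted(by_cpus.items())
        }
    return table

if __name__ == "__main__":
    for kernel, rows in speedup_table(RATES).items():
        cells = "  ".join(
            f"{n} CPU: {s:4.2f}x ({e:4.0%})" for n, (s, e) in rows.items()
        )
        print(f"{kernel:10s} {cells}")
```

On these numbers, going from 4 to 8 CPUs raises the Assignment rate by about 1.5x but the SAXPYing rate by only about 1.2x.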

This benchmark is an interesting experiment for this memory
architecture!

John, I'm trying to figure out a way to characterize computer power by
a "weighted" memory bandwidth. Memory that is "closer" (faster) would
count more than memory that is "farther" (slower). My idea is to
calculate the second moment of memory, defined as:

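$$
\text{Memory Moment (MM)} = \sum_{i} m_i \, r_i^{2}
$$

where $m_i$ is an amount of memory (memory words) and $r_i$ is the
rate at which that memory can be accessed.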

The MM would have units of [ Mbyte * (Mbyte/s)^2 ]

For the rate, I could use the transfer rate measured by STREAM.
Measuring MM would entail running STREAM for all memory sizes and
"integrating", or summing, the contributions.
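One way to read the "integrating" step: each STREAM run at a larger memory size adds the extra memory at the rate measured for that size. The sketch below assumes exactly that rectangle-rule discretization; the function name `memory_moment` and the sample numbers are hypothetical, not measurements.

```python
from typing import Sequence

def memory_moment(sizes_mb: Sequence[float], rates_mb_s: Sequence[float]) -> float:
    """Discrete second moment of memory: sum of (delta memory) * rate**2.

    sizes_mb   -- strictly increasing memory sizes (Mbyte) at which STREAM ran
    rates_mb_s -- STREAM rate (Mbyte/s) measured at each of those sizes
    Assumption: the memory between consecutive sizes is weighted by the
    square of the rate measured at the larger size.
    """
    if len(sizes_mb) != len(rates_mb_s):
        raise ValueError("need one rate per size")
    mm, previous = 0.0, 0.0
    for size, rate in zip(sizes_mb, rates_mb_s):
        if size <= previous:
            raise ValueError("sizes must be positive and strictly increasing")
        mm += (size - previous) * rate ** 2  # units: Mbyte * (Mbyte/s)^2
        previous = size
    return mm

# Hypothetical two-level example (NOT measured data): the first 1 Mbyte
# streams at 100 Mbyte/s, the next 3 Mbyte at 10 Mbyte/s.
print(memory_moment([1.0, 4.0], [100.0, 10.0]))  # 10300.0
```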

Does this look reasonable to you? Would MM have any predictive power
for computer performance?

One other question: What would be analogous to torque for the above
"moment of inertia"?

Regards,

-- 
Charles Grassl
Cray Research, Inc.
cmg@cray.com
(612) 683-3531 

==================================================== 1 CPUs ====================================================
--------------------------------------
Single precision appears to have 14 digits of accuracy
Assuming 8 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 31.968054 hundredths of a second
Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second
---------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Assignment:    437.2382     0.1098     0.1098     0.1098
Scaling   :    436.6548     0.1106     0.1099     0.1113
Summing   :    536.1533     0.1346     0.1343     0.1349
SAXPYing  :    476.7855     0.1510     0.1510     0.1510
STOP (called by STREAM )
CP: 1.345s, Wallclock: 1.349s, 12.5% of 8-CPU Machine
HWM mem: 9129405, HWM stack: 9004265, Stack overflows: 0

==================================================== 2 CPUs ====================================================
--------------------------------------
Single precision appears to have 14 digits of accuracy
Assuming 8 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 15.945318 hundredths of a second
Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second
---------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Assignment:    826.6837     0.0581     0.0581     0.0582
Scaling   :    833.7715     0.0576     0.0576     0.0576
Summing   :   1048.9873     0.0686     0.0686     0.0687
SAXPYing  :   1078.1763     0.0668     0.0668     0.0669
STOP (called by STREAM )
CP: 1.354s, Wallclock: 1.076s, 15.7% of 8-CPU Machine
HWM mem: 9129405, HWM stack: 9004265, Stack overflows: 0

==================================================== 4 CPUs ====================================================
--------------------------------------
Single precision appears to have 14 digits of accuracy
Assuming 8 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 8.152287 hundredths of a second
Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second
---------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Assignment:   1564.9253     0.0307     0.0307     0.0307
Scaling   :   1569.8418     0.0306     0.0306     0.0307
Summing   :   1933.8354     0.0373     0.0372     0.0374
SAXPYing  :   1955.4611     0.0369     0.0368     0.0369
STOP (called by STREAM )
CP: 1.440s, Wallclock: 0.772s, 23.3% of 8-CPU Machine
HWM mem: 9139645, HWM stack: 9004265, Stack overflows: 0

==================================================== 8 CPUs ====================================================
--------------------------------------
Single precision appears to have 14 digits of accuracy
Assuming 8 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 5.824599 hundredths of a second
Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second
---------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Assignment:   2362.8129     0.0244     0.0203     0.0279
Scaling   :   2310.5194     0.0249     0.0208     0.0284
Summing   :   2373.6665     0.0312     0.0303     0.0321
SAXPYing  :   2363.7914     0.0305     0.0305     0.0305
STOP (called by STREAM )
CP: 2.011s, Wallclock: 1.160s, 21.7% of 8-CPU Machine
HWM mem: 9149885, HWM stack: 9004265, Stack overflows: 0
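The Rate column appears to be derived from the Min time column. The sketch below checks this against the 1-CPU run. It assumes STREAM counts the bytes of every array a kernel reads or writes (2 arrays for Assignment and Scaling, 3 for Summing and SAXPYing), that 1 MB = 10**6 bytes, and that each array holds N = 3,000,000 elements. The message does not state N; that value is inferred only because it reproduces the reported rates. The name `stream_rate_mb_s` is hypothetical.

```python
# Assumption (not stated in the message): N = 3,000,000 elements per array,
# inferred because it reproduces the reported rates; 8-byte REAL words.
N = 3_000_000
BYTES_PER_WORD = 8
ARRAYS_TOUCHED = {"Assignment": 2, "Scaling": 2, "Summing": 3, "SAXPYing": 3}

def stream_rate_mb_s(kernel: str, min_time_s: float, n: int = N) -> float:
    """Rate in MB/s (1 MB = 10**6 bytes) from a kernel's minimum time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * n * BYTES_PER_WORD
    return bytes_moved / min_time_s / 1e6

# (kernel, Min time in seconds, reported Rate in MB/s) from the 1-CPU run
ONE_CPU = [
    ("Assignment", 0.1098, 437.2382),
    ("Scaling", 0.1099, 436.6548),
    ("Summing", 0.1343, 536.1533),
    ("SAXPYing", 0.1510, 476.7855),
]

for kernel, t_min, reported in ONE_CPU:
    estimate = stream_rate_mb_s(kernel, t_min)
    print(f"{kernel:10s} recomputed {estimate:8.1f}  reported {reported:8.1f}")
```

The small residual differences come from the times being printed to only four decimal places.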



This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:03 CDT