# stream

From: Charles Grassl (cmg@ferrari.cray.com)
Date: Thu May 27 1993 - 18:00:42 CDT

Hello John;

Below are performance results from running the STREAM benchmark on an 8-CPU
CRAY Y-MP EL98. If you are maintaining and distributing a list with

The CRAY Y-MP EL98 has up to 8 CPUs, which run at 33.3 MHz (a clock
period of about 30 ns). Each CPU has four floating-point functional
units, and each unit can produce either one floating-point add or one
floating-point multiply (not both) per clock period.
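The clock rate and unit count above imply a theoretical peak floating-point rate. The sketch below only restates that arithmetic. It assumes every functional unit delivers one result per clock and ignores memory traffic, which is exactly what STREAM measures instead. The constant and function names are illustrative.

```python
# Illustrative peak-rate arithmetic for the CRAY Y-MP EL98 figures quoted above.
CLOCK_MHZ = 33.3        # CPU clock rate stated in the message
FP_UNITS_PER_CPU = 4    # each unit yields one add OR one multiply per clock


def clock_period_ns(mhz: float = CLOCK_MHZ) -> float:
    """Clock period in nanoseconds (1000 / MHz)."""
    return 1000.0 / mhz


def peak_mflops(cpus: int, mhz: float = CLOCK_MHZ, units: int = FP_UNITS_PER_CPU) -> float:
    """Theoretical peak, assuming one result per functional unit per clock."""
    return cpus * units * mhz


if __name__ == "__main__":
    print(f"clock period: {clock_period_ns():.2f} ns")
    for n in (1, 2, 4, 8):
        print(f"{n} CPU(s): {peak_mflops(n):.1f} MFLOPS peak")
```

Because 33.3 MHz is itself a rounded figure, the results (about 133 MFLOPS per CPU) are approximate.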

Pairs of EL CPUs share four memory ports. As the data show, the memory
ports can sustain up to four CPUs. With eight CPUs, the Assignment,
Scaling and Summing loops run much faster than with four CPUs because
the additional CPUs use ports C and D. For SAXPYing, the additional CPUs
can use only port D. (Strictly speaking, each pair of CPUs shares all
four ports.)
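To check the scaling claims against the runs in this message, the reported rates can be turned into ratios. The sketch copies the "Rate (MB/s)" column from the 1-, 2-, 4- and 8-CPU outputs and computes each kernel's bandwidth relative to one CPU, plus the gain from four to eight CPUs. The dictionary and helper names are illustrative only.

```python
# STREAM rates (MB/s) copied from the 1-, 2-, 4- and 8-CPU runs in this message.
RATES = {
    "Assignment": {1: 437.2382, 2: 826.6837, 4: 1564.9253, 8: 2362.8129},
    "Scaling":    {1: 436.6548, 2: 833.7715, 4: 1569.8418, 8: 2310.5194},
    "Summing":    {1: 536.1533, 2: 1048.9873, 4: 1933.8354, 8: 2373.6665},
    "SAXPYing":   {1: 476.7855, 2: 1078.1763, 4: 1955.4611, 8: 2363.7914},
}


def speedup(func: str, cpus: int) -> float:
    """Bandwidth with `cpus` CPUs relative to the 1-CPU run."""
    return RATES[func][cpus] / RATES[func][1]


def gain_4_to_8(func: str) -> float:
    """Bandwidth with 8 CPUs relative to 4 CPUs."""
    return RATES[func][8] / RATES[func][4]


if __name__ == "__main__":
    for func in RATES:
        ratios = ", ".join(f"{n} CPUs: {speedup(func, n):.2f}x" for n in (2, 4, 8))
        print(f"{func:<10} {ratios}  (8 vs 4 CPUs: {gain_4_to_8(func):.2f}x)")
```

Up to four CPUs the ratios stay close to the CPU count. The 8-versus-4 column shows how much each kernel benefits from the extra CPUs.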

This benchmark is an interesting experiment for this memory
architecture!

John, I'm trying to figure out a way to characterize computer power by
a "weighted" memory bandwidth. Memory that is "closer" (faster) would
count more than memory that is "farther" (slower). My idea is to
calculate the second moment of memory, defined as:

$$
\text{Memory Moment (MM)} = \sum \text{memory words} \times (\text{rate})^2
$$

The MM would have units of [ Mbyte * (Mbyte/s)^2 ]

For the rate, I could use the transfer rate measured by STREAM.
Measuring MM would entail running STREAM for all memory sizes and
"integrating" (summing) the contributions.
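One possible reading of "integrating or summing the contributions" is sketched below. It is an assumption, not a method given in the message: STREAM is run at a series of increasing total sizes (in MB), and each increment of memory is weighted by the square of the rate measured at that size. This matches the units [ Mbyte * (Mbyte/s)^2 ] above. The function name and the test inputs are hypothetical, and no measured data is implied.

```python
from typing import Sequence


def memory_moment(sizes_mb: Sequence[float], rates_mb_s: Sequence[float]) -> float:
    """Hypothetical MM = sum over memory increments of (delta size) * rate^2.

    sizes_mb:   strictly increasing memory sizes (MB) at which STREAM was run
    rates_mb_s: STREAM rate (MB/s) measured at each of those sizes
    Each increment (size_k - size_{k-1}) is weighted by rates_mb_s[k] ** 2,
    so fast memory contributes far more than the same amount of slow memory.
    Returns MB * (MB/s)^2.
    """
    if len(sizes_mb) != len(rates_mb_s):
        raise ValueError("sizes and rates must have the same length")
    total = 0.0
    previous = 0.0
    for size, rate in zip(sizes_mb, rates_mb_s):
        if size <= previous:
            raise ValueError("sizes must be strictly increasing")
        total += (size - previous) * rate ** 2
        previous = size
    return total


if __name__ == "__main__":
    # Hypothetical two-level example: 1 MB at 10 MB/s, then 2 more MB at 5 MB/s.
    print(memory_moment([1.0, 3.0], [10.0, 5.0]))
```

Squaring the rate is what makes this a second moment. Doubling the bandwidth of a memory level quadruples its contribution, while doubling its size only doubles it.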

Does this look reasonable to you? Would MM have any predictive power
for computer performance?

One other question: What would be analogous to torque for the above
"moment of inertia"?

Regards,

--
Charles Grassl
Cray Research, Inc.
cmg@cray.com
(612) 683-3531

```
====================================================
1 CPUs
====================================================
--------------------------------------
Single precision appears to have 14 digits of accuracy
Assuming 8 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 31.968054 hundredths of a second
Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  437.2382      0.1098      0.1098      0.1098
Scaling   :  436.6548      0.1106      0.1099      0.1113
Summing   :  536.1533      0.1346      0.1343      0.1349
SAXPYing  :  476.7855      0.1510      0.1510      0.1510
STOP  (called by STREAM )
CP: 1.345s,  Wallclock: 1.349s,  12.5% of 8-CPU Machine
HWM mem: 9129405, HWM stack: 9004265, Stack overflows: 0

====================================================
2 CPUs
====================================================
--------------------------------------
Single precision appears to have 14 digits of accuracy
Assuming 8 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 15.945318 hundredths of a second
Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  826.6837      0.0581      0.0581      0.0582
Scaling   :  833.7715      0.0576      0.0576      0.0576
Summing   : 1048.9873      0.0686      0.0686      0.0687
SAXPYing  : 1078.1763      0.0668      0.0668      0.0669
STOP  (called by STREAM )
CP: 1.354s,  Wallclock: 1.076s,  15.7% of 8-CPU Machine
HWM mem: 9129405, HWM stack: 9004265, Stack overflows: 0

====================================================
4 CPUs
====================================================
--------------------------------------
Single precision appears to have 14 digits of accuracy
Assuming 8 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 8.152287 hundredths of a second
Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment: 1564.9253      0.0307      0.0307      0.0307
Scaling   : 1569.8418      0.0306      0.0306      0.0307
Summing   : 1933.8354      0.0373      0.0372      0.0374
SAXPYing  : 1955.4611      0.0369      0.0368      0.0369
STOP  (called by STREAM )
CP: 1.440s,  Wallclock: 0.772s,  23.3% of 8-CPU Machine
HWM mem: 9139645, HWM stack: 9004265, Stack overflows: 0

====================================================
8 CPUs
====================================================
--------------------------------------
Single precision appears to have 14 digits of accuracy
Assuming 8 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 5.824599 hundredths of a second
Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment: 2362.8129      0.0244      0.0203      0.0279
Scaling   : 2310.5194      0.0249      0.0208      0.0284
Summing   : 2373.6665      0.0312      0.0303      0.0321
SAXPYing  : 2363.7914      0.0305      0.0305      0.0305
STOP  (called by STREAM )
CP: 2.011s,  Wallclock: 1.160s,  21.7% of 8-CPU Machine
HWM mem: 9149885, HWM stack: 9004265, Stack overflows: 0
```
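The rates in these outputs can be reproduced from the minimum times. The sketch below makes several assumptions, none of them stated in the message:

- It uses STREAM's usual accounting of 8-byte words moved per loop iteration: 2 for Assignment and Scaling, 3 for Summing and SAXPYing.
- It treats 1 MB as 10^6 bytes.
- It takes the array length to be N = 3,000,000. This is inferred from the HWM mem lines (roughly three arrays of 3 million words) and from rate × min time (about 48 MB and 72 MB).

```python
# Assumptions (inferred, not stated in the message): N = 3,000,000 elements
# per array, 8-byte REAL words, 1 MB = 1e6 bytes, STREAM's usual word counts.
N = 3_000_000
BYTES_PER_WORD = 8
WORDS_PER_ITER = {"Assignment": 2, "Scaling": 2, "Summing": 3, "SAXPYing": 3}

# (function, min time in s, reported MB/s) copied from the 1-CPU run above.
ONE_CPU = [
    ("Assignment", 0.1098, 437.2382),
    ("Scaling", 0.1099, 436.6548),
    ("Summing", 0.1343, 536.1533),
    ("SAXPYing", 0.1510, 476.7855),
]


def stream_rate_mb_s(func: str, min_time_s: float, n: int = N) -> float:
    """Bandwidth in MB/s: bytes moved by one kernel pass / its minimum time."""
    bytes_moved = WORDS_PER_ITER[func] * BYTES_PER_WORD * n
    return bytes_moved / min_time_s / 1e6


if __name__ == "__main__":
    for func, t_min, reported in ONE_CPU:
        computed = stream_rate_mb_s(func, t_min)
        print(f"{func:<10} computed {computed:9.2f}  reported {reported:9.2f}")
```

The small differences come from the min times being printed to only four decimal places.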

This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:03 CDT