Re: stream

From: Charles Grassl (cmg@magnet.cray.com)
Date: Tue Nov 12 1991 - 08:28:14 CST


Hello John;

FYI, some details regarding the NEC SX-3/14 architecture:

Each CPU has four independent double-pipes, for a total of eight
pipes. Each pipe is a multiply-and-add functional unit.

Each CPU of a pair has access to three memory ports: two for reads
and one for writes. Each port is 256 bits, or four 64-bit words, wide.
That is not wide enough to feed a SAXPY running on all eight pipes:
each port supplies four words per clock, only half of what eight pipes
consume. So, even though the peak speed of an individual CPU is
5500 Mflop/s, its peak speed on a SAXPY is 2750 Mflop/s.
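The 2750 Mflop/s figure follows from simple arithmetic. Below is a minimal Python sketch of it. It assumes 64-bit words, that each multiply-add pipe retires one SAXPY element per clock, and that each of SAXPY's three operand streams (load x, load y, store y) gets its own port. The function names are hypothetical and are not taken from the stream program.

```python
WORD_BITS = 64  # assumed: one word = 64 bits, so a 256-bit port = 4 words


def saxpy(a, x, y):
    """Reference SAXPY, y = a*x + y. Per element it reads x[i] and y[i]
    and writes y[i]: two read streams and one write stream, matching the
    two read ports and one write port described above."""
    for i in range(len(y)):
        y[i] = a * x[i] + y[i]  # one multiply and one add per element
    return y


def saxpy_port_limited_mflops(peak_mflops, pipes, port_bits):
    """Hypothetical estimate of SAXPY speed when each operand stream
    has its own memory port that is port_bits wide."""
    words_per_clock = port_bits // WORD_BITS
    # Assumed: each pipe retires one element per clock, so together the
    # pipes want `pipes` words per clock from every stream's port.
    elements_per_clock = min(pipes, words_per_clock)
    return peak_mflops * elements_per_clock / pipes


print(saxpy_port_limited_mflops(5500, 8, 256))  # 2750.0
```

With four pipes instead of eight, the same ports would keep up and the estimate returns the full peak. The model ignores memory latency, bank conflicts, and vector startup costs, so it gives an upper bound only.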

A -VERY- interesting test would be to run your stream program on two
CPUs of an SX-3. It would show how contention for the shared memory
ports is resolved.

The Fujitsu VP2000 has a similar shared-hardware design. In it, two
CPUs share the actual vector functional units and the memory ports.
The vector registers are duplicated for each CPU, but the two CPUs
still share the paths to memory.

As you have probably seen from some data for the CRAY Y-MP C90, each
CPU there has its own complete set of three 128-bit-wide paths to
memory.

Regarding the TMC results, could I look at the source code for the
test? I think that I could get someone at the Minnesota Supercomputer
Center to run it on a CM-2. As a matter of fact, I would like to try
it myself.

Regards,

-- 
Charles M. Grassl
Cray Research, Inc.
(612) 683-3531 cmg@cray.com

