Re: Attainable memory bandwidth

From: Peter Klausler (pmk@craycos.com)
Date: Fri Oct 04 1991 - 13:55:17 CDT


In article <MCCALPIN.91Oct3160250@pereland.cms.udel.edu> you write:
>Each cpu of the Y/MP is capable of two 64-bit loads and one 64-bit
>store per clock cycle --- these are overlappable vector operations.
>Thus each cpu can move 24 bytes/cycle, or 4000 MB/s.

There are four memory ports on an X-MP/Y-MP, and their functions are:

        port A: B-register block loads, vector loads (2nd choice), exchange load
        port B: T-register block loads, vector loads (1st choice)
        port C: Scalar loads, scalar/B/T/vector stores, exchange store
        port D: instruction fetch, all I/O

Vector loads will use port B if free, else port A. All stores, except from
I/O channels, go out through port C. (On an X-MP, instruction fetch uses port A.)
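
The port assignments above can be written down as a small lookup. The Python
sketch below is only an illustration of the Y-MP rules as stated in this post;
the function name choose_port and the reference-kind strings are made up for
the example and are not Cray terminology.

```python
# Hypothetical model of Y-MP memory port assignment, following the table above.
# The names used here (choose_port, the kind strings) are illustrative only.

FIXED_PORT = {
    "b_block_load": "A",
    "exchange_load": "A",
    "t_block_load": "B",
    "scalar_load": "C",
    "scalar_store": "C",
    "b_block_store": "C",
    "t_block_store": "C",
    "vector_store": "C",
    "exchange_store": "C",
    "instruction_fetch": "D",
    "io": "D",
}


def choose_port(kind, busy=frozenset()):
    """Return the port letter that a reference of the given kind uses.

    busy is the set of ports currently occupied. Only vector loads have a
    choice: port B if it is free, otherwise port A.
    """
    if kind == "vector_load":
        return "B" if "B" not in busy else "A"
    return FIXED_PORT[kind]
```

For example, choose_port("vector_load", busy={"B"}) picks port A, while every
kind of store maps to port C regardless of what else is busy.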

So you need to include I/O bandwidth in your estimate, as all Cray machines
(1/X/Y/2/3) perform I/O through processor memory ports.
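
For reference, the arithmetic behind the quoted 4000 MB/s figure counts only
the two loads and one store per cycle. The sketch below reproduces it in
Python. The 6 ns clock period is an assumption, chosen because it is what the
quoted numbers imply (24 bytes/cycle giving 4000 MB/s); the I/O port is left
out because its width is not given here.

```python
WORD_BYTES = 8   # one 64-bit word
CLOCK_NS = 6.0   # assumed clock period, implied by 24 bytes/cycle = 4000 MB/s


def peak_mb_per_s(loads_per_cycle, stores_per_cycle):
    """Peak transfer rate in MB/s (10**6 bytes/s) for the given word counts."""
    bytes_per_cycle = (loads_per_cycle + stores_per_cycle) * WORD_BYTES
    # bytes per nanosecond is GB/s; multiply by 1000 to get MB/s
    return bytes_per_cycle / CLOCK_NS * 1000.0


print(peak_mb_per_s(2, 1))  # 4000.0
```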

An important feature of CRAY-1/X/Y memory ports is that references run
serially. If, say, a vector element load is waiting for a busy section or bank,
its port will wait for the element to proceed. A port won't go on ahead to
process other element references that could perhaps proceed. This serial
operation is important for the implementation of vector chaining operations,
since the data is guaranteed to flow back to the processor from a vector load
or gather in the proper order, possibly with gaps.
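
The serial behaviour can be illustrated with a toy model. This is a minimal
Python sketch, not a description of the real hardware: it models a single
port, ignores sections, and the bank busy time of 4 cycles is an arbitrary
placeholder. The names run_port, banks and busy_until are invented for the
example.

```python
def run_port(banks, busy_until=None, bank_busy_time=4):
    """Issue element references through one port, strictly in order.

    banks          -- bank number of each element reference, in order
    busy_until     -- optional dict: bank -> first cycle that bank is free
    bank_busy_time -- cycles a bank stays busy after a reference (placeholder)

    Returns the cycle at which each element is issued.
    """
    busy_until = dict(busy_until or {})  # copy; leave the caller's dict alone
    cycle = 0
    issued = []
    for bank in banks:
        # The port stalls until this bank is free. It never skips ahead to a
        # later element whose bank happens to be idle.
        cycle = max(cycle, busy_until.get(bank, 0))
        issued.append(cycle)
        busy_until[bank] = cycle + bank_busy_time
        cycle += 1
    return issued


print(run_port([0, 1, 2, 3], {1: 5}))  # [0, 5, 6, 7]
```

Element 1 waits for its busy bank until cycle 5, and elements 2 and 3 wait
behind it even though their banks are idle. The issue cycles are always
increasing, so the data comes back in order, with a gap.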

The CRAY-2 and CRAY-3 memory systems are very different from those of the
CRAY-1 and its descendants. The CRAY-2 considers memory to be divided into four
quadrants and has one quarter-speed port per processor for each quadrant. All
memory references are single-element loads and stores, and the references
constituting a block load or store are able to get out of order.

Peter Klausler
Compilers
Cray Computer (NOT Cray Research)
pmk@craycos.com


