Re: Attainable memory bandwidth

From: Dave Olson (olson@anchor.esd.sgi.com)
Date: Sat Sep 28 1991 - 23:10:25 CDT


In comp.arch you write:

| ==========================================================
| Memory Transfer Rates in MB/s for Standard Fortran Kernels
| ==========================================================
| John D. McCalpin September 24, 1991
| mccalpin@perelandra.cms.udel.edu DELOCN::MCCALPIN
|
| The following table presents the results of a simple test of the
| memory bandwidth of a computer running some simple vector kernels
| coded in Fortran. In each case, only the memory transfer rate in
| Millions of Bytes per second (MB/s) is shown.
|
| Machine              Copy   Scale     Sum   SAXPY
| ------------------------------------------------------
| IBM RS/6000-950     190.5   163.3   184.6   187.5
| IBM RS/6000-530     145.5    88.9   109.1   120.0
| IBM RS/6000-320      58.2    55.7    60.0    60.0
| SGI 4D/240           35.6    22.1    34.9    25.6
| DEC 5000             25.9    25.6    23.1    21.8
| Sun 4/490            25.0    16.7    24.0    19.4
| Sun SS1+             28.6    12.3    25.5    13.3
| SGI 4D/25            23.7    11.6    17.8    13.7
| Sun SS1              24.6     9.3    24.0     9.6

Here are the results for an Indigo and a 4D/35, both running the
released IRIX 4.0. I compiled the program with f77 -O. I'm not sure
I trust the timing, since each test runs so fast (the timing
calibration reports only 16 to 29 clicks). Both machines were
multiuser, but idle. The Indigo should be somewhat slower (33 vs.
36 MHz clock, same memory system, 32 vs. 64 Kbyte caches), but the
Assignment results don't seem to follow this model.
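
The timing worry can be checked against the numbers themselves. Every
Rate in the two listings below equals 1600/n (Assignment, Scaling) or
2400/n (Summing, SAXPYing) for an integer n between 16 and 56, and for
Assignment n is exactly the "Timing calibration" click count printed
just before it. If n counts timer clicks, as that match suggests, one
click of jitter moves a 16-click result by about 6%. The Python sketch
below recovers n for a few of the reported rates; the constants 1600
and 2400 are inferred from the output, not taken from the benchmark
source, and reading n as clicks is an assumption.

```python
# Check whether each reported rate is K/n for a small integer n.
# ASSUMPTIONS: K = 1600 for the two-array kernels (Assignment, Scaling)
# and K = 2400 for the three-array kernels (Summing, SAXPYing), inferred
# from the posted numbers; n is read as a count of timer clicks.
def implied_clicks(rate, k):
    n = round(k / rate)
    # Reject rates that do not match K/n to within 0.01%.
    if abs(k / n - rate) > 1e-4 * rate:
        raise ValueError(f"{rate} is not {k}/n for an integer n")
    return n

def one_click_swing(k, n):
    """Percent drop in the reported rate if the test took one more click.

    Algebraically this is 100 / (n + 1).
    """
    return 100.0 * (k / n - k / (n + 1)) / (k / n)

samples = [  # (label, reported Rate in MB/s, K)
    ("Indigo Assignment, run 1", 55.17242, 1600),
    ("Indigo Assignment, run 2", 100.0000, 1600),
    ("4D/35 Scaling, run 1",     30.76923, 1600),
    ("Indigo SAXPYing, run 2",   42.85715, 2400),
]

for label, rate, k in samples:
    n = implied_clicks(rate, k)
    print(f"{label}: n = {n}, one more click lowers the rate "
          f"{one_click_swing(k, n):.1f}%")
```

The shorter the test, the larger that swing, which is why the fastest
(16-click) runs are the least trustworthy.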

% uname -a
IRIX oceana 4.0 08281003 IP12
% hinv
1 33 MHZ IP12 Processor
FPU: MIPS R2010A/R3010 VLSI Floating Point Chip Revision: 4.0
CPU: MIPS R2000A/R3000 Processor Chip Revision: 3.0
On-board serial ports: 2
Data cache size: 32 Kbytes
Instruction cache size: 32 Kbytes
Main memory size: 24 Mbytes
Integral Ethernet: ec0, version 0
Disk drive: unit 1 on SCSI controller 0
Integral SCSI controller 0: Version WD33C93A
Iris Audio Processor, rev 2
Graphics board: LG1
% repeat 3 /tmp/X
 Timing calibration ; t = 29.00000 clicks
     
 Assignment: Rate = 55.17242 MB/s MFLOPS = 0.0000000E+00
 Scaling: Rate = 29.09091 MB/s MFLOPS = 1.818182
 Summing: Rate = 66.66664 MB/s MFLOPS = 2.777777
 SAXPYing: Rate = 47.05882 MB/s MFLOPS = 3.921569
 Timing calibration ; t = 16.00000 clicks
     
 Assignment: Rate = 100.0000 MB/s MFLOPS = 0.0000000E+00
 Scaling: Rate = 32.65306 MB/s MFLOPS = 2.040816
 Summing: Rate = 74.99998 MB/s MFLOPS = 3.125000
 SAXPYing: Rate = 42.85715 MB/s MFLOPS = 3.571429
 Timing calibration ; t = 16.00000 clicks
     
 Assignment: Rate = 100.0000 MB/s MFLOPS = 0.0000000E+00
 Scaling: Rate = 33.33333 MB/s MFLOPS = 2.083333
 Summing: Rate = 57.14286 MB/s MFLOPS = 2.380953
 SAXPYing: Rate = 47.05882 MB/s MFLOPS = 3.921569
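
The Rate and MFLOPS columns above are linked by how many bytes each
kernel moves per floating-point operation: Rate/MFLOPS is 16 for
Scaling, 24 for Summing and 12 for SAXPYing in every run. The Python
sketch below reproduces those ratios from a per-element model of the
kernels. The loop bodies and the 8-byte word are assumptions (the
Fortran source was not posted), chosen as the usual forms a(i)=b(i),
a(i)=q*b(i), a(i)=b(i)+c(i) and a(i)=a(i)+q*b(i).

```python
# Per-element traffic model for the four kernels. ASSUMPTIONS: the loop
# bodies in the comments and 8-byte (double precision) words; the actual
# Fortran source was not posted.
WORD_BYTES = 8

# kernel -> (words read, words written, flops) per array element
KERNELS = {
    "Assignment": (1, 1, 0),  # a(i) = b(i)
    "Scaling":    (1, 1, 1),  # a(i) = q*b(i)
    "Summing":    (2, 1, 1),  # a(i) = b(i) + c(i)
    "SAXPYing":   (2, 1, 2),  # a(i) = a(i) + q*b(i)
}

def bytes_per_element(kernel):
    reads, writes, _ = KERNELS[kernel]
    return (reads + writes) * WORD_BYTES

def bytes_per_flop(kernel):
    """Bytes moved per flop; infinite for the flop-free Assignment kernel."""
    flops = KERNELS[kernel][2]
    return bytes_per_element(kernel) / flops if flops else float("inf")

# (Rate in MB/s, MFLOPS) from the first Indigo run above.
observed = {
    "Scaling":  (29.09091, 1.818182),
    "Summing":  (66.66664, 2.777777),
    "SAXPYing": (47.05882, 3.921569),
}

for kernel, (rate, mflops) in observed.items():
    print(f"{kernel:9s} model {bytes_per_flop(kernel):4.1f} B/flop, "
          f"observed {rate / mflops:4.1f} B/flop")
```

Under this model Summing and SAXPYing move the same 24 bytes per
element, so their Rate figures measure the same amount of traffic;
SAXPYing just does twice the arithmetic on it.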

Here are the results for a 4D/35:

% uname -a
IRIX mears 4.0 08281003 IP12
% hinv
1 36 MHZ IP12 Processor
FPU: MIPS R2010A/R3010 VLSI Floating Point Chip Revision: 4.0
CPU: MIPS R2000A/R3000 Processor Chip Revision: 3.0
On-board serial ports: 4
Data cache size: 64 Kbytes
Instruction cache size: 64 Kbytes
Main memory size: 16 Mbytes
Integral Ethernet: ec0, version 0
Disk drive: unit 2 on SCSI controller 0
Disk drive: unit 1 on SCSI controller 0
Integral SCSI controller 0: Version WD33C93A
Graphics board: GR1.2 Bit-plane, Z-buffer, Turbo options installed
% repeat 3 /tmp/X
     
 Assignment: Rate = 48.48484 MB/s MFLOPS = 0.0000000E+00
 Scaling: Rate = 30.76923 MB/s MFLOPS = 1.923077
 Summing: Rate = 68.57143 MB/s MFLOPS = 2.857143
 SAXPYing: Rate = 47.05882 MB/s MFLOPS = 3.921569
 Timing calibration ; t = 21.00000 clicks
     
 Assignment: Rate = 76.19046 MB/s MFLOPS = 0.0000000E+00
 Scaling: Rate = 31.37255 MB/s MFLOPS = 1.960784
 Summing: Rate = 68.57145 MB/s MFLOPS = 2.857144
 SAXPYing: Rate = 46.15383 MB/s MFLOPS = 3.846152
 Timing calibration ; t = 16.00000 clicks
     
 Assignment: Rate = 100.0000 MB/s MFLOPS = 0.0000000E+00
 Scaling: Rate = 33.33333 MB/s MFLOPS = 2.083333
 Summing: Rate = 74.99998 MB/s MFLOPS = 3.125000
 SAXPYing: Rate = 46.15385 MB/s MFLOPS = 3.846154
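
Summarizing each machine by its fastest run per kernel (a best-of-three
convention assumed here, not something the benchmark itself does) gives
identical figures for the Indigo and the 4D/35. Given how short the
tests are, this suggests they cannot separate the two machines at this
timer resolution. A short Python sketch, with the Rate values copied
from the two listings above:

```python
# Reported Rate values (MB/s), three runs per kernel, copied from the output.
indigo = {
    "Assignment": [55.17242, 100.0000, 100.0000],
    "Scaling":    [29.09091, 32.65306, 33.33333],
    "Summing":    [66.66664, 74.99998, 57.14286],
    "SAXPYing":   [47.05882, 42.85715, 47.05882],
}
fourd35 = {
    "Assignment": [48.48484, 76.19046, 100.0000],
    "Scaling":    [30.76923, 31.37255, 33.33333],
    "Summing":    [68.57143, 68.57145, 74.99998],
    "SAXPYing":   [47.05882, 46.15383, 46.15385],
}

def best_of(runs):
    """Best-of-N summary: keep the fastest run for each kernel."""
    return {kernel: max(rates) for kernel, rates in runs.items()}

best_indigo, best_4d35 = best_of(indigo), best_of(fourd35)
for kernel in indigo:
    print(f"{kernel:10s} Indigo {best_indigo[kernel]:7.2f}  "
          f"4D/35 {best_4d35[kernel]:7.2f} MB/s")
```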


