stream_d.f timing for FPS 511

From: Dana Jacobsen (jacobsd@ucs.orst.edu)
Date: Fri Oct 04 1991 - 06:14:39 CDT


  This is the timing for stream_d.f, run on an FPS 511. The FPS is configured
with 1 SPARC scalar processor (of a possible 8), 1 vector processor (of a possible 2),
and 0 matrix processors (of a possible 168 i860s). It has 128 MB of RAM and is
running FPX 5.0.1. I changed n to 900000 to get more reliable timings. I do not know
all the tricks of this compiler, so I may have missed some key compiler
flags that would speed this up -- I just told it to make vectorized code.
  Unfortunately it looks like FPS is going to go out of business. Sigh.

========
Script started on Fri Oct 4 04:05:48 1991
mesg: cannot change mode
/dev/ttyp1: Not owner
fps /home/ucs/u1/staff/jacobsd/src/bench/mem 401% ls
stream.f stream_d.f stream_d.o stream_s.f table.print table.ps table.sc typescript
fps /home/ucs/u1/staff/jacobsd/src/bench/mem 402% f77 -Oc vec+ stream_d.f
stream_d.f:
   MAIN stream:
   second:
   realsize:
   dummy:
7.3u 1.5s 0:15 56% 0+1776k 12+90io 0pf+0w
fps /home/ucs/u1/staff/jacobsd/src/bench/mem 403% ./a.out
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 126.00000537932 hundredths of a second
Increase the size of the arrays if this is <30
 and your clock precision is =<1/100 second
---------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:     288.0016     0.0591     0.0500     0.0600
Scaling   :     180.0002     0.0871     0.0800     0.0900
Summing   :     216.0002     0.1122     0.1000     0.1200
SAXPYing  :     196.3642     0.1121     0.1100     0.1200
4.8u 0.2s 0:11 46% 0+18016k 0+0io 0pf+0w
fps /home/ucs/u1/staff/jacobsd/src/bench/mem 404% head -50 stream_d.f
* Program: Stream
* Programmer: John D. McCalpin
* Revision: 2.0, September 30,1991
*
* This program measures memory transfer rates in MB/s for simple
* computational kernels coded in Fortran. These numbers reveal the
* quality of code generation for simple uncacheable kernels as well
* as showing the cost of floating-point operations relative to memory
* accesses.
*
* INSTRUCTIONS:
* 1) Stream requires a cpu timing function called second().
*    A sample is shown below. This is unfortunately rather
*    system dependent. It helps to know the granularity of the
*    timing. The code below assumes that the granularity is
*    1/100 seconds.
* 2) Stream requires a good bit of memory to run.
*    Adjust the Parameter 'N' in the second line of the main
*    program to give a 'timing calibration' of at least 20 clicks.
*    This will provide rate estimates that should be good to
*    about 5% precision.
* 3) Compile the code with full optimization. Many compilers
*    generate unreasonably bad code before the optimizer tightens
*    things up. If the results are unreasonable good, on the
*    other hand, the optimizer might be too smart for me!
* 4) Mail the results to mccalpin@perelandra.cms.udel.edu
*    Be sure to include:
*    a) computer hardware model number and software revision
*    b) the compiler flags
*    c) all of the output from the test case.
*
* Thanks!
*
      PROGRAM stream
C .. Parameters ..
      INTEGER n,ntimes
      PARAMETER (n=900000,ntimes=10)
C ..
C .. Local Scalars ..
      DOUBLE PRECISION t,t0
      INTEGER j,k,nbpw
C ..
C .. Local Arrays ..
      DOUBLE PRECISION a(n),b(n),c(n),maxtime(4),mintime(4),rmstime(4),
     $ times(4,ntimes)
      INTEGER bytes(4)
      CHARACTER label(4)*11
C ..
C .. External Functions ..
      DOUBLE PRECISION second
fps /home/ucs/u1/staff/jacobsd/src/bench/mem 405% exit
fps /home/ucs/u1/staff/jacobsd/src/bench/mem 406%
script done on Fri Oct 4 04:07:00 1991
========
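
As a cross-check on the table in the transcript above, the Rate column can be
reproduced from the Min time column. This is only a sketch under assumptions
that are not visible in the 50 lines of stream_d.f shown: that the program
counts 2 arrays touched per element for Assignment and Scaling and 3 for
Summing and SAXPYing, and that 1 MB means 10**6 bytes. The names below are
made up for this illustration and are not taken from the program.

```python
# Cross-check of the Rate (MB/s) column against the Min time column.
# Assumptions (not confirmed by the source excerpt in the transcript):
#   - arrays touched per element: 2 for Assignment/Scaling, 3 for Summing/SAXPYing
#   - 1 MB = 10**6 bytes
N = 900000           # array length from the PARAMETER statement
BYTES_PER_WORD = 8   # "Assuming 8 bytes per DOUBLEPRECISION word"

# Hypothetical lookup tables, filled in from the transcript.
ARRAYS_TOUCHED = {"Assignment": 2, "Scaling": 2, "Summing": 3, "SAXPYing": 3}
MIN_TIME = {"Assignment": 0.05, "Scaling": 0.08, "Summing": 0.10, "SAXPYing": 0.11}


def rate_mb_per_s(kernel):
    """Bytes moved in one pass over the arrays, divided by the minimum time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * N
    return bytes_moved / MIN_TIME[kernel] / 1.0e6


for kernel in ARRAYS_TOUCHED:
    print("%-11s %9.4f" % (kernel + ":", rate_mb_per_s(kernel)))
```

Under those assumptions the computed rates agree with the reported ones to
better than 0.01 MB/s. Note also that if the clock granularity really is
1/100 second, the minimum times are only 5 to 11 ticks, so a one-tick
difference would move the Assignment rate by roughly 20%.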

--
Dana Jacobsen
jacobsd@cs.orst.edu
Oregon State University     Computer Science


