Re: Attainable Memory Bandwidth

From: David Wright (wright@Stardent.COM)
Date: Wed Oct 02 1991 - 15:46:39 CDT


[I hope you haven't made your last posting on this subject, since
there might well be other contributors; I got back as quickly as I
could.]

Here are the results of your benchmark for two Stardent machines,
the Vistra 800b and the ST2000.

(This is the double-precision benchmark in all cases.)

1) Vistra 800b (i860-based)    Compiler: Portland Group FTN, Rev 1.4 (beta)

  Compiler output:

    f77 -O4 -Mvect -Mbeta -Mx,0,2 -o mcc2 mcc2.f

    PGFTN-I-Beta Release Optimizations Activated
    Vect: streaming data and stripmining loop at line 106. strip size = 252.
    Vect: loop at line 106 replaced by call to __add8s.
    Vect: streaming data and stripmining loop at line 99. strip size = 252.
    Vect: loop at line 99 replaced by call to __add8s.
    Vect: streaming data and stripmining loop at line 92. strip size = 504.
    Vect: streaming data and stripmining loop at line 85. strip size = 504.
    Vect: streaming data and stripmining loop at line 69. strip size = 252.
    # SW pipelined loop w/ 23 cycles and 2 columns w/ cnt 4 gend for line 115
    Linking:

  Runtime output:

    --------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLEPRECISION word
    --------------------------------------
     Timing calibration ; time = 99.99999627470970 hundredths of a second
     Increase the size of the arrays if this is <30
      and your clock precision is =<1/100 second
     ---------------------------------------------------
    Function      Rate (MB/s)   RMS time   Min time   Max time
    Assignment:     160.0002      0.0373     0.0300     0.0400
    Scaling   :     160.0014      0.0354     0.0300     0.0400
    Summing   :     120.0001      0.0652     0.0600     0.0700
    SAXPYing  :     120.0001      0.0652     0.0600     0.0700
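
The rates above look consistent with "bytes moved divided by minimum time". The sketch below checks that arithmetic. It is not taken from the benchmark source: the array length of 300000 is borrowed from the ST2000 loop summary further down, and the arrays-touched counts, the rate formula, and 1 MB = 1e6 bytes are all assumptions.

```python
# Sketch: reproduce the reported Vistra 800b rates from the minimum times.
# ASSUMPTIONS (not stated in the post): N = 300000 elements per array,
# rate = bytes moved / min time, and 1 MB = 1e6 bytes.
N = 300_000
BYTES_PER_WORD = 8  # "Assuming 8 bytes per DOUBLEPRECISION word"

# Assumed number of arrays read plus written by each kernel.
ARRAYS_TOUCHED = {"Assignment": 2, "Scaling": 2, "Summing": 3, "SAXPYing": 3}


def rate_mb_per_s(kernel, min_time_s):
    """Return the transfer rate in MB/s for one kernel and its minimum time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * N
    return bytes_moved / min_time_s / 1e6


# Minimum times (seconds) as printed in the Vistra 800b runtime output.
vistra_min_times = {
    "Assignment": 0.03,
    "Scaling": 0.03,
    "Summing": 0.06,
    "SAXPYing": 0.06,
}

vistra_rates = {k: rate_mb_per_s(k, t) for k, t in vistra_min_times.items()}
for kernel, rate in vistra_rates.items():
    print(f"{kernel:<11s} {rate:8.1f} MB/s")
```

Under these assumptions the sketch gives 160.0 MB/s for Assignment and Scaling and 120.0 MB/s for Summing and SAXPYing, matching the reported figures to the first decimal place. Note that the minimum times are only 3 to 6 ticks of a 1/100-second clock, so a one-tick error would move the computed rate substantially.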

2) Stardent ST2000    Compiler: Stardent f77 compiler (Rev 2.3)
    (second() modified for Stellix) (vector parallel)

  Compiler output:

    f77 -O3 -o mcc2 mcc2_gs.f

    "mcc2_gs.f", line 125: Advisory: argument variable 'RMSTIME' may inhibit
        vectorization.
    "mcc2_gs.f", line 125: Advisory: argument variable 'MINTIME' may inhibit
        vectorization.
    "mcc2_gs.f", line 125: Advisory: argument variable 'MAXTIME' may inhibit
        vectorization.
    "mcc2_gs.f", line 69: Loop (J-loop) fully parallelized
    "mcc2_gs.f", line 69: Loop (J-loop) fully vectorized
    "mcc2_gs.f", line 82: Loop (K-loop) contains a subroutine or function
        call or character expressions
    "mcc2_gs.f", line 82: Loop (K-loop) not vectorized
    "mcc2_gs.f", line 85: Loop (J-loop) fully parallelized
    "mcc2_gs.f", line 85: Loop (J-loop) fully vectorized
    "mcc2_gs.f", line 92: Loop (J-loop) fully parallelized
    "mcc2_gs.f", line 92: Loop (J-loop) fully vectorized
    "mcc2_gs.f", line 99: Loop (J-loop) fully parallelized
    "mcc2_gs.f", line 99: Loop (J-loop) fully vectorized
    "mcc2_gs.f", line 106: Loop (J-loop) fully parallelized
    "mcc2_gs.f", line 106: Loop (J-loop) fully vectorized
    "mcc2_gs.f", line 115: Loop (J-loop) fully vectorized
    "mcc2_gs.f", line 122: Loop (J-loop) performs I/O or contains character
        expressions
    "mcc2_gs.f", line 122: Loop (J-loop) not vectorized

            Vectorization-Parallelization Summary for Routine STREAM

     Line  Index  Label  Start    Stop  Step  Vec.  Par.  Reason
    ---------------------------------------------------------------------------
       69  J      10         1  300000     1  FULL  FULL
       82  K      60         1      10     1  None  None  Library or function call
       85  J      20         1  300000     1  FULL  FULL
       92  J      30         1  300000     1  FULL  FULL
       99  J      40         1  300000     1  FULL  FULL
      106  J      50         1  300000     1  FULL  FULL
      114  K      80         1      10     1  None  None  Outer loop of nest
      115  J      70         1       4     1  FULL  None
      122  J      90         1       4     1  None  None  I/O or char expressions

    "mcc2_gs.f", line 181: Warning: label '10' defined but not referenced.
    "mcc2_gs.f", line 181: Loop (J-loop) fully vectorized
    "mcc2_gs.f", line 185: Loop (J-loop) or a contained loop has multiple exits
    "mcc2_gs.f", line 185: Loop (J-loop) not vectorized

            Vectorization-Parallelization Summary for Routine REALSIZE

     Line  Index  Label  Start    Stop  Step  Vec.  Par.  Reason
    ---------------------------------------------------------------------------
      181  J      20         1      30     1  FULL  None
      185  J      30         1      30     1  None  None  Inner loop: many exits

  Runtime output:

    --------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLEPRECISION word
    --------------------------------------
     Timing calibration ; time = 72.9492187500000 hundredths of a second
     Increase the size of the arrays if this is <30
      and your clock precision is =<1/100 second
     ---------------------------------------------------
    Function      Rate (MB/s)   RMS time   Min time   Max time
    Assignment:     163.8400      0.0357     0.0293     0.0498
    Scaling   :     163.8400      0.0353     0.0293     0.0400
    Summing   :     179.8244      0.0463     0.0400     0.0508
    SAXPYing  :     179.8244      0.0502     0.0400     0.0605
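
The ST2000 figures are not round numbers, which suggests the timer does not tick in hundredths of a second. The sketch below tests one guess: a clock granularity of 1/1024 second. That tick size, the tick counts (30, 41, 747), and the arrays-touched counts are inferred from the printed numbers rather than stated anywhere in the post, so treat them as assumptions.

```python
from fractions import Fraction

# ASSUMPTIONS (inferred, not stated in the post): the modified second()
# ticks every 1/1024 s, rate = bytes moved / min time, 1 MB = 1e6 bytes.
TICK = Fraction(1, 1024)
N = 300_000          # loop stop value from the vectorization summary
BYTES_PER_WORD = 8   # "Assuming 8 bytes per DOUBLEPRECISION word"


def rate_from_ticks(arrays_touched, ticks):
    """Rate in MB/s when a kernel's minimum time is a whole number of ticks."""
    bytes_moved = arrays_touched * BYTES_PER_WORD * N
    seconds = ticks * TICK
    return float(bytes_moved / seconds) / 1e6


# 30 ticks = 0.0293 s (Assignment, Scaling); 41 ticks = 0.0400 s (Summing, SAXPYing).
copy_rate = rate_from_ticks(arrays_touched=2, ticks=30)
sum_rate = rate_from_ticks(arrays_touched=3, ticks=41)

# 747 ticks expressed in hundredths of a second, for the calibration line.
calibration_hundredths = float(747 * TICK * 100)

print(f"Assignment/Scaling: {copy_rate:.4f} MB/s")       # 163.8400
print(f"Summing/SAXPYing:   {sum_rate:.4f} MB/s")        # 179.8244
print(f"Calibration:        {calibration_hundredths} hundredths of a second")  # 72.94921875
```

All three values agree with the printed output to the digits shown, so a 1/1024-second tick is a plausible reading. It also means each minimum time carries an uncertainty of roughly one part in 30 to 40.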


