Re: IBM RS/6000 or HP Apollo 9000: which to buy?

From: Renu Raman (Renu.Raman@Eng.Sun.COM)
Date: Tue Oct 26 1993 - 19:52:25 CDT


Update of SS10/41
 
>                  Bytes     Bandwidth (MB/s)
>Machine           /word     Copy    Scale      Sum    Triad
>---------------   -----  -------  -------  -------  -------
>Sun SS10/41           4     34.3     38.4     36.9     37.9
>Sun SS10/30           8     42.1     46.2     46.2     46.2
>Sun SS10/30           4     33.9     41.7     39.0     34.1

Sun SS10/41            8     48.0     48.0     54.0     54.0
Sun SS10/512           8     48.0     48.0     48.0     48.0

Obviously, since the memory system does not scale, it's about the same....
These numbers were obtained using the Apogee compilers...

renu raman

From hahn@neurocog.lrdc.pitt.edu Ukn Oct 27 12:03:18 1993
Received: from neurocog.lrdc.pitt.edu by perelandra.cms.udel.edu via SMTP (911016.SGI/911001.SGI)
        for mccalpin id AA05504; Wed, 27 Oct 93 12:03:16 -0400
Return-Path: <hahn@neurocog.lrdc.pitt.edu>
Message-Id: <9310271603.AA05504@perelandra.cms.udel.edu>
Received: by neurocog.lrdc.pitt.edu
        (1.37.109.4/16.2) id AA29337; Wed, 27 Oct 93 12:01:59 -0400
From: Mark Hahn <hahn@neurocog.lrdc.pitt.edu>
Subject: stream in C
To: mccalpin
Date: Wed, 27 Oct 1993 12:01:58 -0500 (EDT)
Cc: hahn@neurocog.lrdc.pitt.edu (Mark Hahn, lrdc 512, 6247063, 3633618)
X-Mailer: ELM [version 2.4 PL21]
Content-Type: text
Content-Length: 4344
Status: RO
X-Status:

I had no luck finding the correct time/clock function for the
little-used Fortran on this machine, an HP 735, so I translated
your stream.f into reasonably portable C. I'd appreciate it if
you would look it over and verify that it's doing something
comparable to the Fortran version. My intent was to post the
C version on comp.benchmarks. BTW, with this code, our 735 gets
about 73 MB/s on the double-precision SAXPY test.

/*
* Program: Stream
* Programmer: John D. McCalpin
* Revision: 2.0, September 30, 1991
*
* This program measures memory transfer rates in MB/s for simple
* computational kernels coded in Fortran. These numbers reveal the
* quality of code generation for simple uncacheable kernels as well
* as showing the cost of floating-point operations relative to memory
* accesses.
*
* INSTRUCTIONS:
* 1) (fortran-specific, omitted.)
* 2) Stream requires a good bit of memory to run.
* Adjust 'N' (the #define N near the top of this file) to
* give a 'timing calibration' of at least 20 clicks.
* This will provide rate estimates that should be good to
* about 5% precision.
* 3) Compile the code with full optimization. Many compilers
* generate unreasonably bad code before the optimizer tightens
* things up. If the results are unreasonably good, on the
* other hand, the optimizer might be too smart for me!
* 4) Mail the results to mccalpin@perelandra.cms.udel.edu
* Be sure to include:
* a) computer hardware model number and software revision
* b) the compiler flags
* c) all of the output from the test case.
* Thanks!
*
* this version was ported from fortran to c by mark hahn, hahn+@pitt.edu.
*/

#define N 1000000
#define NTIMES 10

#ifdef __hpux
#define _HPUX_SOURCE 1
#else
#define _INCLUDE_POSIX_SOURCE 1
#endif
#include <float.h>    /* FLT_MAX */
#include <limits.h>
#include <sys/time.h>
#include <math.h>
#include <stdio.h>

#ifndef MIN
#define MIN(x,y) ((x)<(y)?(x):(y))
#endif
#ifndef MAX
#define MAX(x,y) ((x)>(y)?(x):(y))
#endif

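struct timeval tvStart;

/* Record the wall-clock starting time (the timezone argument is
 * obsolete, so pass NULL). */
void utimeStart(void) {
    gettimeofday(&tvStart, NULL);
}

/* Return wall-clock microseconds since the last utimeStart().
 * The signed tv_usec difference already borrows from tv_sec, so
 * no extra +1e6 correction is needed when tv_usec wraps around. */
float utime(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return 1e6 * (tv.tv_sec - tvStart.tv_sec)
           + (tv.tv_usec - tvStart.tv_usec);
}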

typedef double real;
static real a[N],b[N],c[N];

int main() {
    int j,k;
    float times[4][NTIMES];
    static float rmstime[4] = {0};
    static float mintime[4] = {FLT_MAX,FLT_MAX,FLT_MAX,FLT_MAX};
    static float maxtime[4] = {0};
    static char *label[4] = {"Assignment:",
                             "Scaling :",
                             "Summing :",
                             "SAXPYing :"};
    static float bytes[4] = { 2 * sizeof(real) * N,
                              2 * sizeof(real) * N,
                              3 * sizeof(real) * N,
                              3 * sizeof(real) * N};

    /* --- SETUP --- determine precision and check timing --- */
    utimeStart();
    for (j=0; j<N; j++) {
        a[j] = 1.0;
        b[j] = 2.0;
        c[j] = 0.0;
    }
    printf("Timing calibration ; time = %f usec.\n",utime());
    printf("Increase the size of the arrays if this is < 300000\n"
           "and your clock precision is =< 1/100 second.\n");
    printf("---------------------------------------------------\n");
    
    /* --- MAIN LOOP --- repeat test cases NTIMES times --- */
    for (k=0; k<NTIMES; k++) {
        utimeStart();
        for (j=0; j<N; j++)
            c[j] = a[j];
        times[0][k] = utime();
        
        utimeStart();
        for (j=0; j<N; j++)
            c[j] = 3.0e0*a[j];
        times[1][k] = utime();
        
        utimeStart();
        for (j=0; j<N; j++)
            c[j] = a[j]+b[j];
        times[2][k] = utime();
        
        utimeStart();
        for (j=0; j<N; j++)
            c[j] = a[j]+3.0e0*b[j];
        times[3][k] = utime();
    }
    
    /* --- SUMMARY --- */
    for (k=0; k<NTIMES; k++) {
        for (j=0; j<4; j++) {
            rmstime[j] = rmstime[j] + (times[j][k] * times[j][k]);
            mintime[j] = MIN(mintime[j], times[j][k]);
            maxtime[j] = MAX(maxtime[j], times[j][k]);
        }
    }
    
    printf("Function Rate (MB/s) RMS time Min time Max time\n");
    for (j=0; j<4; j++) {
        rmstime[j] = sqrt(rmstime[j]/(float)NTIMES);

        printf("%s%11.3f %11.3f %11.3f %11.3f\n",
               label[j],
               bytes[j]/mintime[j],
               rmstime[j],
               mintime[j],
               maxtime[j]);
    }
    return 0;
}

thanks, mark hahn.

-- 
this space intentionally left non-blank.	hahn@neurocog.lrdc.pitt.edu

From hahn@neurocog.lrdc.pitt.edu Ukn Oct 27 12:35:23 1993
Received: from neurocog.lrdc.pitt.edu by perelandra.cms.udel.edu via SMTP (911016.SGI/911001.SGI)
        for mccalpin id AA05615; Wed, 27 Oct 93 12:35:22 -0400
Return-Path: <hahn@neurocog.lrdc.pitt.edu>
Message-Id: <9310271635.AA05615@perelandra.cms.udel.edu>
Received: by neurocog.lrdc.pitt.edu
        (1.37.109.4/16.2) id AA29898; Wed, 27 Oct 93 12:34:05 -0400
From: Mark Hahn <hahn@neurocog.lrdc.pitt.edu>
Subject: Re: stream in C
To: mccalpin (John D. McCalpin)
Date: Wed, 27 Oct 1993 12:34:05 -0500 (EDT)
In-Reply-To: <9310271627.AA05590@perelandra.cms.udel.edu> from "John D. McCalpin" at Oct 27, 93 12:27:59 pm
X-Mailer: ELM [version 2.4 PL21]
Content-Type: text
Content-Length: 635
Status: RO
X-Status:

> There is one significant difference -- your code measures the elapsed
> time, while the Fortran version measures the cpu time. If your machine
> is almost idle, then the *minimum* elapsed time is not a bad measure
> of the cpu time, otherwise it is nearly impossible to compare.
>
> I thought that HP had an etime(dummy) function available from Fortran.
> Did you look for that one?
>
> Alternatively, you can use what most folks use, provided that you can
> figure out how to link C and Fortran....

good comments, I'll try all three.

thanks, mark hahn.

-- 
this space intentionally left non-blank.	hahn@neurocog.lrdc.pitt.edu

From Renu.Raman@Eng.Sun.COM Ukn Oct 27 14:34:43 1993
Received: from Sun.COM by perelandra.cms.udel.edu via SMTP (911016.SGI/911001.SGI)
        for mccalpin id AA06233; Wed, 27 Oct 93 14:34:26 -0400
Return-Path: <Renu.Raman@Eng.Sun.COM>
Received: from Eng.Sun.COM (zigzag.Eng.Sun.COM) by Sun.COM (4.1/SMI-4.1)
        id AA21857; Wed, 27 Oct 93 11:32:15 PDT
Received: from shukra.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1)
        id AA21259; Wed, 27 Oct 93 11:31:16 PDT
Received: by shukra.Eng.Sun.COM (4.1/SMI-4.1)
        id AA06850; Wed, 27 Oct 93 11:31:33 PDT
Date: Wed, 27 Oct 93 11:31:33 PDT
From: Renu.Raman@Eng.Sun.COM (Renu Raman)
Message-Id: <9310271831.AA06850@shukra.Eng.Sun.COM>
To: mccalpin
Subject: Re: IBM RS/6000 or HP Apollo 9000: which to buy?
Status: RO
X-Status:

>From mccalpin@perelandra.cms.udel.edu Wed Oct 27 04:50:39 1993
48.0
>
>Could you please send the raw output of the tests?
>I am trying to keep very complete records for all the entries...
>
>Thanks!

Here are the details

SparcClassic (uSPARC, 50 MHz)
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 101.6666617244482 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      57.6001     0.0937     0.0833     0.1000
Scaling :        48.0000     0.1122     0.1000     0.1333
Summing :        48.0000     0.1652     0.1500     0.1833
SAXPYing :       43.2000     0.1852     0.1667     0.2000
*******************************

SS10/41 Without E$ and 128MB of memory

--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 64.99999836087227 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      48.0000     0.1087     0.1000     0.1167
Scaling :        48.0000     0.1070     0.1000     0.1167
Summing :        54.0001     0.1402     0.1333     0.1500
SAXPYing :       54.0001     0.1385     0.1333     0.1500
***************************************************

SS10/512 - dual Vikings @ 50 MHz with E$ (1MB)
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 24.00000095367432 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      48.0000     0.1031     0.1000     0.1100
Scaling :        48.0000     0.1051     0.1000     0.1100
Summing :        48.0000     0.1561     0.1500     0.1600
SAXPYing :       48.0000     0.1561     0.1500     0.1600

renu raman

From bt@irfu.se Ukn Oct 27 18:01:29 1993
Received: from irfu.irfu.se by perelandra.cms.udel.edu via SMTP (911016.SGI/911001.SGI)
        for mccalpin id AA06568; Wed, 27 Oct 93 18:01:22 -0400
Return-Path: <bt@irfu.se>
Received: from abba.irfu.se by irfu.irfu.se with SMTP
        (16.6/15.6) id AA04121; Wed, 27 Oct 93 22:59:25 +0100
Received: by abba.irfu.se
        (1.37.109.4/15.6) id AA09035; Wed, 27 Oct 93 22:58:31 +0100
From: Bo Thide' <bt@irfu.se>
Message-Id: <9310272158.AA09035@abba.irfu.se>
Subject: Streams rsults for HP9000/720 and 735
To: mccalpin
Date: Wed, 27 Oct 93 22:58:31 MET
Mailer: Elm [revision: 70.85]
Status: RO
X-Status:

Hi John,

Just wanted to send you the results I obtained for stream on some of our HP's:

HP9000/720:
--------------------------------------
Single precision appears to have 7 digits of accuracy
Assuming 4 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 22.99999 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      45.7143      .0762      .0700      .0800
Scaling :        53.3334      .0662      .0600      .0700
Summing :        48.0000      .1090      .1000      .1100
SAXPYing :       53.3334      .0921      .0900      .1000

--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 34.00000147521495 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      53.3334      .1011      .0900      .1100
Scaling :        53.3334      .0991      .0900      .1100
Summing :        55.3847      .1361      .1300      .1400
SAXPYing :       55.3847      .1381      .1300      .1400

HP9000/735:
--------------------------------------
Single precision appears to have 7 digits of accuracy
Assuming 4 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 14.0 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      80.0000      .0493      .0400      .0600
Scaling :       106.6668      .0486      .0300      .0600
Summing :        80.0001      .0663      .0600      .0800
SAXPYing :       95.9999      .0687      .0500      .0800

--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 22.00000043958425 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Assignment:      68.5715      .0752      .0700      .0800
Scaling :        80.0001      .0743      .0600      .0800
Summing :        80.0001      .0993      .0900      .1100
SAXPYing :       90.0001      .0972      .0800      .1000

I leave it to you to calculate the MFLOPS from these values. BTW, I used HP-UX 9.01 f77 with less than full optimization. With maximum optimization, dead-code elimination gave ridiculous results (infinite speed).

Bo



This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:03 CDT