From mccalpin@enso.ocean.fsu.edu Tue Jan 25 11:07:22 1994
Received: from mailer.fsu.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA01345; Tue, 25 Jan 94 11:07:20 -0500
Return-Path: <mccalpin@enso.ocean.fsu.edu>
Received: from enso.ocean.fsu.edu by mailer.fsu.edu with SMTP id AA29308
  (5.65c/IDA-1.4.4 for <@mailer.fsu.edu:mccalpin@perelandra.cms.udel.edu>); Tue, 25 Jan 1994 11:10:06 -0500
Received: by enso.ocean.fsu.edu (931110.SGI/931108.SGI.AUTO.ANONFTP)
	for @mailer.fsu.edu:mccalpin@perelandra.cms.udel.edu id AA03404; Tue, 25 Jan 94 11:09:52 -0500
Date: Tue, 25 Jan 94 11:09:52 -0500
From: mccalpin@enso.ocean.fsu.edu (John D. McCalpin)
Subject: stream_d on SGI Onyx
Message-Id: <9401251609.AA03404@enso.ocean.fsu.edu>
Apparently-To: mccalpin
Status: RO

--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =    199.9999962747097     hundredths of a second
 Increase the size of the arrays if this is <30
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Summing   :   56.4707      0.8550      0.8500      0.8600
SAXPYing  :   61.5385      0.7880      0.7800      0.7900

Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   58.1819      0.5600      0.5500      0.5700
Scaling   :   57.1429      0.5660      0.5600      0.5800
Summing   :   57.1428      0.8520      0.8400      0.8600
SAXPYing  :   62.3377      0.7820      0.7700      0.7900

From ben@REX.RE.uokhsc.edu Wed Jan 19 23:18:50 1994
Received: from bach.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA09623; Wed, 19 Jan 94 23:18:48 -0500
Return-Path: <ben@REX.RE.uokhsc.edu>
Received: from who.net.uokhsc.edu (WHO.NET.UOKHSC.EDU [157.142.8.10]) by bach.udel.edu (8.6.4/8.6.4) with SMTP id XAA11442 for <mccalpin@bach.udel.edu>; Wed, 19 Jan 1994 23:21:05 -0500
Received: from REX.RE.UOKHSC.EDU by who.net.uokhsc.edu (5.64/A/UX-3.00)
	id AA11272; Wed, 19 Jan 94 23:23:01 PST
Received: by rex.re.uokhsc.edu (5.4R2.01/1.34)
	id AA05227; Wed, 19 Jan 1994 22:20:38 -0600
From: ben@REX.RE.uokhsc.edu (Benjamin Z. Goldsteen)
Message-Id: <9401200420.AA05227@rex.re.uokhsc.edu>
Subject: STREAM results for AXP 4000/710
To: mccalpin@bach.udel.edu
Date: Wed, 19 Jan 1994 22:20:38 -0600 (CST)
Reply-To: benjamin-goldsteen@uokhsc.edu
X-Mailer: ELM [version 2.4 PL21]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 2446      
Status: RO
X-Status: 

Hello,
     I found AXPOSF.PA.DEC.COM nearly completely unloaded (1 other
non-idle user and UNIX load-average was 0.0) today so I thought I might
just run the STREAM benchmarks on it because the only numbers I had
seen were for the DEC AXP 3000/500.  The AXPOSF is a DEC AXP 4000/710
with a 21064-190/40 (internal/external MHz) and a 4MB secondary cache. 
The peak memory bus speed is rated at 640 MB/second (128-bits * 16
bytes/128-bits * 40 M clock-ticks/second).  AXPOSF runs DEC OSF/1 V1.3. 
I used the compiler (V1.3) options:

f77 -O4 -align dcommon stream_?.f second.o -o stream_?
(things like "-assume noaccuracy_sensitive" didn't make much
difference.  I tried KAP "just to see" and I think it completely
deleted the main loop)

I was rather surprised at the results since they are lower than the DEC
AXP 3000/500 results.  When I ran the test I raised the "N" parameter
to 4,000,000, and I think this is the cause, since when I left it at
the default the rates tended to wander (around 100 MB/sec).  These
numbers seem stable and consistent:

--------------------------------------
 Single precision appears to have  7 digits of accuracy
 Assuming 4 bytes per default REAL word
--------------------------------------
 Timing calibration ; time =    146.3024     hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   80.3603      0.3990      0.3982      0.3992
Scaling   :   80.3601      0.3991      0.3982      0.3992
Summing   :   79.1953      0.6080      0.6061      0.6100
SAXPYing  :   78.8146      0.6107      0.6090      0.6149

--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =    360.6319956481457     hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   84.6114      0.7576      0.7564      0.7593
Scaling   :   82.2759      0.7801      0.7779      0.7896
Summing   :   85.0871      1.1300      1.1283      1.1322
SAXPYing  :   83.8539      1.1469      1.1448      1.1536
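
The 640 MB/second peak figure quoted in this message can be set against
the measured rates with a short calculation.  This is only a sketch: the
function and variable names are made up for illustration, and 1 MB is
assumed to mean 10**6 bytes.

```python
# Bus parameters as quoted in the message: a 128-bit (16-byte) bus at 40 MHz.
BUS_WIDTH_BITS = 128
BUS_CLOCK_MHZ = 40

def peak_mb_per_s(width_bits, clock_mhz):
    """Peak rate in MB/s: bytes per clock tick times millions of ticks per second."""
    return (width_bits // 8) * clock_mhz

peak = peak_mb_per_s(BUS_WIDTH_BITS, BUS_CLOCK_MHZ)  # 16 * 40 = 640
measured = 84.6114  # double-precision Assignment rate from the table above

print(f"peak {peak} MB/s, measured {measured} MB/s, "
      f"fraction of peak {measured / peak:.1%}")
```

On these numbers the double-precision Assignment loop reaches roughly
an eighth of the rated bus speed.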

-- 
Benjamin Z. Goldsteen

From mccalpin Fri Feb  4 10:07:57 1994
Received: by perelandra.cms.udel.edu (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA00902; Fri, 4 Feb 94 10:07:57 -0500
Return-Path: <mccalpin>
Date: Fri, 4 Feb 94 10:07:57 -0500
From: mccalpin (John D. McCalpin)
Message-Id: <9402041507.AA00902@perelandra.cms.udel.edu>
To: mccalpin
Subject: Stream data for Sun SC2000
Status: RO

stream_d data from strauss.udel.edu, an 8-cpu SparcCenter 2000
using the Apogee apf77 compiler and KAP preprocessor.

The timing is done with the 'etime' intrinsic, whose accuracy
in the case of multiple processors is quite uncertain....

Parameters:	
	N=2,000,000
	ntimes=10

Shell:
	unlimit stacksize
	apf77 -Xkap=mp stream_d.f -o stream_d
	foreach N (1 2 3 4 5 6 7 8)
	    setenv PARALLEL $N
	    echo PARALLEL = $PARALLEL 
	    stream_d 
	end

Results:

PARALLEL = 1
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   79.0126      0.8220      0.8100      0.8300
Scaling   :   76.1905      0.8450      0.8400      0.8500
Summing   :   75.5908      1.2800      1.2700      1.3000
SAXPYing  :   71.1112      1.3550      1.3500      1.3700
PARALLEL = 2
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  152.3809      0.4344      0.4200      0.4900
Scaling   :  148.8378      0.4412      0.4300      0.4800
Summing   :  147.6928      0.6591      0.6500      0.6800
SAXPYing  :  139.1307      0.6971      0.6900      0.7100
PARALLEL = 3
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  220.6904      0.3001      0.2900      0.3100
Scaling   :  213.3339      0.3091      0.3000      0.3200
Summing   :  213.3334      0.4601      0.4500      0.4800
SAXPYing  :  204.2556      0.4874      0.4700      0.5400
PARALLEL = 4
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  278.2614      0.2441      0.2300      0.2600
Scaling   :  278.2614      0.2504      0.2300      0.2700
Summing   :  274.2854      0.3652      0.3500      0.3800
SAXPYing  :  259.4589      0.3841      0.3700      0.4000
PARALLEL = 5
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  320.0018      0.2061      0.2000      0.2200
Scaling   :  336.8412      0.2128      0.1900      0.2500
Summing   :  320.0008      0.3122      0.3000      0.3300
SAXPYing  :  309.6780      0.3191      0.3100      0.3400
PARALLEL = 6
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  376.4662      0.1852      0.1700      0.2000
Scaling   :  376.4704      0.1846      0.1700      0.2100
Summing   :  369.2332      0.2691      0.2600      0.2800
SAXPYing  :  355.5575      0.2976      0.2700      0.4100
PARALLEL = 7
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  426.6678      0.1635      0.1500      0.1900
Scaling   :  426.6678      0.1666      0.1500      0.1900
Summing   :  417.3921      0.2411      0.2300      0.2500
SAXPYing  :  400.0004      0.2451      0.2400      0.2600
PARALLEL = 8
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  492.3036      0.1643      0.1300      0.2800
Scaling   :  492.3073      0.2038      0.1300      0.4300
Summing   :  505.2668      0.2184      0.1900      0.2400
SAXPYing  :  479.9982      0.2247      0.2000      0.2700
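
One way to read the tables above is as speedup and parallel efficiency
relative to the PARALLEL = 1 run.  The sketch below does this for the
Assignment rates only; the helper names are illustrative, and the
caveat about 'etime' accuracy on multiple processors applies to every
figure it prints.

```python
# Assignment rates (MB/s) copied from the PARALLEL = 1..8 tables above.
ASSIGNMENT_MB_S = {
    1: 79.0126, 2: 152.3809, 3: 220.6904, 4: 278.2614,
    5: 320.0018, 6: 376.4662, 7: 426.6678, 8: 492.3036,
}

def speedup(rates, p):
    """Rate with p processors divided by the single-processor rate."""
    return rates[p] / rates[1]

def efficiency(rates, p):
    """Speedup divided by the processor count (1.0 would be ideal scaling)."""
    return speedup(rates, p) / p

for p in sorted(ASSIGNMENT_MB_S):
    print(f"PARALLEL={p}: speedup {speedup(ASSIGNMENT_MB_S, p):.2f}, "
          f"efficiency {efficiency(ASSIGNMENT_MB_S, p):.0%}")
```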
--
John D. McCalpin                        mccalpin@perelandra.cms.udel.edu
Assistant Professor                     mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.      John.McCalpin@mvs.udel.edu

From mccalpin@perelandra.cms.udel.edu Sat Feb 26 10:55:03 1994
Received: from bach.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA21276; Tue, 22 Feb 94 00:55:25 -0500
Received: from vnet.IBM.COM (vnet.ibm.com [192.239.48.4]) by bach.udel.edu (8.6.4/8.6.4) with SMTP id AAA08746 for <mccalpin@bach.udel.edu>; Tue, 22 Feb 1994 00:59:32 -0500
From: NIK@KULVM.VNET.IBM.COM
Message-Id: <199402220559.AAA08746@bach.udel.edu>
Received: from KULVM by vnet.IBM.COM (IBM VM SMTP V2R2) with BSMTP id 2551;
   Tue, 22 Feb 94 00:55:55 EST
Date: Tue, 22 Feb 94 13:52:33 MAL
To: mccalpin@bach.udel.edu
Subject: Bandwidth
Status: RO

Dear John,
I have your report on memory bandwidth dated May 4, 1993. Since
then IBM and its competitors have added new machines to their product
lines. I wonder if you have a revised version of the report which
includes bandwidth computations for the new machines from IBM, HP and SUN
especially. It would help my customers in deriving their RFP specifications.
Many thanks in advance,

Best regards,
Nik Habibah

From jbs@watson.ibm.com Sun Feb 27 06:08:26 EST 1994
Article: 13502 of comp.benchmarks
Path: news.udel.edu!udel!MathWorks.Com!news.kei.com!uuspew.uu.net!uunet!watson.ibm.com
From: jbs@watson.ibm.com
Message-ID: <19940225.150338.930@almaden.ibm.com>
Date: Fri, 25 Feb 94 17:53:58 EST
Newsgroups: comp.benchmarks,comp.unix.aix
Subject: Streams (and other) Benchmarks for the 590
Lines: 25
Xref: news.udel.edu comp.benchmarks:13502 comp.unix.aix:37452
Status: RO

         Hugh LaMaster asked:
>I understand the 590 is shipping now.  Has anyone run the
>"streams" benchmark on the 590.  Could the results be
>posted please?

         I just ran this.  I used version 2.0 of streamd.f dated
9/30/91.  I used n=3000000 and ntimes=10.  I used version
03.01.0000.0000 of xlf with compiler options "-c -O3 -qstrict -qsource
-qlist -qarch=pwrx".  The output was:

--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =  69.0000000000000000  hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  600.0000       .0831       .0800       .0900
Scaling   :  533.3333       .1002       .0900       .1100
Summing   :  654.5455       .1161       .1100       .1200
SAXPYing  :  654.5455       .1152       .1100       .1300
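
The rates in this table can be reproduced from n and the Min time
column.  The sketch below assumes that each kernel is charged for the
arrays it reads plus the array it writes (two for Assignment and
Scaling, three for Summing and SAXPYing) and that 1 MB = 10**6 bytes.
Those counts are an assumption that happens to match the figures above;
they are not taken from the stream_d source, and the names are
illustrative.

```python
BYTES_PER_WORD = 8   # "Assuming 8 bytes per DOUBLEPRECISION word"
N = 3_000_000        # array length quoted in the post

# Arrays touched per loop iteration, reads plus writes (assumed, see lead-in).
ARRAYS_TOUCHED = {"Assignment": 2, "Scaling": 2, "Summing": 3, "SAXPYing": 3}

# Min time column (seconds) from the table above.
MIN_TIME = {"Assignment": 0.08, "Scaling": 0.09, "Summing": 0.11, "SAXPYing": 0.11}

def rate_mb_per_s(kernel, n=N, word=BYTES_PER_WORD):
    """Bytes moved in one pass over the arrays divided by the best time, in MB/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * word * n
    return bytes_moved / MIN_TIME[kernel] / 1e6

for kernel in ARRAYS_TOUCHED:
    print(f"{kernel:<10}: {rate_mb_per_s(kernel):9.4f} MB/s")
```

Because the reported times have a resolution of 0.01 second, each rate
here rests on a Min time known to only one or two significant digits.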

                          James B. Shearer
Disclaimer: I work for IBM, however this is not an official result.


From mccalpin@perelandra.cms.udel.edu Fri Mar  4 12:13:31 1994
Received: from timbuk.cray.com by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA03522; Fri, 4 Mar 94 12:11:07 -0500
Received: from magnet (magnet.cray.com) by cray.com (Bob mailer 1.2)
	id AA04432; Fri, 4 Mar 94 11:15:47 CST
Received: by magnet (4.1/CRI-5.13)
	id AA02121; Fri, 4 Mar 94 11:15:44 CST
From: cmg@ferrari.cray.com (Charles Grassl)
Message-Id: <9403041715.AA02121@magnet>
Subject: STREAM on T3D
To: mccalpin (John D. McCalpin)
Date: Fri, 4 Mar 94 11:15:40 CST
In-Reply-To: <Pine.3.07.9310291130.C14700-c100000@perelandra.cms.udel.edu>; from "John D. McCalpin" at Oct 29, 93 11:23 am
X-Mailer: ELM [version 2.3 PL11]
Status: RO

Dear John;

Would you please update the STREAM table on the anonymous ftp archive
with the data below for the CRAY T3D.  

                     Bytes          Bandwidth (MB/s)                   Clock
Machine             /word      Copy     Scale       Sum     Triad       MHz
---------------     -----  --------  --------  --------  --------    ------
CRAY T3D 256 PEs        8   66106.2   51583.5   33324.7   89105.0       150
CRAY T3D 128 PEs        8   32973.0   25793.4   16661.7   44665.0       150
CRAY T3D  64 PEs        8   16520.9   12896.0    8331.3   22338.4       150
CRAY T3D  32 PEs        8    8264.9    6448.3    4165.7   11167.5       150

The actual output from the T3D is listed below.

Also, I noticed in the sc spreadsheet that the C90 clock rate is listed
as 250 MHz.  This is incorrect.  The C90 clock rate is 240 MHz.

Thank you and Regards,
Charles Grassl
Cray Research, Inc.
Eagan, Minnesota


OUTPUT FROM CRAY T3D FOLLOWS:
-----------------------------

 *** STREAM benchmark ***
 Number of PEs:              32
 Number of iterations:        5
 Size of Arrays:         400000

 Test #1 Failed = picalc=piexact
 Apparently Single=Double Precision
 Proceeding to Test #2
  
--------------------------------------
 Single precision appears to have 16 digits of accuracy
 Assuming 8 bytes per default REAL word
--------------------------------------
 Timing calibration ; time = 6.1997264331694169 hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function  Rate (MB/s)    RMS time    Min time    Max time   Rate / PE
Assignment: 8264.9256      0.0248      0.0248      0.0250    258.2789
Scaling   : 6448.3460      0.0319      0.0318      0.0320    201.5108
Summing   : 4165.7771      0.0738      0.0737      0.0740    130.1805
SAXPYing  :11167.5113      0.0276      0.0275      0.0278    348.9847

 *** STREAM benchmark ***
 Number of PEs:              64
 Number of iterations:        5
 Size of Arrays:         400000

 Test #1 Failed = picalc=piexact
 Apparently Single=Double Precision
 Proceeding to Test #2
  
--------------------------------------
 Single precision appears to have 16 digits of accuracy
 Assuming 8 bytes per default REAL word
--------------------------------------
 Timing calibration ; time = 6.1920564294268843 hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function  Rate (MB/s)    RMS time    Min time    Max time   Rate / PE
Assignment:16520.9172      0.0249      0.0248      0.0249    258.1393
Scaling   :12896.0747      0.0318      0.0318      0.0319    201.5012
Summing   : 8331.3155      0.0739      0.0737      0.0740    130.1768
SAXPYing  :22338.4170      0.0277      0.0275      0.0278    349.0378

 *** STREAM benchmark ***
 Number of PEs:             128
 Number of iterations:        5
 Size of Arrays:         400000

 Test #1 Failed = picalc=piexact
 Apparently Single=Double Precision
 Proceeding to Test #2
  
--------------------------------------
 Single precision appears to have 16 digits of accuracy
 Assuming 8 bytes per default REAL word
--------------------------------------
 Timing calibration ; time = 6.1994464331291965 hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function  Rate (MB/s)    RMS time    Min time    Max time   Rate / PE
Assignment:32972.9965      0.0249      0.0248      0.0250    257.6015
Scaling   :25793.4001      0.0318      0.0318      0.0319    201.5109
Summing   :16661.7047      0.0739      0.0737      0.0739    130.1696
SAXPYing  :44665.0442      0.0276      0.0275      0.0277    348.9457

 *** STREAM benchmark ***
 Number of PEs:             256
 Number of iterations:        5
 Size of Arrays:         400000

 Test #1 Failed = picalc=piexact
 Apparently Single=Double Precision
 Proceeding to Test #2

--------------------------------------
 Single precision appears to have 16 digits of accuracy
 Assuming 8 bytes per default REAL word
--------------------------------------
 Timing calibration ; time = 6.2008504338336934 hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function  Rate (MB/s)    RMS time    Min time    Max time   Rate / PE
Assignment:66106.2260      0.0249      0.0248      0.0250    258.2274
Scaling   :51583.5518      0.0318      0.0318      0.0320    201.4982
Summing   :33324.7739      0.0739      0.0737      0.0740    130.1749
SAXPYing  :89104.9894      0.0277      0.0276      0.0278    348.0664
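
The "Rate / PE" column in the T3D output appears to be the aggregate
rate divided by the number of PEs.  The sketch below checks that
reading against the Assignment rows of the four runs; the names are
illustrative and the interpretation is an assumption, not something the
message states.

```python
# (PEs, aggregate Assignment MB/s, reported Rate / PE) copied from the output above.
RUNS = [
    (32, 8264.9256, 258.2789),
    (64, 16520.9172, 258.1393),
    (128, 32972.9965, 257.6015),
    (256, 66106.2260, 258.2274),
]

def rate_per_pe(aggregate_mb_s, pes):
    """Aggregate bandwidth shared evenly across the PEs."""
    return aggregate_mb_s / pes

for pes, aggregate, reported in RUNS:
    computed = rate_per_pe(aggregate, pes)
    print(f"{pes:4d} PEs: computed {computed:.4f}, reported {reported:.4f}")
```

A per-PE rate that stays near 258 MB/s from 32 to 256 PEs is what
near-linear scaling of the aggregate rate would look like.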

From castor@drizzle.Stanford.EDU  Tue Mar 15 15:38:27 1994
Received: from drizzle.Stanford.EDU by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA09361; Tue, 15 Mar 94 15:38:27 -0500
Received: from grizzle.Stanford.EDU (grizzle.Stanford.EDU [36.59.0.80]) by drizzle.Stanford.EDU (8.6.4/8.6.4) with ESMTP id MAA26578 for <mccalpin@perelandra.cms.udel.edu>; Tue, 15 Mar 1994 12:39:00 -0800
From: Castor Fu <castor@drizzle.Stanford.EDU>
Received: from localhost (castor@localhost) by grizzle.Stanford.EDU (8.6.4/8.6.4) id MAA03337 for mccalpin@perelandra.cms.udel.edu; Tue, 15 Mar 1994 12:38:59 -0800
Message-Id: <199403152038.MAA03337@grizzle.Stanford.EDU>
Subject: stream benchmark results
To: mccalpin
Date: Tue, 15 Mar 1994 12:38:58 -0800 (PST)
X-Mailer: ELM [version 2.4 PL2]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Length: 969       
Status: RO

On a DEC 3000/300 running OSF/1 v1.3, using 

DESC='DEC Fortran V3.3 for OSF/1 AXP Systems'

I got the following results:

castor:hassle[59] f77 -O stream_d.f 
castor:hassle[60] a.out
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =    136.0544040799141     hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   33.3709      0.3942      0.3836      0.4451
Scaling   :   33.4561      0.3871      0.3826      0.3982
Summing   :   39.5818      0.4938      0.4851      0.5270
SAXPYing  :   38.8778      0.4979      0.4939      0.5183
17.713u 1.503s 0:41.72 46.0% 0+0+0k (0k) 2+0io 0pf+0w

	-castor fu
	castor@drizzle.stanford.edu

From mccalpin@perelandra.cms.udel.edu Wed Apr 20 07:00:07 1994
Received: from bach.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA29086; Tue, 19 Apr 94 16:36:46 -0400
Received: from marvin.cdf.utoronto.ca (root@marvin.cdf.toronto.edu [128.100.31.3]) by bach.udel.edu (8.6.8/8.6.6) with SMTP id QAA16489 for <mccalpin@bach.udel.edu>; Tue, 19 Apr 1994 16:36:44 -0400
Received: by marvin.cdf.toronto.edu id <31303>; Tue, 19 Apr 1994 16:36:39 -0400
From: John DiMarco <jdd@cdf.toronto.edu>
To: mccalpin@bach.udel.edu
Subject: streams benchmarks results 
Message-Id: <94Apr19.163639edt.31303@marvin.cdf.toronto.edu>
Date: 	Tue, 19 Apr 1994 16:36:33 -0400
Status: RO

One minor comment about your table: the Sun 670 numbers are not particularly
helpful, since the 670 is available with several different types of CPU.

I've run your streams benchmark on some of our newer suns with the new 
SunPro 3.0 f77 compiler (no parallelizing). Here are the results.
(code was compiled using f77 -cg92 -O4 -fsimple -dalign -Bstatic)

All in all, the machines were not particularly impressive on this benchmark.

Regards,

John
--
John DiMarco                                              jdd@cdf.toronto.edu
Computing Disciplines Facility Systems Manager            jdd@cdf.utoronto.ca
University of Toronto                                     EA201B,(416)978-1928

-------- cut here --------

128MB SPARCcenter 1000 (model 1102: 2x50MHz SuperSPARCs, 1MB external cache):
	 Assuming 8 bytes per DOUBLEPRECISION word
	Function     Rate (MB/s)  RMS time   Min time  Max time
	Assignment:   37.2453      0.8608      0.8592      0.8654
	Scaling   :   35.2409      0.9095      0.9080      0.9131
	Summing   :   37.9461      1.2661      1.2650      1.2674
	SAXPYing  :   41.2282      1.1661      1.1643      1.1704
	 Assuming 4 bytes per default REAL word
	Function     Rate (MB/s)  RMS time   Min time  Max time
	Assignment:   33.7530      0.1204      0.1185      0.1234
	Scaling   :   30.4827      0.1334      0.1312      0.1384
	Summing   :   34.2466      0.1778      0.1752      0.1851
	SAXPYing  :   30.6143      0.1983      0.1960      0.2022

64MB SPARCstation 10 model 402 (2x40MHz SuperSPARCs, no external cache)
	 Assuming 8 bytes per DOUBLEPRECISION word
	Function     Rate (MB/s)  RMS time   Min time  Max time
	Assignment:   53.8878      0.6701      0.5938      1.1235
	Scaling   :   50.3085      0.6411      0.6361      0.6475
	Summing   :   53.1078      0.9930      0.9038      1.5348
	SAXPYing  :   49.4348      0.9809      0.9710      1.0099
	 Assuming 4 bytes per default REAL word
	Function     Rate (MB/s)  RMS time   Min time  Max time
	Assignment:   41.7293      0.0965      0.0959      0.0980
	Scaling   :   35.4505      0.1134      0.1128      0.1139
	Summing   :   42.6789      0.1413      0.1406      0.1426
	SAXPYing  :   35.4308      0.1701      0.1693      0.1704

128MB Sun SPARCstation 10 model 514 (4x50MHz SuperSPARCs, 1MB external cache)
	 Assuming 8 bytes per DOUBLEPRECISION word
	Function     Rate (MB/s)  RMS time   Min time  Max time
	Assignment:   44.0597      0.7290      0.7263      0.7343
	Scaling   :   40.4820      0.7946      0.7905      0.7993
	Summing   :   45.6326      1.0608      1.0519      1.0724
	SAXPYing  :   44.3635      1.0892      1.0820      1.0943
	 Assuming 4 bytes per default REAL word
	Function     Rate (MB/s)  RMS time   Min time  Max time
	Assignment:   37.8953      0.1062      0.1056      0.1075
	Scaling   :   32.6593      0.1230      0.1225      0.1238
	Summing   :   39.7357      0.1518      0.1510      0.1545
	SAXPYing  :   30.1364      0.1998      0.1991      0.2006

From mccalpin@perelandra.cms.udel.edu Fri May  6 15:16:12 1994
Received: from sgigate.SGI.COM by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA24449; Fri, 6 May 94 15:02:39 -0400
Received: from relay.sgi.com (relay.sgi.com [192.26.51.36]) by sgigate.sgi.com (8.6.4/8.6.4) with SMTP id MAA12523; Fri, 6 May 1994 12:03:28 -0700
Received: from phil.trevose.sgi.com by relay.sgi.com via SMTP (920330.SGI/920502.SGI)
	for @sgigate.sgi.com:mccalpin@perelandra.cms.udel.edu id AA09179; Fri, 6 May 94 12:03:22 -0700
Received: from manic.trevose.sgi.com by phil.trevose.sgi.com via SMTP (931110.SGI/930416.SGI)
	for @relay.sgi.com:mccalpin@perelandra.cms.udel.edu id AA16243; Fri, 6 May 94 15:08:22 -0400
Received: by manic.trevose.sgi.com (931110.SGI/930416.SGI)
	for @phil.trevose.sgi.com:mccalpin@perelandra.cms.udel.edu id AA18121; Fri, 6 May 94 15:08:20 -0400
From: "Rusty Benson" <benson@manic.trevose.sgi.com>
Message-Id: <9405061508.ZM18119@manic.trevose.sgi.com>
Date: Fri, 6 May 1994 15:08:20 -0400
X-Mailer: Z-Mail-SGI (3.0S.1026 26oct93 MediaMail)
To: mccalpin
Subject: stream benchmark numbers
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0
Status: RO

John,

Here are the numbers I obtained for the memory bandwidth test.  The
machine for the single job run is configured as follows:

4 150 MHZ IP19 Processors
CPU: MIPS R4400 Processor Chip Revision: 5.0
FPU: MIPS R4010 Floating Point Chip Revision: 0.0
Data cache size: 16 Kbytes
Instruction cache size: 16 Kbytes
Secondary unified instruction/data cache size: 1 Mbyte
Main memory size: 128 Mbytes, 2-way interleaved
I/O board, Ebus slot 3: IO4 revision 1
Integral EPC serial ports: 4
RealityEngineII Graphics Pipe 0 at IO Slot 3 Physical Adapter 2 (Fchip rev 2)
Integral Ethernet controller: et0, Ebus slot 3
Integral SCSI controller 4: Version WD33C95A
Integral SCSI controller 3: Version WD33C95A
Integral SCSI controller 2: Version WD33C95A
Integral SCSI controller 1: Version WD33C95A
Disk drive: unit 5 on SCSI controller 1
Disk drive: unit 4 on SCSI controller 1
Disk drive: unit 3 on SCSI controller 1
Disk drive: unit 2 on SCSI controller 1
Disk drive: unit 1 on SCSI controller 1
Integral SCSI controller 0: Version WD33C95A
Tape drive: unit 5 on SCSI controller 0: DAT
Disk drive: unit 1 on SCSI controller 0
Integral EPC parallel port: Ebus slot 3
VME bus: adapter 0 mapped to adapter 13
VME bus: adapter 13

The compiler options used are:  -O3 -mips2 -non_shared -sopt

************************1 proc*************************************
The results are as follows:

re 20> cat P1.out
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =           11 hundredths of a second
 Wall Time ; time =    106.9167     hundredths of a second
 Increase the size of the arrays if this is <100
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   67.5474      0.3085      0.3079      0.3097
Scaling   :   68.1404      0.3058      0.3053      0.3063
Summing   :   68.5743      0.4554      0.4550      0.4560
SAXPYing  :   68.5138      0.4557      0.4554      0.4559

***********************4 procs****************************************
The following results for "Parallel_jobs" were obtained on a similarly
configured machine with 1024 MB of main memory.


re 21> cat P4.out
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =           14 hundredths of a second
 Wall Time ; time =    147.5833     hundredths of a second
 Increase the size of the arrays if this is <100
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   64.7151      0.3225      0.3214      0.3235
Scaling   :   62.4518      0.3337      0.3331      0.3345
Summing   :   65.2987      0.4791      0.4778      0.4800
SAXPYing  :   63.8196      0.4947      0.4889      0.4964
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =           12 hundredths of a second
 Wall Time ; time =    148.2500     hundredths of a second
 Increase the size of the arrays if this is <100
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   65.7558      0.3176      0.3163      0.3195
Scaling   :   63.3231      0.3294      0.3285      0.3307
Summing   :   66.9963      0.4671      0.4657      0.4680
SAXPYing  :   64.8561      0.4841      0.4811      0.4901
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =           13 hundredths of a second
 Wall Time ; time =    146.1667     hundredths of a second
 Increase the size of the arrays if this is <100
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   66.7203      0.3122      0.3117      0.3127
Scaling   :   64.0799      0.3250      0.3246      0.3253
Summing   :   67.6132      0.4617      0.4614      0.4622
SAXPYing  :   65.4464      0.4771      0.4767      0.4777
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =           12 hundredths of a second
 Wall Time ; time =    141.1667     hundredths of a second
 Increase the size of the arrays if this is <100
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   66.5551      0.3133      0.3125      0.3139
Scaling   :   63.9300      0.3261      0.3254      0.3267
Summing   :   67.3581      0.4635      0.4632      0.4639
SAXPYing  :   65.2433      0.4788      0.4782      0.4796
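
If the four jobs above ran fully concurrently, their rates can be added
to estimate the aggregate memory bandwidth of the machine.  The sketch
below does this for the Assignment rows and compares the sum with the
single-job figure.  Both the full-overlap assumption and the comparison
across two differently sized machines are assumptions, and the names
are illustrative.

```python
# Assignment "Max MB/s" for each of the four concurrent jobs in P4.out.
P4_ASSIGNMENT_MB_S = [64.7151, 65.7558, 66.7203, 66.5551]

# Assignment "Max MB/s" for the single job in P1.out.
P1_ASSIGNMENT_MB_S = 67.5474

def aggregate_mb_per_s(job_rates):
    """Sum of per-job rates; meaningful only if the jobs overlapped in time."""
    return sum(job_rates)

total = aggregate_mb_per_s(P4_ASSIGNMENT_MB_S)
print(f"aggregate {total:.4f} MB/s, "
      f"{total / P1_ASSIGNMENT_MB_S:.2f}x the single-job rate")
```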


**********************8 procs*****************************************
re 22> cat P8.out
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =           12 hundredths of a second
 Wall Time ; time =    230.5000     hundredths of a second
 Increase the size of the arrays if this is <100
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   58.1825      0.3622      0.3575      0.3762
Scaling   :   58.6926      0.3595      0.3544      0.3703
Summing   :   61.6608      0.5163      0.5060      0.5195
SAXPYing  :   62.0106      0.5260      0.5031      0.5331
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =           13 hundredths of a second
 Wall Time ; time =    233.5833     hundredths of a second
 Increase the size of the arrays if this is <100
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   58.7745      0.3608      0.3539      0.3628
Scaling   :   58.9172      0.3573      0.3530      0.3608
Summing   :   61.4912      0.5163      0.5074      0.5193
SAXPYing  :   61.7547      0.5253      0.5052      0.5296
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =           13 hundredths of a second
 Wall Time ; time =    245.3333     hundredths of a second
 Increase the size of the arrays if this is <100
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   59.6023      0.3977      0.3490      0.4125
Scaling   :   62.2259      0.4016      0.3343      0.4285
Summing   :   66.5438      0.5453      0.4689      0.5584
SAXPYing  :   65.0047      0.5498      0.4800      0.5697
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =           12 hundredths of a second
 Wall Time ; time =    240.2500     hundredths of a second
 Increase the size of the arrays if this is <100
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   58.5795      0.3986      0.3551      0.4150
Scaling   :   60.1227      0.3960      0.3460      0.4167
Summing   :   65.9272      0.5413      0.4732      0.5524
SAXPYing  :   64.7308      0.5483      0.4820      0.5612
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =           15 hundredths of a second
 Wall Time ; time =    251.4167     hundredths of a second
 Increase the size of the arrays if this is <100
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   58.9810      0.3637      0.3527      0.3674
Scaling   :   58.3560      0.3593      0.3564      0.3610
Summing   :   60.9280      0.5209      0.5121      0.5242
SAXPYing  :   61.3016      0.5281      0.5090      0.5332
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =           12 hundredths of a second
 Wall Time ; time =    233.6667     hundredths of a second
 Increase the size of the arrays if this is <100
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   59.2954      0.4080      0.3508      0.4291
Scaling   :   59.2147      0.3975      0.3513      0.4126
Summing   :   65.2718      0.5471      0.4780      0.5628
SAXPYing  :   64.5495      0.5475      0.4833      0.5602
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =           12 hundredths of a second
 Wall Time ; time =    239.1667     hundredths of a second
 Increase the size of the arrays if this is <100
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   61.3640      0.3614      0.3390      0.3661
Scaling   :   59.2903      0.3587      0.3508      0.3616
Summing   :   59.8287      0.5247      0.5215      0.5270
SAXPYing  :   60.7323      0.5264      0.5137      0.5291
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =           13 hundredths of a second
 Wall Time ; time =    197.3333     hundredths of a second
 Increase the size of the arrays if this is <100
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   64.1660      0.3605      0.3242      0.3677
Scaling   :   62.0536      0.3568      0.3352      0.3656
Summing   :   61.5227      0.5227      0.5071      0.5262
SAXPYing  :   59.0152      0.5316      0.5287      0.5343


Rusty

-- 
-----------------------------------------------------------------------
 Rusty Benson - Systems Engineer                         benson@sgi.com
 Silicon Graphics Computer Systems
 8 Neshaminy Interplex, Suite 115
 Trevose PA  19053                                       (215) 638-3707
-----------------------------------------------------------------------
Somewhere in a Royal Hotel room there's a guy who's starting to realize
that eternal fate has turned its back on him ... It's 2AM. 
                                                         Golden Earring
-----------------------------------------------------------------------


From mccalpin@perelandra.cms.udel.edu Fri May  6 16:50:08 1994
Received: from sgigate.SGI.COM by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA24818; Fri, 6 May 94 15:45:16 -0400
Received: from relay.sgi.com (relay.sgi.com [192.26.51.36]) by sgigate.sgi.com (8.6.4/8.6.4) with SMTP id MAA15067; Fri, 6 May 1994 12:46:05 -0700
Received: from phil.trevose.sgi.com by relay.sgi.com via SMTP (920330.SGI/920502.SGI)
	for @sgigate.sgi.com:mccalpin@perelandra.cms.udel.edu id AA10791; Fri, 6 May 94 12:46:00 -0700
Received: from manic.trevose.sgi.com by phil.trevose.sgi.com via SMTP (931110.SGI/930416.SGI)
	for @relay.sgi.com:mccalpin@perelandra.cms.udel.edu id AA16422; Fri, 6 May 94 15:51:02 -0400
Received: by manic.trevose.sgi.com (931110.SGI/930416.SGI)
	for @phil.trevose.sgi.com:mccalpin@perelandra.cms.udel.edu id AA18213; Fri, 6 May 94 15:51:01 -0400
From: "Rusty Benson" <benson@manic.trevose.sgi.com>
Message-Id: <9405061551.ZM18211@manic.trevose.sgi.com>
Date: Fri, 6 May 1994 15:51:00 -0400
In-Reply-To: "John D. McCalpin" <mccalpin@perelandra.cms.udel.edu>
        "Re: stream benchmark numbers" (May  6,  3:28pm)
References: <Pine.3.89.9405061551.D24138-0100000@perelandra.cms.udel.edu>
X-Mailer: Z-Mail-SGI (3.0S.1026 26oct93 MediaMail)
To: "John D. McCalpin" <mccalpin>
Subject: Re: stream benchmark numbers
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0
Status: RO

John,

I forgot to mention one thing: the cases with more than 1 CPU were run on
a Challenge XL with 32 processors and 1 GB of memory.  The machine was
also not dedicated; a GAUSSIAN job and a CHARMM job were running.

I will send the results of the qgbox stuff sometime next week (if we're
lucky maybe this weekend).  At that time, I will submit the paperwork
for the runs on TFP.

Rusty

-- 
-----------------------------------------------------------------------
 Rusty Benson - Systems Engineer                         benson@sgi.com
 Silicon Graphics Computer Systems
 8 Neshaminy Interplex, Suite 115
 Trevose PA  19053                                       (215) 638-3707
-----------------------------------------------------------------------
Somewhere in a Royal Hotel room there's a guy who's starting to realize
that eternal fate has turned its back on him ... It's 2AM. 
                                                         Golden Earring
-----------------------------------------------------------------------


From mccalpin@perelandra.cms.udel.edu Wed May 25 09:13:47 1994
Received: from timbuk.cray.com by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA12664; Tue, 24 May 94 14:30:43 -0400
Received: from lotus04.benchmark (lotus04.cray.com) by cray.com (Bob mailer 1.2)
	id AA04374; Tue, 24 May 94 13:32:22 CDT
Received: by lotus04.benchmark (5.0/SMI-SVR4)
	id AA09648; Tue, 24 May 1994 13:31:43 +0600
From: cmg@lotus04.cray.com (Charles Grassl)
Message-Id: <9405241831.AA09648@lotus04.benchmark>
Subject: CRAY T3D
To: mccalpin (John D. McCalpin)
Date: Tue, 24 May 1994 13:31:42 -0500 (CDT)
In-Reply-To: <Pine.3.07.9310291150.D14700-a100000@perelandra.cms.udel.edu> from "John D. McCalpin" at Oct 29, 93 11:35:51 am
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 713       
Status: RO

John;

The STREAM benchmark SAXPY result for the CRAY T3D was generated with
the library version of the LINPACK SAXPY.  Is this OK as far as
comparison with other systems?  Do you specify that libraries not be
used?

Regards,
-- 
Charles Grassl
Cray Research, Inc.
cmg@cray.com
(612) 683-3531 

P.S.  I am now a father, too.  You have mentioned your child (daughter?)
several times and I know how proud you are, so I thought you might be
interested.   My little daughter came as too much of a surprise:  she
is three months premature.  We have our problems cut out for us, but she
is in a very good children's hospital in Minneapolis.  We hope that she
will be alright and can come home in two or three months.


From David.Daniel@meiko.com  Mon Jun  6 15:54:18 1994
Received: from marge.meiko.com by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA00627; Mon, 6 Jun 94 15:54:18 -0400
Received: from abe76.meiko.com (abe.meiko.com) by marge.meiko.com with SMTP id AA07422
  (5.65c/IDA-1.4.4 for <mccalpin@perelandra.cms.udel.edu>); Mon, 6 Jun 1994 15:56:35 -0400
Date: Mon, 6 Jun 1994 15:56:35 -0400
From: David Daniel <David.Daniel@meiko.com>
Message-Id: <199406061956.AA07422@marge.meiko.com>
Received: by abe76.meiko.com (5.0/SMI-SVR4)
	id AA07631; Mon, 6 Jun 94 15:59:10 EDT
To: mccalpin
Subject: memory bandwidth results
Cc: marc@access.digex.net, Charles.Wilson@meiko.com
Content-Length: 2791
Status: RO
X-Status: 

John,

Attached below are the  results of your  stream memory bandwidth tests
for the CS-2 vector node.  Let me know if you  need more info on this,
and I'll be in touch again with the qgbench results shortly.

David Daniel

----------------------------------------------------------------------
Meiko					Email:	David.Daniel@meiko.com
130C Baker Avenue Ext.			Tel:	508-371-0088
Concord, MA 01742, USA			Fax:	508-371-7516
----------------------------------------------------------------------


Hardware
********
MK403 compute board incorporating:
66 MHz Pinnacle SPARC.
45 MHz MK534 vector unit (2 VPUs + cache coherency hardware).
16 bank, 3 port, 128MB memory system.

The VPUs  can request  1 double precision    word per cycle,   so peak
theoretical memory bandwidth in vector code is

   2 * 45  MHz * 8 bytes = 720 MB/s.

We achieve about 85% of this for your benchmark (610 to 621 MB/s in the
log below).  This is typical of the code generated by the vectorizing
compilers -- about 15% goes to stripmine overhead.
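
As a quick check of this arithmetic, here is a minimal sketch in C.  The
function names are made up for illustration, 1 MB is taken as 10^6 bytes,
and the measured rates are the ones reported in the session log further
down.

```c
#include <assert.h>

/* Peak memory bandwidth in MB/s (1 MB = 10^6 bytes), assuming each
 * unit requests one word per clock cycle.  Hypothetical helper, not
 * part of the benchmark. */
double peak_mb_per_s(int units, double clock_mhz, int bytes_per_word)
{
    assert(units > 0 && clock_mhz > 0.0 && bytes_per_word > 0);
    return units * clock_mhz * bytes_per_word;
}

/* Fraction of the theoretical peak reached by a measured rate. */
double fraction_of_peak(double measured_mb_per_s, double peak)
{
    assert(peak > 0.0);
    return measured_mb_per_s / peak;
}
```

With the figures above, peak_mb_per_s(2, 45.0, 8) gives 720, and the
measured 610 to 621 MB/s works out to roughly 85 to 86 percent of peak.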


Software
********
Solaris 2.1 + Meiko support for VPUs.
Portland Group vectorizing f77 and cc compilers.


Changes to code
***************
I replaced the second() timing function with a call to the Solaris
gettimeofday(), which I believe gives sufficient resolution.  No other
changes were made.

========================================================================
$ cat second.c
#include <sys/time.h>
 
double   second_ (void)
{
    static struct timeval tp;
    gettimeofday (&tp);
    return (double) tp.tv_sec + (double) tp.tv_usec * 1.0e-6;
}
========================================================================
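
The listing above uses a one-argument gettimeofday, as accepted on this
Solaris release.  On systems that follow the two-argument POSIX form, an
equivalent timer might look like the sketch below.  This is an assumed
port, not the code that was run here; the trailing underscore is kept on
the assumption that the Fortran compiler appends one to external names.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds, callable from Fortran as second()
 * when the compiler appends an underscore to external names. */
double second_(void)
{
    struct timeval tp;
    int rc = gettimeofday(&tp, NULL);   /* second argument is unused */

    assert(rc == 0);
    (void)rc;                           /* silence warning under NDEBUG */
    return (double)tp.tv_sec + (double)tp.tv_usec * 1.0e-6;
}
```

The difference of two calls gives elapsed wall time in seconds with
microsecond granularity at best; the real resolution depends on the
system clock.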


Log of session
**************
========================================================================

$ date
Mon Jun  6 13:03:55 EDT 1994
$ uname -a 
SunOS abe1 5.1 Callisto_Development dino1 sparc
$ cd ~/Delaware/stream
$ make 
pgcc -I /opt/MEIKOcs2/include -c  second.c
pgf77 -O4 -r8 -Mvect  -o stream_d stream_d.f second.o
Linking:
$ size stream_d
149439 + 67580 + 48005400 = 48222419
$ stream_d
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =    20.64710855484009      hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  621.2748      0.0516      0.0515      0.0516
Scaling   :  621.3108      0.0517      0.0515      0.0535
Summing   :  610.2757      0.0787      0.0787      0.0788
SAXPYing  :  610.3460      0.0787      0.0786      0.0787

========================================================================
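
The rates in this table can be reproduced from the minimum times.  The
sketch below assumes an array length of 2,000,000 words, inferred from
the size output above (48005400 bytes is close to 3 arrays x 2,000,000 x
8 bytes), and assumes that Assignment and Scaling are counted as touching
two arrays while Summing and SAXPYing touch three.  Neither assumption is
stated in the message; the function name is hypothetical.

```c
#include <assert.h>

/* Bandwidth in MB/s (1 MB = 10^6 bytes) for one kernel:
 * n words per array, arrays_touched arrays read or written,
 * divided by the best (minimum) time in seconds. */
double stream_rate_mb_per_s(long n, int bytes_per_word,
                            int arrays_touched, double min_time_s)
{
    assert(n > 0 && bytes_per_word > 0 && arrays_touched > 0);
    assert(min_time_s > 0.0);
    return (double)n * bytes_per_word * arrays_touched
           / min_time_s / 1.0e6;
}
```

Under those assumptions, Assignment with a minimum time of 0.0515 s
comes to about 621 MB/s and Summing with 0.0787 s to about 610 MB/s,
matching the table to within the rounding of the printed times.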

From mccalpin@axposf.pa.dec.com  Fri Jun 17 07:57:39 1994
Received: from axposf.pa.dec.com by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA27386; Fri, 17 Jun 94 07:57:39 -0400
Received: by axposf.pa.dec.com; (5.65/1.1.8.2/21Apr94-0333PM)
	id AA06121; Fri, 17 Jun 1994 05:00:17 -0700
Date: Fri, 17 Jun 1994 05:00:17 -0700
From: John D. McCalpin <mccalpin@axposf.pa.dec.com>
Message-Id: <9406171200.AA06121@axposf.pa.dec.com>
To: mccalpin
Subject: DEC 4000/710 stream_d
Status: RO

axposf.pa.dec.com> f77 -O stream_d.f -o stream_d
axposf.pa.dec.com> uptime ; stream_d ; uptime
04:58  up 15 days,  9:33,  13 users,  load average: 1.00, 1.03, 1.07
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =    130.784004181623      hundredths of a second
 Increase the size of the arrays if this is <30
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   76.4263      0.4230      0.4187      0.4480
Scaling   :   74.6854      0.4323      0.4285      0.4363
Summing   :   72.7520      0.6649      0.6598      0.6715
SAXPYing  :   69.3657      0.6948      0.6920      0.6969
04:59  up 15 days,  9:33,  13 users,  load average: 2.00, 1.80, 1.57

Used Fortran etime() for timing.

From mccalpin@malacandra  Sun Jun 19 08:59:30 1994
Received: from malacandra.cms.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA01072; Sun, 19 Jun 94 08:59:30 -0400
Received: by malacandra.cms.udel.edu (AIX 3.2/UCB 5.64/4.03)
          id AA16072; Sun, 19 Jun 1994 08:58:51 -0400
Date: Sun, 19 Jun 1994 08:58:51 -0400
From: mccalpin@malacandra
Message-Id: <9406191258.AA16072@malacandra.cms.udel.edu>
To: mccalpin
Subject: Stream on RS/6000-250
Status: RO

stream_d
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =   55.0000011920928955      hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   71.1112       .1871       .1800       .1900
Scaling   :   67.3686       .1910       .1900       .2000
Summing   :   71.1111       .2700       .2700       .2700
SAXPYing  :   71.1112       .2710       .2700       .2800
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =   55.9999993070960045      hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   71.1112       .1880       .1800       .1900
Scaling   :   67.3686       .1910       .1900       .2000
Summing   :   73.8461       .2690       .2600       .2700
SAXPYing  :   73.8462       .2690       .2600       .2700
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =   55.0000002607703209      hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   67.3686       .1900       .1900       .1900
Scaling   :   67.3686       .1900       .1900       .1900
Summing   :   73.8461       .2690       .2600       .2700
SAXPYing  :   73.8462       .2680       .2600       .2700
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =   56.0000002384185791      hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   71.1112       .1871       .1800       .1900
Scaling   :   67.3686       .1920       .1900       .2000
Summing   :   73.8462       .2691       .2600       .2800
SAXPYing  :   71.1112       .2700       .2700       .2700
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =   55.0000011920928955      hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   71.1112       .1890       .1800       .1900
Scaling   :   71.1111       .1880       .1800       .1900
Summing   :   73.8462       .2690       .2600       .2700
SAXPYing  :   73.8462       .2680       .2600       .2700
stream_s
--------------------------------------
 Single precision appears to have  7 digits of accuracy
 Assuming 4 bytes per default REAL word
--------------------------------------
 Timing calibration ; time =   67.00000000      hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   58.1820       .2200       .2200       .2200
Scaling   :   51.2000       .2681       .2500       .2700
Summing   :   53.3333       .3690       .3600       .3700
SAXPYing  :   53.3333       .3690       .3600       .3700
--------------------------------------
 Single precision appears to have  7 digits of accuracy
 Assuming 4 bytes per default REAL word
--------------------------------------
 Timing calibration ; time =   67.00000000      hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   58.1820       .2200       .2200       .2200
Scaling   :   49.2308       .2650       .2600       .2700
Summing   :   51.8919       .3720       .3700       .3800
SAXPYing  :   51.8919       .3730       .3700       .3800
--------------------------------------
 Single precision appears to have  7 digits of accuracy
 Assuming 4 bytes per default REAL word
--------------------------------------
 Timing calibration ; time =   67.00000000      hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   58.1818       .2270       .2200       .2300
Scaling   :   51.2000       .2590       .2500       .2600
Summing   :   51.8919       .3700       .3700       .3700
SAXPYing  :   51.8919       .3720       .3700       .3800
--------------------------------------
 Single precision appears to have  7 digits of accuracy
 Assuming 4 bytes per default REAL word
--------------------------------------
 Timing calibration ; time =   67.00000000      hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   58.1820       .2280       .2200       .2300
Scaling   :   49.2308       .2610       .2600       .2700
Summing   :   53.3332       .3690       .3600       .3700
SAXPYing  :   53.3334       .3680       .3600       .3700
--------------------------------------
 Single precision appears to have  7 digits of accuracy
 Assuming 4 bytes per default REAL word
--------------------------------------
 Timing calibration ; time =   67.00000000      hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   58.1820       .2200       .2200       .2200
Scaling   :   49.2307       .2690       .2600       .2700
Summing   :   51.8919       .3700       .3700       .3700
SAXPYing  :   51.8919       .3700       .3700       .3700
stream_wall
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   56.0000002384185791      hundredths of a second
 Wall Time ; time =   55.7306051254272461      hundredths of a second
 Increase the size of the arrays if this is <100  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   68.1130       .1884       .1879       .1897
Scaling   :   67.2633       .1907       .1903       .1911
Summing   :   71.3460       .2697       .2691       .2700
SAXPYing  :   71.4296       .2694       .2688       .2702
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   56.0000002384185791      hundredths of a second
 Wall Time ; time =   55.5277109146118164      hundredths of a second
 Increase the size of the arrays if this is <100  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   68.1642       .1881       .1878       .1886
Scaling   :   67.2792       .1907       .1903       .1912
Summing   :   71.3667       .2712       .2690       .2853
SAXPYing  :   71.4054       .2695       .2689       .2702
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   56.0000002384185791      hundredths of a second
 Wall Time ; time =   55.4084062576293945      hundredths of a second
 Increase the size of the arrays if this is <100  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   68.1656       .1883       .1878       .1889
Scaling   :   67.2597       .1906       .1903       .1911
Summing   :   71.3381       .2697       .2691       .2705
SAXPYing  :   71.4349       .2693       .2688       .2698
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   56.9999992847442627      hundredths of a second
 Wall Time ; time =   56.4731001853942871      hundredths of a second
 Increase the size of the arrays if this is <100  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   68.1587       .1884       .1878       .1890
Scaling   :   67.2894       .1906       .1902       .1913
Summing   :   71.3312       .2700       .2692       .2717
SAXPYing  :   71.4063       .2695       .2689       .2703
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   56.0000002384185791      hundredths of a second
 Wall Time ; time =   55.5517911911010742      hundredths of a second
 Increase the size of the arrays if this is <100  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   68.2627       .1882       .1875       .1888
Scaling   :   67.2597       .1909       .1903       .1927
Summing   :   71.3081       .2730       .2693       .3031
SAXPYing  :   71.3938       .2695       .2689       .2700

From mccalpin@mozart.udel.edu  Thu Jun 30 08:58:09 1994
Received: from mozart.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA22570; Thu, 30 Jun 94 08:58:09 -0400
Received: by mozart.udel.edu (AIX 3.2/UCB 5.64/4.03)
          id AA16762; Thu, 30 Jun 1994 08:54:24 -0400
Date: Thu, 30 Jun 1994 08:54:24 -0400
From: mccalpin@mozart.udel.edu (John McCalpin)
Message-Id: <9406301254.AA16762@mozart.udel.edu>
To: mccalpin
Subject: stream on 990
Status: RO

--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =  81.0000002384185791  hundredths of a second
 Wall Time ; time =  81.1749935150146484  hundredths of a second
 Increase the size of the arrays if this is <100  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:  652.6021       .0992       .0981       .1033
Scaling   :  532.5524       .1208       .1202       .1225
Summing   :  710.0853       .1361       .1352       .1376
SAXPYing  :  709.0419       .1361       .1354       .1371
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =  77.9999971389770508  hundredths of a second
 Wall Time ; time =  78.6354064941406250  hundredths of a second
 Increase the size of the arrays if this is <100  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:  662.6702       .0982       .0966       .1018
Scaling   :  533.1336       .1218       .1200       .1296
Summing   :  714.5461       .1364       .1344       .1484
SAXPYing  :  713.8241       .1377       .1345       .1495
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =  79.0000021457672119  hundredths of a second
 Wall Time ; time =  78.3833980560302734  hundredths of a second
 Increase the size of the arrays if this is <100  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:  663.4457       .0998       .0965       .1041
Scaling   :  533.3911       .1231       .1200       .1252
Summing   :  711.7856       .1386       .1349       .1421
SAXPYing  :  712.7482       .1395       .1347       .1427
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =  79.0000012144446373  hundredths of a second
 Wall Time ; time =  80.7206034660339355  hundredths of a second
 Increase the size of the arrays if this is <100  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:  654.3703       .0991       .0978       .1005
Scaling   :  524.3792       .1237       .1220       .1255
Summing   :  709.5923       .1379       .1353       .1451
SAXPYing  :  709.2823       .1381       .1353       .1420
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =  79.0000021457672119  hundredths of a second
 Wall Time ; time =  78.2535076141357422  hundredths of a second
 Increase the size of the arrays if this is <100  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:  661.0343       .2489       .0968       .6950
Scaling   :  532.8537       .1647       .1201       .2477
Summing   :  711.4328       .2343       .1349       .4728
SAXPYing  :  712.0921       .1912       .1348       .2992

From cherkus@UniMaster.COM  Fri Jul 15 22:48:44 1994
Received: from fastball.UniMaster.COM by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA03309; Fri, 15 Jul 94 22:48:44 -0400
Received: (cherkus@localhost) by fastball.unimaster.com (8.6.7/8.5) id WAA03915; Fri, 15 Jul 1994 22:47:47 -0400
From: Dave Cherkus <cherkus@UniMaster.COM>
Message-Id: <199407160247.WAA03915@fastball.unimaster.com>
Subject: STREAM results (C versions...)
To: mccalpin (John D. McCalpin)
Date: Fri, 15 Jul 1994 22:47:42 -0400 (EDT)
In-Reply-To: <9407160140.AA03205@perelandra.cms.udel.edu> from "John D. McCalpin" at Jul 15, 94 09:40:38 pm
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 3580      
Status: RO

I was interested in seeing what the difference in memory performance
between PCs and workstations was, so I ran your benchmark on a
Dell 486DX2/66 running SVR4 Unix.  I ran each 3 times and the
output is below.

[necsrv] (cherkus): ./cstream
Timing calibration ; time = 2550.00 usec.
Increase the size of the arrays if this is < 300
and your clock precision is =< 1/100 second.
---------------------------------------------------
Function Rate   (MB/s)     RMS time    Min time    Max time
Assignment:     27.935     642.729     600.000     700.000
Scaling   :     20.951     833.217     800.000     870.000
Summing   :     23.944    1119.821    1050.000    1260.000
SAXPYing  :     21.306    1223.491    1180.000    1310.000
(/local/home/cherkus/bench)
[necsrv] (cherkus): ./cstream
Timing calibration ; time = 2470.00 usec.
Increase the size of the arrays if this is < 300
and your clock precision is =< 1/100 second.
---------------------------------------------------
Function Rate   (MB/s)     RMS time    Min time    Max time
Assignment:     27.034     638.326     620.000     680.000
Scaling   :     20.440     836.062     820.000     850.000
Summing   :     23.718    1083.157    1060.000    1130.000
SAXPYing  :     21.488    1190.109    1170.000    1220.000
(/local/home/cherkus/bench)
[necsrv] (cherkus): ./cstream
Timing calibration ; time = 2590.00 usec.
Increase the size of the arrays if this is < 300
and your clock precision is =< 1/100 second.
---------------------------------------------------
Function Rate   (MB/s)     RMS time    Min time    Max time
Assignment:     27.477     624.356     610.000     670.000
Scaling   :     20.440     846.298     820.000     890.000
Summing   :     23.718    1106.644    1060.000    1190.000
SAXPYing  :     21.488    1193.168    1170.000    1230.000
(/local/home/cherkus/bench)
[necsrv] (cherkus): ./stream_d
Timing calibration ; time = 3510000.000000 usec.
Increase the size of the arrays if this is < 300000
and your clock precision is =< 1/100 second.
---------------------------------------------------
Function Rate   (MB/s)   RMS time     Min time     Max time
Assignment:     33.333  1236499.125   480000.000  1610000.000
Scaling   :     15.385  1500983.000  1040000.000  2110000.000
Summing   :     22.018  1441853.000  1090000.000  2190000.000
SAXPYing  :     18.321  1753890.000  1310000.000  2410000.000
(/local/home/cherkus/bench)
[necsrv] (cherkus): ./stream_d
Timing calibration ; time = 3630000.000000 usec.
Increase the size of the arrays if this is < 300000
and your clock precision is =< 1/100 second.
---------------------------------------------------
Function Rate   (MB/s)   RMS time     Min time     Max time
Assignment:     33.333  1070373.750   480000.000  1650000.000
Scaling   :     15.534  1195926.375  1030000.000  1950000.000
Summing   :     21.818  1453344.500  1100000.000  2230000.000
SAXPYing  :     18.750  1945517.875  1280000.000  2450000.000
(/local/home/cherkus/bench)
[necsrv] (cherkus): ./stream_d
Timing calibration ; time = 2660000.000000 usec.
Increase the size of the arrays if this is < 300000
and your clock precision is =< 1/100 second.
---------------------------------------------------
Function Rate   (MB/s)   RMS time     Min time     Max time
Assignment:     33.333  1437685.625   480000.000  1550000.000
Scaling   :     16.495  1559054.250   970000.000  1980000.000
Summing   :     21.818  1438527.125  1100000.000  2230000.000
SAXPYing  :     18.750  1483600.375  1280000.000  2350000.000

-- 
Dave Cherkus          UniMaster, Inc.         cherkus@unimaster.com 

From robertl@cwi.nl  Tue Jul 19 09:09:08 1994
Received: from [192.16.184.142] by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA13453; Tue, 19 Jul 94 09:09:08 -0400
Received: from paling.cwi.nl by charon.cwi.nl with SMTP
	id <AA19696@cwi.nl>; Tue, 19 Jul 1994 15:07:53 +0200
Received: by paling.cwi.nl with SMTP
	id AA03173 (931110.SGI/3.8/CWI-Amsterdam); Tue, 19 Jul 94 15:07:52 +0200
Message-Id: <9407191307.AA03173=robertl@paling.cwi.nl>
To: mccalpin
Cc: robertl@cwi.nl
Subject: streams on Challange
Date: Tue, 19 Jul 1994 15:07:51 +0200
From: Robert van Liere <Robert.van.Liere@cwi.nl>
Status: RO


John,

[For what it's worth...]

Here are some results of your streams program on a multi-processor
Challenge @ 150 MHz running IRIX 5.2.

Included are three cases:
   - sequential code
   - parallel code -- compiled with -pfa, the POWER FORTRAN accelerator.
	-- on a 2 CPU Challenge
	-- on a 6 CPU Challenge

The -pfa option does all sorts of things like loop unrolling, some data
dependency analysis, etc. (you probably know more about this than I).
I have the impression that (some) improvement can be made by
finding better compiler options.

Cheers,
-- Robert van Liere

Included for each of the three cases is:
	- compiler options
	- uname -a
	- hinv | grep 150 (hardware inventory)
	- a.out output
1. --------------------------------------------------------------------
; f77 -mips2 -O3 -non_shared  stream_d.f
; uname -a
IRIX artemis 5.2 02282015 IP19 mips
; a.out 
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =    159.0000024065375     hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   62.7451      0.5231      0.5100      0.5400
Scaling   :   58.1819      0.5520      0.5500      0.5600
Summing   :   63.1580      0.7700      0.7600      0.7800
SAXPYing  :   70.5882      0.6920      0.6800      0.7000


2. --------------------------------------------------------------------
; f77 -pfa -mips2 -O3 -non_shared  stream_d.f
; hinv | grep 150
2 150 MHZ IP19 Processors
; uname -a
IRIX artemis 5.2 02282015 IP19 mips
; a.out 
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =    178.0000081285834     hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  133.3337      0.2461      0.2400      0.2600
Scaling   :  139.1304      0.2441      0.2300      0.2500
Summing   :  133.3338      0.3721      0.3600      0.3800
SAXPYing  :  141.1764      0.3551      0.3400      0.3600

3. --------------------------------------------------------------------
; f77 -pfa -mips2 -O3 -non_shared  stream_d.f
; hinv | grep 150
6 150 MHZ IP19 Processors
; uname -a
IRIX zeus 5.2 02282015 IP19 mips
; a.out 
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =    80.99999930709600     hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  355.5568      0.1018      0.0900      0.1300
Scaling   :  355.5559      0.1015      0.0900      0.1200
Summing   :  369.2308      0.1444      0.1300      0.1600
SAXPYing  :  369.2304      0.1467      0.1300      0.1700
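
For a quick comparison of the three cases, the ratios of the Rate columns
to the sequential run can be worked out as below. This is only a rough
sketch: the name `speedups` is illustrative, the 6-CPU run was made on a
different host (zeus rather than artemis), and the times are reported in
steps of 0.01 s, so the ratios are approximate.

```python
# Rate column (MB/s) copied from the three runs above, keyed by CPU count.
RATES = {
    1: {"Assignment": 62.7451, "Scaling": 58.1819,
        "Summing": 63.1580, "SAXPYing": 70.5882},
    2: {"Assignment": 133.3337, "Scaling": 139.1304,
        "Summing": 133.3338, "SAXPYing": 141.1764},
    6: {"Assignment": 355.5568, "Scaling": 355.5559,
        "Summing": 369.2308, "SAXPYing": 369.2304},
}


def speedups(rates, base_cpus=1):
    """Return rate / base-run rate for every CPU count and kernel."""
    base = rates[base_cpus]
    return {n: {k: row[k] / base[k] for k in base} for n, row in rates.items()}


# Print one line of ratios per CPU count.
for ncpu, row in sorted(speedups(RATES).items()):
    print(ncpu, "CPU:", "  ".join(f"{k} {v:.2f}" for k, v in row.items()))
```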


--------------------------------------------------
Robert van Liere
robertl@cwi.nl 
<http://www.cwi.nl/cwi/people/Robert.van.Liere.html>

From mccalpin@perelandra.cms.udel.edu Thu Jul 21 08:33:39 1994
Received: from bach.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA18752; Thu, 21 Jul 94 07:46:30 -0400
Received: from inet-gw-2.pa.dec.com (inet-gw-2.pa.dec.com [16.1.0.23]) by bach.udel.edu (8.6.8/8.6.6) with SMTP id HAA00174 for <mccalpin@bach.udel.edu>; Thu, 21 Jul 1994 07:46:55 -0400
Received: from us1rmc.bb.dec.com by inet-gw-2.pa.dec.com (5.65/27May94)
	id AA17153; Thu, 21 Jul 94 02:52:25 -0700
Received: from i4get.enet by us1rmc.bb.dec.com (5.65/rmc-22feb94)
	id AA29681; Thu, 21 Jul 94 05:52:21 -0400
Message-Id: <9407210952.AA29681@us1rmc.bb.dec.com>
Received: from i4get.enet; by us1rmc.enet; Thu, 21 Jul 94 05:52:22 EDT
Date: Thu, 21 Jul 94 05:52:22 EDT
From: "John, dtn 293-5506  21-Jul-1994 0552" <henning@i4get.enet.dec.com>
To: mccalpin@bach.udel.edu
Cc: me@i4get.enet.dec.com
Apparently-To: mccalpin@bach.udel.edu
Subject: Questions on STREAM Benchmark table
Status: RO

Dear Mr. McCalpin,

Thank you for the memory information that you have published -- it is
very interesting and educational.  I have been looking at the copy of
your STREAM table dated Wed May 25 09:42:20 1994 and have two questions:

   1) Is there a more recent version?  I looked on comp.benchmarks, but
      didn't see one.  If there is, I would appreciate a copy by e-mail.

   2) Perhaps the copy I have is missing some information; but when I 
      look at the table with "speedup ratios", it doesn't always seem
      to map back to the raw data in the first table.  For Cray C90
      there are speedups reported for 1,2,4,8,16 CPUs, and sure enough
      in the first table I can see the raw data for all of these.
      But for the SGI Challenge 150 Mhz there is a report of 1,4,8 CPUs
      and I can't see the corresponding raw data.  I suppose I could 
      try to derive it, for example by multiplying the 8-CPU "SGI Challenge
      150MHz" Copy speedup of 7.10 by the Copy bandwidth in the base table,
      but that's not easy since I'm not sure which base table entry to
      select:

               SGI Challenge 150 Mhz  8  58.2
               SGI Challenge 150 Mhz  4  66.7
               SGI Challenge 1 cpu    8  57.1
               SGI Challenge 1 cpu    4  53.3

      The Sun SparcCenter 2000 likewise has more CPU entries in the
      second table than in the first.
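
The derivation described in question 2 can be sketched as follows: the
8-CPU Copy speedup of 7.10 is multiplied by each of the four candidate
base entries. The second column is assumed here to be bytes per word,
and the names `CANDIDATES` and `implied_bandwidth` are illustrative.
The four answers differ by about 25%, which is why the choice of base
entry matters.

```python
COPY_SPEEDUP_8CPU = 7.10  # 8-CPU Copy speedup quoted above

# (machine label, second column, Copy MB/s) for the four candidate rows.
# The second column is assumed to be bytes per word.
CANDIDATES = [
    ("SGI Challenge 150 Mhz", 8, 58.2),
    ("SGI Challenge 150 Mhz", 4, 66.7),
    ("SGI Challenge 1 cpu", 8, 57.1),
    ("SGI Challenge 1 cpu", 4, 53.3),
]


def implied_bandwidth(speedup, base_mb_per_s):
    """Multi-CPU bandwidth implied by a speedup ratio and a base rate."""
    return speedup * base_mb_per_s


for label, width, base in CANDIDATES:
    implied = implied_bandwidth(COPY_SPEEDUP_8CPU, base)
    print(f"{label:<22} {width}  {base:5.1f} -> {implied:6.1f} MB/s")
```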


Thanks very much for your help

   /John Henning
    CSG Performance Group
    Digital
    henning@msbcs.enet.dec.com

From mccalpin@perelandra.cms.udel.edu Thu Jul 21 08:34:23 1994
Received: from strauss.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
	for mccalpin id AA18852; Thu, 21 Jul 94 08:18:58 -0400
Received: (from mccalpin@localhost) by strauss.udel.edu (8.6.8/8.6.6) id IAA06830 for mccalpin@perelandra.cms.udel.edu; Thu, 21 Jul 1994 08:19:29 -0400
Date: Thu, 21 Jul 1994 08:19:29 -0400
From: John D McCalpin <mccalpin@strauss.udel.edu>
Message-Id: <199407211219.IAA06830@strauss.udel.edu>
To: mccalpin
Subject: Stream on SC/2000 - raw data
Status: RO

Raw results for a single copy (P1.1):
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   569 hundredths of a second
 Wall Time ; time =     570.333 hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   33.3704      1.0088      0.9589      1.3007
Scaling   :   35.2505      0.9369      0.9078      0.9763
Summing   :   36.2480      1.4251      1.3242      1.9608
SAXPYing  :   35.8991      1.3970      1.3371      1.5714


Results for two copies (P2.1 and P2.2) follow:
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   848 hundredths of a second
 Wall Time ; time =     864.167 hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   33.2178      0.9844      0.9633      1.0188
Scaling   :   34.1472      0.9532      0.9371      0.9791
Summing   :   35.4462      1.3762      1.3542      1.4258
SAXPYing  :   35.2128      1.3879      1.3631      1.4181
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   733 hundredths of a second
 Wall Time ; time =     789.167 hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   33.2579      0.9846      0.9622      1.0101
Scaling   :   34.0676      0.9732      0.9393      1.0285
Summing   :   35.6316      1.5219      1.3471      2.3657
SAXPYing  :   34.7598      1.3987      1.3809      1.4217


Results for 4 copies (P4.out) follow:

--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   1087 hundredths of a second
 Wall Time ; time =     1257.08 hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   29.9418      1.3021      1.0687      1.7687
Scaling   :   31.9710      1.3422      1.0009      2.1995
Summing   :   30.4404      1.7988      1.5769      2.3239
SAXPYing  :   32.0865      1.9042      1.4960      2.8321
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   1084 hundredths of a second
 Wall Time ; time =     1324.17 hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   30.1089      1.3002      1.0628      2.0462
Scaling   :   31.3150      1.3510      1.0219      1.9828
Summing   :   32.6054      1.8798      1.4721      2.8029
SAXPYing  :   32.8456      1.7848      1.4614      2.3825
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   1066 hundredths of a second
 Wall Time ; time =     1232.25 hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   30.9314      1.2966      1.0345      1.8223
Scaling   :   31.7224      1.1815      1.0088      1.4985
Summing   :   31.8746      1.9749      1.5059      3.2586
SAXPYing  :   31.6472      1.8541      1.5167      2.7178
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   1079 hundredths of a second
 Wall Time ; time =     1287.42 hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   31.3683      1.2432      1.0201      1.7260
Scaling   :   30.4203      1.2905      1.0519      1.7533
Summing   :   32.9644      1.8274      1.4561      2.6191
SAXPYing  :   31.8586      1.9232      1.5067      2.8726


Results for 6 copies (P6.out) follow:

--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   1024 hundredths of a second
 Wall Time ; time =     1410.67 hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   26.0079      1.3295      1.2304      1.4418
Scaling   :   28.2383      1.3288      1.1332      1.7055
Summing   :   28.0898      1.9766      1.7088      2.2266
SAXPYing  :   28.7659      1.9657      1.6686      2.3684
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   1010 hundredths of a second
 Wall Time ; time =     1405.08 hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   27.0444      1.4051      1.1832      1.7970
Scaling   :   27.9884      1.3240      1.1433      1.5381
Summing   :   29.4722      1.8842      1.6287      2.1229
SAXPYing  :   29.0939      1.9961      1.6498      2.4653
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   1012 hundredths of a second
 Wall Time ; time =     1440.75 hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   27.6242      1.4153      1.1584      1.9580
Scaling   :   27.1891      1.4089      1.1769      1.7027
Summing   :   28.6485      1.8387      1.6755      2.1152
SAXPYing  :   27.5635      1.9043      1.7414      2.2844
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   1022 hundredths of a second
 Wall Time ; time =     1481.00 hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   28.4723      1.3335      1.1239      1.6311
Scaling   :   27.2368      1.3393      1.1749      1.5346
Summing   :   28.2188      1.9001      1.7010      2.1014
SAXPYing  :   27.3734      1.8806      1.7535      2.0705
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   976 hundredths of a second
 Wall Time ; time =     1352.67 hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   27.7207      1.4028      1.1544      1.7443
Scaling   :   28.5197      1.3388      1.1220      1.6825
Summing   :   28.6751      1.9987      1.6739      2.3020
SAXPYing  :   28.3616      1.8661      1.6924      2.3033
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   945 hundredths of a second
 Wall Time ; time =     1314.33 hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   27.6434      1.3173      1.1576      1.4869
Scaling   :   27.2736      1.3789      1.1733      1.7669
Summing   :   29.0857      1.9833      1.6503      2.3564
SAXPYing  :   28.5905      1.8968      1.6789      2.1670
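
One way to turn these per-copy results into aggregate figures is to add
the "Max MB/s" values of the copies that ran at the same time. The
sketch below does this for the Assignment row only. It assumes the
copies overlapped for the whole measurement, which the output does not
confirm, and the names `ASSIGNMENT` and `aggregate` are illustrative.

```python
# "Assignment" Max MB/s for each concurrent copy, from the output above.
ASSIGNMENT = {
    1: [33.3704],
    2: [33.2178, 33.2579],
    4: [29.9418, 30.1089, 30.9314, 31.3683],
    6: [26.0079, 27.0444, 27.6242, 28.4723, 27.7207, 27.6434],
}


def aggregate(per_copy):
    """Sum the per-copy rates for each number of concurrent copies."""
    return {n: sum(rates) for n, rates in per_copy.items()}


for n, total in sorted(aggregate(ASSIGNMENT).items()):
    print(f"{n} copies: {total:8.4f} MB/s aggregate, "
          f"{total / n:7.4f} MB/s per copy")
```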

From mccalpin@perelandra.cms.udel.edu Tue Aug  2 07:38:36 1994
Received: from [144.110.16.11] by perelandra.cms.udel.edu via SMTP (931110.SGI/930416.SGI)
	for mccalpin id AA05385; Mon, 1 Aug 94 01:40:08 -0400
Received: by shark.mel.dit.csiro.au id AA12859
  (5.65c/IDA-1.4.4/DIT-1.3 for mccalpin@perelandra.cms.udel.edu); Mon, 1 Aug 1994 15:40:49 +1000
From: Robert Bell <Robert.Bell@mel.dit.csiro.au>
Message-Id: <199408010540.AA12859@shark.mel.dit.csiro.au>
Subject: Stream benchmark
To: mccalpin
Date: Mon, 1 Aug 1994 15:40:48 +1000 (EST)
Cc: csrcb@mel.dit.csiro.au (Robert Bell)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 3580      
Status: RO

John,
     I have been following your stream benchmark, and your advocacy of
the importance of memory speeds.  I have some of my own tests which
look particularly at strides, but I have not had time to develop them
much.
     I have recently run your test on a Cray Y-MP 4E/464, under UNICOS 
7.0.5.3, and found that the results were not as expected.  (I implemented 
the timing code by removing 'second' from the external declaration, and 
supplying no external routine, so the Cray intrinsic was used.)
     Here are the results with

cf77 -V -O vector3 -o stream stream.orig.f 
Cray CF77 Version 6.0 (6.55)  08/01/94 15:36:00
Cray FPP  Version 6.0 (3.06F1) 08/01/94  15:36:00               
 cft77-42 cf77: Cray CFT77 Version 6.0.3.9  (008708) 08/01/94 15:36:01
 cft77-2  cf77: COMPILE TIME     .669 SECONDS
 cft77-6  cf77: MAXIMUM FIELD LENGTH   421358 DECIMAL WORDS
 cft77-3  cf77: 236 SOURCE LINES
 cft77-4  cf77: 0 ERROR, 0 WARNING, 0 CAUTION, 0 COMMENT, 0 NOTE, 0 ANSI
 cft77-5  cf77: CODE: 567 WORDS, DATA: 281 WORDS

--------------------------------------
 Single precision appears to have 14 digits of accuracy
 Assuming 8 bytes per default REAL word
--------------------------------------
 Timing calibration ; time = 0.1128762 hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:**********      0.0000      0.0000      0.0000  
Scaling   :**********      0.0000      0.0000      0.0000  
Summing   :**********      0.0000      0.0000      0.0000  
SAXPYing  :**********      0.0000      0.0000      0.0000  

 
It appears that the compiler was smart enough to optimise away some of 
your code.  I changed some of the loops as follows
	            do 30 j=1,N
C Original.             c(j) = 3.0e0*a(j)
                b(j) = 3.0e0*c(j)
   30       continue

            do 40 j=1,N
C Original.             c(j) = a(j)+b(j)
                a(j) = b(j)+c(j)
   40       continue

The results were then

--------------------------------------
 Single precision appears to have 14 digits of accuracy
 Assuming 8 bytes per default REAL word
--------------------------------------
 Timing calibration ; time = 1.9692438 hundredths of a second
 Increase the size of the arrays if this is <30  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment: 2323.1070      0.0069      0.0069      0.0070  
Scaling   : 2354.6871      0.0069      0.0068      0.0070  
Summing   : 2365.9413      0.0105      0.0101      0.0107  
SAXPYing  : 2365.2446      0.0105      0.0101      0.0108  

These look correct for a 2-section Y-MP.  (The results in your table
are for a 4-section Y-MP.)

Note that the setting of c can still be removed from the do 10 loop,
and that the values of c should be used after the do 50 loop, to
prevent optimisers from taking further liberties with your code.

Also, it is good to avoid the first value returned by some of the
timing routines on some systems - the first value includes some setup
time.

-- 
Regards
Rob. Bell		     (	email: csrcb@mel.dit.csiro.au  )
--
/ Robert.Bell@mel.dit.csiro.au |  CSIRO Supercomputing Facility Manager  \
| CSIRO Division of Information Technology, Supercomputing Support Group |
| 723 Swanston Street | tel: {+61 3,03,} 282 2620 or {+61,0} 18 108 333  |
\ Carlton VIC 3053 Australia   |  fax: {+61 3,03,} 282 2600              /

From gary@chpc.utexas.edu  Tue Aug  2 13:11:02 1994
Received: from bach.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/930416.SGI)
	for mccalpin id AA08667; Tue, 2 Aug 94 13:11:02 -0400
Received: from gonzales.chpc.utexas.edu (gonzales.chpc.utexas.edu [129.116.16.12]) by bach.udel.edu (8.6.8/8.6.6) with SMTP id NAA28021 for <mccalpin@bach.udel.edu>; Tue, 2 Aug 1994 13:11:48 -0400
From: gary@chpc.utexas.edu
Received: by gonzales.chpc.utexas.edu (5.0/SMI-4.1)
	id AA13698; Tue, 2 Aug 1994 12:11:45 +0600
Message-Id: <9408021711.AA13698@gonzales.chpc.utexas.edu>
Subject: SP2 node results
To: mccalpin@bach.udel.edu
Date: Tue, 2 Aug 1994 12:11:42 -0500 (CDT)
X-Mailer: ELM [version 2.4 PL23]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 698       
Status: RO

Dr. McCalpin,

Do you have your stream benchmark results for the IBM 390 "thin" node,
as well as the IBM 590 "wide" node option for their SP2 system?  Since
the 590 path to memory is 256 bits wide, and that of the 390 is 64 bits,
I assume the 390 has 1/4 of the sustainable CPU<->Dcache<->DRAM band-
pass that the 590 has, since both are clocked at 66MHz, but I'd love to
see an actual measurement...

---Gary

Randolph Gary Smith                       Internet: gary@chpc.utexas.edu
Systems Group                             Phonenet: (512) 471-2411
Center for High Performance Computing     Snailnet: 10100 Burnet Road
The University of Texas System                      Austin, Texas 78758-4497

From mccalpin  Tue Aug  2 13:39:27 1994
Received: by perelandra.cms.udel.edu (931110.SGI/930416.SGI)
	for mccalpin id AA08776; Tue, 2 Aug 94 13:39:27 -0400
Date: Tue, 2 Aug 94 13:39:27 -0400
From: mccalpin (John D. McCalpin)
Message-Id: <9408021739.AA08776@perelandra.cms.udel.edu>
To: mccalpin
Subject: Stream on grieg
Status: RO

================================================================
Machine #1
================================================================
xlf -O3 -qarch -qstrict stream_d.f -o stream_d
measured on grieg.udel.edu, an IBM RS/6000-990 with 
2 memory cards --- i.e. a 128-bit bus with 128kB Dcache.

John D. McCalpin
mccalpin@perelandra.cms.udel.edu
Tue Aug  2 13:30:54 EDT 1994

 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  333.3333       .2684       .2400       .2900
Scaling   :  320.0000       .2691       .2500       .2800
Summing   :  342.8568       .3691       .3500       .3800
SAXPYing  :  352.9410       .3573       .3400       .3800
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  333.3337       .2755       .2400       .2900
Scaling   :  333.3337       .2706       .2400       .3000
Summing   :  363.6364       .3634       .3300       .3800
SAXPYing  :  352.9410       .3622       .3400       .3800
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  333.3337       .2533       .2400       .2800
Scaling   :  320.0000       .2592       .2500       .2800
Summing   :  363.6364       .3442       .3300       .3700
SAXPYing  :  363.6364       .3553       .3300       .3700
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  333.3333       .2592       .2400       .2700
Scaling   :  320.0000       .2612       .2500       .2800
Summing   :  363.6363       .3452       .3300       .3700
SAXPYing  :  363.6364       .3522       .3300       .3700
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  320.0000       .2662       .2500       .2800
Scaling   :  333.3337       .2532       .2400       .2700
Summing   :  352.9410       .3561       .3400       .3700
SAXPYing  :  352.9415       .3553       .3400       .3800
 ---------------------------------------------------
================================================================
Machine #2
================================================================
xlf -O3 -qarch -qstrict stream_d.f -o stream_d
measured on mozart.udel.edu, an IBM RS/6000-990 with 
4 memory cards --- i.e. a 256-bit bus with 256kB Dcache.

John D. McCalpin
mccalpin@perelandra.cms.udel.edu
Tue Aug  2 13:30:54 EDT 1994

 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  666.6673       .1332       .1200       .1400
Scaling   :  533.3347       .1551       .1500       .1600
Summing   :  705.8820       .1892       .1700       .2000
SAXPYing  :  705.8820       .1851       .1700       .1900
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  727.2726       .1318       .1100       .1600
Scaling   :  615.3846       .1643       .1300       .2000
Summing   :  749.9996       .1885       .1600       .2100
SAXPYing  :  750.0007       .1866       .1600       .2100
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  727.2718       .1275       .1100       .1400
Scaling   :  615.3841       .1549       .1300       .1900
Summing   :  749.9996       .1837       .1600       .2100
SAXPYing  :  750.0007       .1786       .1600       .2100
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  727.2718       .1296       .1100       .1500
Scaling   :  615.3841       .1535       .1300       .1800
Summing   :  799.9995       .1850       .1500       .2100
SAXPYing  :  749.9996       .1834       .1600       .2000
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  666.6673       .1363       .1200       .1500
Scaling   :  571.4291       .1563       .1400       .1800
Summing   :  750.0002       .1853       .1600       .2000
SAXPYing  :  705.8820       .1852       .1700       .2000
 ---------------------------------------------------
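
As a consistency check on both machines, Rate multiplied by Min time
gives the amount of data moved per pass. The sketch below (the name
`bytes_moved_mb` is illustrative) gives about 80 MB for Assignment and
about 120 MB for Summing. That would be consistent with arrays of
5,000,000 8-byte words if 1 MB is taken as 10^6 bytes, though the array
size is an inference and is not stated in the output. Because the times
are reported in steps of 0.01 s, the rates can only take a few distinct
values.

```python
def bytes_moved_mb(rate_mb_per_s, min_time_s):
    """Data moved per pass in MB, from the Rate and Min time columns."""
    return rate_mb_per_s * min_time_s


# (host, kernel, Rate MB/s, Min time s) rows taken from the tables above.
ROWS = [
    ("grieg", "Assignment", 333.3333, 0.24),
    ("grieg", "Summing", 363.6364, 0.33),
    ("mozart", "Assignment", 727.2726, 0.11),
    ("mozart", "Summing", 799.9995, 0.15),
]

for host, kernel, rate, tmin in ROWS:
    print(f"{host:<7} {kernel:<11} {bytes_moved_mb(rate, tmin):7.2f} MB")
```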
--
John D. McCalpin                        mccalpin@perelandra.cms.udel.edu
Assistant Professor                     mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.      John.McCalpin@mvs.udel.edu

From mccalpin@perelandra.cms.udel.edu Mon Aug 15 13:55:47 1994
Received: from inet-gw-1.pa.dec.com by perelandra.cms.udel.edu via SMTP (931110.SGI/930416.SGI)
	for mccalpin id AA03693; Mon, 15 Aug 94 12:15:47 -0400
Received: from us1rmc.bb.dec.com by inet-gw-1.pa.dec.com (5.65/10Aug94)
	id AA16282; Mon, 15 Aug 94 08:58:35 -0700
Received: from i4get.enet by us1rmc.bb.dec.com (5.65/rmc-22feb94)
	id AA12508; Mon, 15 Aug 94 11:58:39 -0400
Message-Id: <9408151558.AA12508@us1rmc.bb.dec.com>
Received: from i4get.enet; by us1rmc.enet; Mon, 15 Aug 94 11:58:39 EDT
Date: Mon, 15 Aug 94 11:58:39 EDT
From: "John, dtn 293-5506  15-Aug-1994 1123" <henning@i4get.enet.dec.com>
To: mccalpin
Cc: me@i4get.enet.dec.com
Apparently-To: mccalpin
Subject:  .gz ?
Status: O

Hello Professor McCalpin,

I expect to be able to give you some data on additional Digital systems
in the near future; just need to get approvals before releasing the data.

Meanwhile, I've continued to study the data you already have, but have
run into a roadblock with the ".gz" files in /bench/stream_mail.
Would you be willing to post them in ".Z" format instead?

(I gather that .gz is GNU zip?  On my DEC OSF/1 system, neither
"uncompress" nor "unpack" is able to read it.)

Thanks very much!
        /john

From mike@wavelet.apl.washington.edu  Fri Aug 26 12:26:26 1994
Received: from wavelet.apl.washington.edu by perelandra via SMTP (931110.SGI/930416.SGI)
	for mccalpin id AA01678; Fri, 26 Aug 94 12:26:26 -0400
Received: by wavelet.apl.washington.edu
	(5.65c) id AA14712; Fri, 26 Aug 1994 09:27:11 -0700
From: Mike Kenney <mike@wavelet.apl.washington.edu>
Message-Id: <199408261627.AA14712@wavelet.apl.washington.edu>
Subject: stream benchmark results
To: mccalpin
Date: Fri, 26 Aug 1994 09:27:10 -0700 (PDT)
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 772       
Status: RO


System:		Pentium/60  Intel Premiere PCI Motherboard 32Mb RAM
OS:		Linux v 1.0.9
Compiler:	gcc v 2.5.8
Compiler flags:	-O2 -m486 -fomit-frame-pointer -funroll-loops

Note: This compiler does not optimize for the Pentium

Timing calibration ; time = 2870.00 usec.
Increase the size of the arrays if this is < 300
and your clock precision is =< 1/100 second.
---------------------------------------------------
Function Rate   (MB/s)     RMS time    Min time    Max time
Assignment:     37.246     477.473     450.000     560.000
Scaling   :     62.077     283.390     270.000     320.000
Summing   :     61.320     424.594     410.000     490.000
SAXPYing  :     58.468     441.078     430.000     460.000


-- 
Mike Kenney
UW Applied Physics Lab
mikek@apl.washington.edu

From mike@wavelet.apl.washington.edu  Fri Aug 26 16:13:15 1994
Received: from wavelet.apl.washington.edu by perelandra via SMTP (931110.SGI/930416.SGI)
	for mccalpin id AA02413; Fri, 26 Aug 94 16:13:15 -0400
Received: by wavelet.apl.washington.edu
	(5.65c) id AA15248; Fri, 26 Aug 1994 13:14:02 -0700
From: Mike Kenney <mike@wavelet.apl.washington.edu>
Message-Id: <199408262014.AA15248@wavelet.apl.washington.edu>
Subject: Re: stream benchmark results
To: mccalpin (John D. McCalpin)
Date: Fri, 26 Aug 1994 13:14:02 -0700 (PDT)
In-Reply-To: <9408261632.AA01705@perelandra> from "John D. McCalpin" at Aug 26, 94 12:32:09 pm
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 983       
Status: RO

John D. McCalpin writes:
> 
> Thanks for the data.   The rate for a plain copy is surprisingly slow,
> but the other numbers are quite good for a machine in that price range.
> --
> John D. McCalpin                        mccalpin@perelandra.cms.udel.edu
> Assistant Professor                     mccalpin@brahms.udel.edu
> College of Marine Studies, U. Del.      John.McCalpin@mvs.udel.edu
> 
> 

Looking at the assembler output from gcc, there are two 4-byte moves
issued on each iteration of the plain-copy loop.  The other loops
use the floating-point instructions which can move 8 bytes at a
time.  I would expect an 8-byte move instruction would approximately
double the first loop speed ... maybe a Pentium-optimized gcc would
do this.

BTW, I tried substituting  c[j] = 1.0*a[j] in the first loop but
the compiler didn't fall for it ... it still issued the 4-byte
move instructions rather than an fp multiply.

-- 
Mike Kenney
UW Applied Physics Lab
mikek@apl.washington.edu

From mccalpin@perelandra Fri Sep  9 13:01:47 1994
Received: from inet-gw-1.pa.dec.com by perelandra via SMTP (931110.SGI/930416.SGI)
	for mccalpin id AA14365; Fri, 9 Sep 94 11:35:28 -0400
Received: from us1rmc.bb.dec.com by inet-gw-1.pa.dec.com (5.65/10Aug94)
	id AA15199; Fri, 9 Sep 94 08:22:50 -0700
Received: from i4get.enet by us1rmc.bb.dec.com (5.65/rmc-22feb94)
	id AA25478; Fri, 9 Sep 94 11:22:59 -0400
Message-Id: <9409091522.AA25478@us1rmc.bb.dec.com>
Received: from i4get.enet; by us1rmc.enet; Fri, 9 Sep 94 11:23:00 EDT
Date: Fri, 9 Sep 94 11:23:00 EDT
From: "John, dtn 293-5506  09-Sep-1994 1117" <henning@i4get.enet.dec.com>
To: mccalpin
Cc: shak@i4get.enet.dec.com, me@i4get.enet.dec.com
Apparently-To: mccalpin
Subject: STREAMS on 2 Alpha SMP systems
Status: O

Dear Professor McCalpin,

The CSG Performance Group at Digital has recently run the Streams
benchmark on two Alpha SMP systems.  I look forward to seeing an update
of your tables that includes these results.

If there are any questions we can answer about these runs, please feel
free to contact me.

I hope you are doing well with the new academic year!

Best regards
    /John Henning
     508 264 5506
     henning@msbcs.enet.dec.com or
     henning@zko.mts.dec.com


                    Bytes          Bandwidth (MB/s)               
Machine             /word      Copy     Scale       Sum     Triad 
---------------     -----  --------  --------  --------  -------- 
DEC 7000/640 4-CPU      8     261.9     257.4     261.1     265.9
DEC 7000/620 2-CPU      8     163.9     166.0     161.8     157.1
DEC 7000/610 1-CPU      8      91.6      91.6      88.8      86.9

DEC 2100 A500 2-CPU     8     133.3     133.6     129.6     127.7
DEC 2100 A500 1-CPU     8      79.2      80.8      79.1      79.3
---------------     -----  --------  --------  --------  -------- 
                                
 
                      num            Speedup ratio for
Machine              cpus      Copy     Scale       Sum     Triad
---------------     -----   --------  --------  --------  --------
DEC 7000/640            4      2.86      2.81      2.94      3.06
DEC 7000/620            2      1.79      1.81      1.82      1.81
DEC 7000/610            1      1.00      1.00      1.00      1.00

DEC 2100 A500           2      1.68      1.65      1.64      1.61
DEC 2100 A500           1      1.00      1.00      1.00      1.00
---------------     -----   --------  --------  --------  --------


2100_1cpu

--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   0.000000000000000E+000 hundredths of a second
 Wall Time ; time =    75.6000041961670      hundredths of a second
 Increase the size of the arrays if this is <100 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   79.1954      0.2038      0.2020      0.2143
Scaling   :   80.7559      0.1987      0.1981      0.1997
Summing   :   79.0681      0.3037      0.3035      0.3045
SAXPYing  :   79.3231      0.3034      0.3026      0.3035


2100_2cpu: Add 2 tables together ... jobs running simultaneously
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   0.000000000000000E+000 hundredths of a second
 Wall Time ; time =    145.051205158234      hundredths of a second
 Increase the size of the arrays if this is <100 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   66.6400      0.2404      0.2401      0.2411
Scaling   :   66.6400      0.2403      0.2401      0.2411
Summing   :   64.7110      0.3732      0.3709      0.3900
SAXPYing  :   63.8705      0.3768      0.3758      0.3777
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   0.000000000000000E+000 hundredths of a second
 Wall Time ; time =    145.539200305939      hundredths of a second
 Increase the size of the arrays if this is <100 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   66.6400      0.2406      0.2401      0.2420
Scaling   :   66.9120      0.2400      0.2391      0.2407
Summing   :   64.8817      0.3737      0.3699      0.3968
SAXPYing  :   63.8706      0.3765      0.3758      0.3777


7600-1CPU
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   0.000000000000000E+000 hundredths of a second
 Wall Time ; time =    57.3487997055054      hundredths of a second
 Increase the size of the arrays if this is <100 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   91.5835      0.1795      0.1747      0.2147
Scaling   :   91.5835      0.1750      0.1747      0.1757
Summing   :   88.7732      0.2705      0.2704      0.2709
SAXPYing  :   86.8911      0.2766      0.2762      0.2778


7600-2CPU:   Add 2 tables together ... jobs running simultaneously
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   0.000000000000000E+000 hundredths of a second
 Wall Time ; time =    74.3312001228333      hundredths of a second
 Increase the size of the arrays if this is <100 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   81.9672      0.1964      0.1952      0.1981
Scaling   :   82.7952      0.1943      0.1932      0.1952
Summing   :   81.1557      0.2968      0.2957      0.2983
SAXPYing  :   78.6658      0.3059      0.3051      0.3070
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   0.000000000000000E+000 hundredths of a second
 Wall Time ; time =    73.4527945518494      hundredths of a second
 Increase the size of the arrays if this is <100 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   81.9672      0.1960      0.1952      0.2001
Scaling   :   83.2155      0.1932      0.1923      0.1942
Summing   :   80.6235      0.2990      0.2977      0.3006
SAXPYing  :   78.4150      0.3066      0.3061      0.3070


7600-4CPU:   Add 4 tables together ... jobs running simultaneously

--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   0.000000000000000E+000 hundredths of a second
 Wall Time ; time =    104.782390594482      hundredths of a second
 Increase the size of the arrays if this is <100 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   65.0533      0.2513      0.2460      0.2895
Scaling   :   63.5405      0.2525      0.2518      0.2534
Summing   :   65.3994      0.3683      0.3670      0.3695
SAXPYing  :   70.8650      0.3704      0.3387      0.3754
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   0.000000000000000E+000 hundredths of a second
 Wall Time ; time =    104.587197303772      hundredths of a second
 Increase the size of the arrays if this is <100 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   65.4193      0.2517      0.2446      0.3051
Scaling   :   64.2880      0.2498      0.2489      0.2504
Summing   :   65.5738      0.3672      0.3660      0.3685
SAXPYing  :   65.6455      0.3716      0.3656      0.3734
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   0.000000000000000E+000 hundredths of a second
 Wall Time ; time =    100.878405570984      hundredths of a second
 Increase the size of the arrays if this is <100 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   65.3125      0.2558      0.2450      0.3305
Scaling   :   64.2880      0.2505      0.2489      0.2514
Summing   :   65.3994      0.3681      0.3670      0.3689
SAXPYing  :   65.1240      0.3729      0.3685      0.3744
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 CPU time ; time =   0.000000000000000E+000 hundredths of a second
 Wall Time ; time =    96.2911963462830      hundredths of a second
 Increase the size of the arrays if this is <100 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function      Max MB/s    RMS time    Min time    Max time
Assignment:   66.1026      0.2565      0.2420      0.3461
Scaling   :   65.3125      0.2461      0.2450      0.2508
Summing   :   64.7110      0.3720      0.3709      0.3728
SAXPYing  :   64.2729      0.3745      0.3734      0.3763

From henning@i4get.enet.dec.com  Thu Oct 13 16:05:57 1994
Received: from inet-gw-1.pa.dec.com by perelandra via SMTP (931110.SGI/930416.SGI)
	for mccalpin id AA07734; Thu, 13 Oct 94 16:05:57 -0400
Received: from us1rmc.bb.dec.com by inet-gw-1.pa.dec.com (5.65/10Aug94)
	id AA22959; Thu, 13 Oct 94 12:45:31 -0700
Received: from i4get.enet by us1rmc.bb.dec.com (5.65/rmc-22feb94)
	id AA11379; Thu, 13 Oct 94 15:45:51 -0400
Message-Id: <9410131945.AA11379@us1rmc.bb.dec.com>
Received: from i4get.enet; by us1rmc.enet; Thu, 13 Oct 94 15:45:57 EDT
Date: Thu, 13 Oct 94 15:45:57 EDT
From: "John, dtn 293-5506  13-Oct-1994 1540" <henning@i4get.enet.dec.com>
To: mccalpin
Cc: me@i4get.enet.dec.com
Apparently-To: mccalpin
Subject: The Digital data...
Status: RO

Hi Professor McCalpin,

Thank you for including the data I submitted not too long ago in your tables.
However, in the interest of treating Digital the same as other vendors,
may I request that you include the 2-cpu and 4-cpu data in the base
table?  I notice that you have only:

DEC 7000/610 1 cpu      8      91.6      91.6      88.8      86.9
DEC 2100 A500 1 cpu     8      79.2      80.8      79.1      79.3

in table.print, but you have multiple cpu data in table.print for

Cray YMP/C90  2 cpu     8   13866.0   13905.5   18233.2   18246.3
Cray EL-98 2 cpu        8     826.7     833.8    1049.0    1078.2
Convex C3420 2 cpu      8     289.1     273.1     302.7     300.3
Convex C3420 2 cpu      4     242.7     238.8     263.7     260.0
Convex C3220 2 cpu      8     334.3     333.0     337.3     336.5
Convex C3220 2 cpu      4     178.7     177.5     211.0     208.8
Sun SS10/41 2 cpu       8      48.0      48.0      48.0      48.0
Stardent P3 3 cpu       8     228.6     246.2     246.2     200.0
Alliant VFX80 3 cpu     8      41.7      41.6      43.1      43.0
Alliant VFX80 3 cpu     4      23.8      23.8      25.2      25.3
Cray YMP/C90  4 cpu     8   27610.3   27789.6   34633.3   35044.1
Cray Y/MP 4 cpu         8    9685.8    9678.9   13781.4   13851.2
Cray EL-98 4 cpu        8    1564.9    1569.8    1933.8    1955.5
FPS MCP728 7 cpu        8    1040.5    1036.8     890.4     887.6
FPS MCP728 7 cpu        4    1046.2     478.0     393.0     394.8
Cray YMP/C90  8 cpu     8   55071.9   55391.8   60843.3   63229.6
Cray Y/MP 8 cpu         8   19291.6   19294.2   26588.9   26802.2
Cray EL-98 8 cpu        8    2362.8    2310.5    2373.7    2363.8
Alliant FX/80 8 cpu     8      72.8      71.6      76.3      76.5
Cray YMP/C90 16 cpu     8  105497.4  104656.4  101736.1  103812.8

Thanks very much for considering this request!
	/john henning

From kupec@agouron.com  Fri Oct 28 16:03:05 1994
Received: from tbone.agouron.com by perelandra.cms.udel.edu via SMTP (931110.SGI/930416.SGI)
	for mccalpin id AA06272; Fri, 28 Oct 94 16:03:05 -0400
Received: from blowhole.agouron.com by agouron.com via SMTP (940715.SGI.52/930901.SGI)
	for mccalpin@perelandra.cms.udel.edu id AA12039; Fri, 28 Oct 94 13:07:51 -0700
Received: by blowhole.agouron.com (940715.SGI.52/930416.SGI)
	for @agouron.com:mccalpin@perelandra.cms.udel.edu id AA00884; Fri, 28 Oct 94 13:08:30 -0700
From: "John W. Kupec" <kupec@agouron.com>
Message-Id: <9410281308.ZM882@blowhole.agouron.com>
Date: Fri, 28 Oct 1994 13:08:29 -0700
In-Reply-To: mccalpin@perelandra.cms.udel.edu (John D. McCalpin)
        "Re: stream" (Oct 28,  3:32pm)
References: <9410281905.AA24411@irisb> 
	<9410281532.ZM6196@perelandra.cms.udel.edu>
X-Mailer: Z-Mail (3.1.0 22feb94 MediaMail)
To: mccalpin (John D. McCalpin)
Subject: Re: stream
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0
Status: RO

On Oct 28,  3:32pm, John D. McCalpin wrote:
> Subject: Re: stream
> On Oct 28, 12:05pm, John W. Kupec wrote:
> > Having problems here.  I've tried various f77 incantations &
> > get "Segmentation fault" under IRIX 6.0 and IRIX 5.2 (R4400).

Sure enough, stacksize was 64MB.  Here are the results after compiling
"f77 -mips4 -O3 stream_d.f -o stream_d" on Power Challenge-L, IRIX 6.0,
1GB, 2-way interleaved memory:


--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =    118.6243979260325     hundredths of a second
 Increase the size of the arrays if this is <30
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  130.9830      0.4904      0.4886      0.4991
Scaling   :  131.8875      0.4871      0.4853      0.4921
Summing   :  129.2836      0.7464      0.7426      0.7659
SAXPYing  :  129.0843      0.7448      0.7437      0.7462
 Sum of a is :   2.3066015625918735E+18
 Sum of b is :   4.6132031248564384E+17
 Sum of c is :   6.1509375001412557E+17

real       27.26
user       25.37
sys         1.26

There is no -O4 under IRIX 6.0.

Here are the results I obtained with "f77 -mips2 -O3 -non_shared"
on an Indigo 2, IRIX 5.2, 150MHz R4400 CPU, 128MB memory:

--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =    309.9999904632568     hundredths of a second
 Increase the size of the arrays if this is <30
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:   59.8131      1.1270      1.0700      1.3300
Scaling   :   57.1427      1.2077      1.1200      1.4000
Summing   :   61.1465      1.7146      1.5700      2.0100
SAXPYing  :   67.1329      1.5382      1.4300      1.8700
 Sum of a is :   2.3066015623519370E+18
 Sum of b is :   4.6132031251484826E+17
 Sum of c is :   6.1509374999898381E+17

real     1:02.47
user       54.58
sys         5.66


Would you mind providing comparisons to other machines?

Thanks


From sgf@cfm.brown.edu  Fri Oct 28 16:34:57 1994
Received: from cfm.brown.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/930416.SGI)
	for mccalpin id AA06436; Fri, 28 Oct 94 16:34:57 -0400
Received: by cfm.brown.edu (4.1/1.00)
	id AA13480; Fri, 28 Oct 94 16:36:03 EDT
Date: Fri, 28 Oct 94 16:36:03 EDT
From: sgf@cfm.brown.edu (Sam Fulcomer)
Message-Id: <9410282036.AA13480@cfm.brown.edu>
To: mccalpin
Subject: Re: New MIPS R10000? [actually, more on R8000]
Status: RO

Well..., here are the results and such...

Let me know what you think. I'm now going to get our stuff running and attempt
to do some optimization.

-s

odin 61% hinv
15 75 MHZ IP21 Processors

?? ...hhmmm.... one down already....

CPU: MIPS R8000 Processor Chip Revision: 2.2
FPU: MIPS R8010 Floating Point Chip Revision: 0.1
Data cache size: 16 Kbytes
Instruction cache size: 16 Kbytes
Secondary unified instruction/data cache size: 4 Mbytes
Main memory size: 2048 Mbytes, 8-way interleaved


odin 54% f77 -mips4 -O3 -o stream_df stream_d.f    
WARNING 85: definition of qlogb in /usr/lib64/mips4/libm.so preempts that definition in /usr/lib64/mips4/libc.so.
odin 55% cc  -mips4 -O3 -o stream_dc stream_d.c
"stream_d.c", line 72: warning(1552): variable "c" was set but never used
  static real a[N],b[N],c[N];
                        ^

odin 56%  stream_df
--------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
 Timing calibration ; time =    54.80809807777405     hundredths of a second
 Increase the size of the arrays if this is <30 
  and your clock precision is =<1/100 second
 ---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Assignment:  134.9150      0.2374      0.2372      0.2378
Scaling   :  134.8364      0.2376      0.2373      0.2388
Summing   :  134.8351      0.3562      0.3560      0.3565
SAXPYing  :  134.7553      0.3564      0.3562      0.3567
 Sum of a is :   1.1533007812800417E+18
 Sum of b is :   2.3066015625365702E+17
 Sum of c is :   3.0754687500612557E+17

odin 58% stream_dc
Timing calibration ; time = 274158.000000 usec.
Increase the size of the arrays if this is < 300000
and your clock precision is =< 1/100 second.
---------------------------------------------------
Function Rate   (MB/s)   RMS time     Min time     Max time
Assignment:    135.681   510998.938   117924.000  1117907.000
Scaling   :    135.926   118220.000   117711.000   120189.000
Summing   :    139.726   546249.375   171765.000  1171895.000
SAXPYing  :    136.220   549960.500   176185.000  1178720.000