From mccalpin@enso.ocean.fsu.edu Tue Jan 25 11:07:22 1994
Received: from mailer.fsu.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP) for mccalpin id AA01345; Tue, 25 Jan 94 11:07:20 -0500
Return-Path:
Received: from enso.ocean.fsu.edu by mailer.fsu.edu with SMTP id AA29308 (5.65c/IDA-1.4.4 for <@mailer.fsu.edu:mccalpin@perelandra.cms.udel.edu>); Tue, 25 Jan 1994 11:10:06 -0500
Received: by enso.ocean.fsu.edu (931110.SGI/931108.SGI.AUTO.ANONFTP) for @mailer.fsu.edu:mccalpin@perelandra.cms.udel.edu id AA03404; Tue, 25 Jan 94 11:09:52 -0500
Date: Tue, 25 Jan 94 11:09:52 -0500
From: mccalpin@enso.ocean.fsu.edu (John D. McCalpin)
Subject: stream_d on SGI Onyx
Message-Id: <9401251609.AA03404@enso.ocean.fsu.edu>
Apparently-To: mccalpin
Status: RO

--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 199.9999962747097 hundredths of a second
Increase the size of the arrays if this is <30
 and your clock precision is =<1/100 second
---------------------------------------------------
Function  Rate (MB/s)   RMS time   Min time   Max time
Summing   :   56.4707     0.8550     0.8500     0.8600
SAXPYing  :   61.5385     0.7880     0.7800     0.7900
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:   58.1819     0.5600     0.5500     0.5700
Scaling   :   57.1429     0.5660     0.5600     0.5800
Summing   :   57.1428     0.8520     0.8400     0.8600
SAXPYing  :   62.3377     0.7820     0.7700     0.7900

From ben@REX.RE.uokhsc.edu Wed Jan 19 23:18:50 1994
Received: from bach.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP) for mccalpin id AA09623; Wed, 19 Jan 94 23:18:48 -0500
Return-Path:
Received: from who.net.uokhsc.edu (WHO.NET.UOKHSC.EDU [157.142.8.10]) by bach.udel.edu (8.6.4/8.6.4) with SMTP id XAA11442 for ; Wed, 19 Jan 1994 23:21:05 -0500
Received: from REX.RE.UOKHSC.EDU by who.net.uokhsc.edu (5.64/A/UX-3.00) id AA11272; Wed, 19 Jan 94 23:23:01 PST
Received: by
    rex.re.uokhsc.edu (5.4R2.01/1.34) id AA05227; Wed, 19 Jan 1994 22:20:38 -0600
From: ben@REX.RE.uokhsc.edu (Benjamin Z. Goldsteen)
Message-Id: <9401200420.AA05227@rex.re.uokhsc.edu>
Subject: STREAM results for AXP 4000/710
To: mccalpin@bach.udel.edu
Date: Wed, 19 Jan 1994 22:20:38 -0600 (CST)
Reply-To: benjamin-goldsteen@uokhsc.edu
X-Mailer: ELM [version 2.4 PL21]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 2446
Status: RO
X-Status:

Hello,

I found AXPOSF.PA.DEC.COM nearly completely unloaded (1 other non-idle
user and UNIX load-average was 0.0) today so I thought I might just run
the STREAM benchmarks on it because the only numbers I had seen were for
the DEC AXP 3000/500.

The AXPOSF is a DEC AXP 4000/710 with a 21064-190/40 (internal/external
MHz) and a 4MB secondary cache. The peak memory bus speed is rated at
640 MB/second (128 bits * 16 bytes/128 bits * 40 M clock-ticks/second).
AXPOSF runs DEC OSF/1 V1.3.

I used the compiler (V1.3) options:

    f77 -O4 -align dcommon stream_?.f second.o -o stream_?

(Things like "-assume noaccuracy_sensitive" didn't make much difference.
I tried KAP "just to see" and I think it completely deleted the main
loop.)

I was rather surprised at the results since they are lower than the DEC
AXP 3000/500 results. When I did the test I did up the "N" parameter to
4,000,000 and I think this is the cause, since when I left it at the
default the results tended to wander (around 100 MB/sec).
These numbers seem stable and consistent:

--------------------------------------
Single precision appears to have 7 digits of accuracy
Assuming 4 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 146.3024 hundredths of a second
Increase the size of the arrays if this is <30
 and your clock precision is =<1/100 second
---------------------------------------------------
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:   80.3603     0.3990     0.3982     0.3992
Scaling   :   80.3601     0.3991     0.3982     0.3992
Summing   :   79.1953     0.6080     0.6061     0.6100
SAXPYing  :   78.8146     0.6107     0.6090     0.6149

--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 360.6319956481457 hundredths of a second
Increase the size of the arrays if this is <30
 and your clock precision is =<1/100 second
---------------------------------------------------
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:   84.6114     0.7576     0.7564     0.7593
Scaling   :   82.2759     0.7801     0.7779     0.7896
Summing   :   85.0871     1.1300     1.1283     1.1322
SAXPYing  :   83.8539     1.1469     1.1448     1.1536

--
Benjamin Z. Goldsteen

From mccalpin Fri Feb 4 10:07:57 1994
Received: by perelandra.cms.udel.edu (931110.SGI/931108.SGI.ANONFTP) for mccalpin id AA00902; Fri, 4 Feb 94 10:07:57 -0500
Return-Path:
Date: Fri, 4 Feb 94 10:07:57 -0500
From: mccalpin (John D. McCalpin)
Message-Id: <9402041507.AA00902@perelandra.cms.udel.edu>
To: mccalpin
Subject: Stream data for Sun SC2000
Status: RO

stream_d data from strauss.udel.edu, an 8-cpu SparcCenter 2000
using the Apogee apf77 compiler and KAP preprocessor.

The timing is done with the 'etime' intrinsic, whose accuracy
in the case of multiple processors is quite uncertain....

Parameters:
    N=2,000,000
    ntimes=10

Shell:
    unlimit stacksize
    apf77 -Xkap=mp stream_d.f -o stream_d
    foreach N (1 2 3 4 5 6 7 8)
        setenv PARALLEL $N
        echo PARALLEL = $PARALLEL
        stream_d
    end

Results:

PARALLEL = 1
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:   79.0126     0.8220     0.8100     0.8300
Scaling   :   76.1905     0.8450     0.8400     0.8500
Summing   :   75.5908     1.2800     1.2700     1.3000
SAXPYing  :   71.1112     1.3550     1.3500     1.3700

PARALLEL = 2
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:  152.3809     0.4344     0.4200     0.4900
Scaling   :  148.8378     0.4412     0.4300     0.4800
Summing   :  147.6928     0.6591     0.6500     0.6800
SAXPYing  :  139.1307     0.6971     0.6900     0.7100

PARALLEL = 3
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:  220.6904     0.3001     0.2900     0.3100
Scaling   :  213.3339     0.3091     0.3000     0.3200
Summing   :  213.3334     0.4601     0.4500     0.4800
SAXPYing  :  204.2556     0.4874     0.4700     0.5400

PARALLEL = 4
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:  278.2614     0.2441     0.2300     0.2600
Scaling   :  278.2614     0.2504     0.2300     0.2700
Summing   :  274.2854     0.3652     0.3500     0.3800
SAXPYing  :  259.4589     0.3841     0.3700     0.4000

PARALLEL = 5
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:  320.0018     0.2061     0.2000     0.2200
Scaling   :  336.8412     0.2128     0.1900     0.2500
Summing   :  320.0008     0.3122     0.3000     0.3300
SAXPYing  :  309.6780     0.3191     0.3100     0.3400

PARALLEL = 6
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:  376.4662     0.1852     0.1700     0.2000
Scaling   :  376.4704     0.1846     0.1700     0.2100
Summing   :  369.2332     0.2691     0.2600     0.2800
SAXPYing  :  355.5575     0.2976     0.2700     0.4100

PARALLEL = 7
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:  426.6678     0.1635     0.1500     0.1900
Scaling   :  426.6678     0.1666     0.1500     0.1900
Summing   :  417.3921     0.2411     0.2300     0.2500
SAXPYing  :  400.0004     0.2451     0.2400     0.2600

PARALLEL = 8
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:  492.3036     0.1643     0.1300     0.2800
Scaling   :  492.3073     0.2038     0.1300     0.4300
Summing   :  505.2668     0.2184     0.1900     0.2400
SAXPYing  :  479.9982     0.2247     0.2000     0.2700

--
John D. McCalpin                        mccalpin@perelandra.cms.udel.edu
Assistant Professor                     mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.      John.McCalpin@mvs.udel.edu

From mccalpin@perelandra.cms.udel.edu Sat Feb 26 10:55:03 1994
Received: from bach.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP) for mccalpin id AA21276; Tue, 22 Feb 94 00:55:25 -0500
Received: from vnet.IBM.COM (vnet.ibm.com [192.239.48.4]) by bach.udel.edu (8.6.4/8.6.4) with SMTP id AAA08746 for ; Tue, 22 Feb 1994 00:59:32 -0500
From: NIK@KULVM.VNET.IBM.COM
Message-Id: <199402220559.AAA08746@bach.udel.edu>
Received: from KULVM by vnet.IBM.COM (IBM VM SMTP V2R2) with BSMTP id 2551; Tue, 22 Feb 94 00:55:55 EST
Date: Tue, 22 Feb 94 13:52:33 MAL
To: mccalpin@bach.udel.edu
Subject: Bandwidth
Status: RO

Dear John,

I have your report on subject memory bandwidth dated May 4, 1993. Since
then IBM and its competition have had new machines added into our
product line. I wonder if you have a revised version of the subject
report which includes bandwidth computations for the new machines for
IBM, HP and SUN especially. It would help my customers in deriving their
RFP specifications.

Many thanks in advance,
Best regards,
Nik Habibah

From jbs@watson.ibm.com Sun Feb 27 06:08:26 EST 1994
Article: 13502 of comp.benchmarks
Path: news.udel.edu!udel!MathWorks.Com!news.kei.com!uuspew.uu.net!uunet!watson.ibm.com
From: jbs@watson.ibm.com
Message-ID: <19940225.150338.930@almaden.ibm.com>
Date: Fri, 25 Feb 94 17:53:58 EST
Newsgroups: comp.benchmarks,comp.unix.aix
Subject: Streams (and other) Benchmarks for the 590
Lines: 25
Xref: news.udel.edu comp.benchmarks:13502 comp.unix.aix:37452
Status: RO

Hugh LaMaster asked:
>I understand the 590 is shipping now. Has anyone run the
>"streams" benchmark on the 590. Could the results be
>posted please?

I just ran this. I used version 2.0 of streamd.f dated 9/30/91.
I used n=3000000 and ntimes=10. I used version 03.01.0000.0000 of xlf
with compiler options "-c -O3 -qstrict -qsource -qlist -qarch=pwrx".
The output was:

--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 69.0000000000000000 hundredths of a second
Increase the size of the arrays if this is <30
 and your clock precision is =<1/100 second
---------------------------------------------------
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:  600.0000      .0831      .0800      .0900
Scaling   :  533.3333      .1002      .0900      .1100
Summing   :  654.5455      .1161      .1100      .1200
SAXPYing  :  654.5455      .1152      .1100      .1300

James B. Shearer
Disclaimer: I work for IBM, however this is not an official result.

From mccalpin@perelandra.cms.udel.edu Fri Mar 4 12:13:31 1994
Received: from timbuk.cray.com by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP) for mccalpin id AA03522; Fri, 4 Mar 94 12:11:07 -0500
Received: from magnet (magnet.cray.com) by cray.com (Bob mailer 1.2) id AA04432; Fri, 4 Mar 94 11:15:47 CST
Received: by magnet (4.1/CRI-5.13) id AA02121; Fri, 4 Mar 94 11:15:44 CST
From: cmg@ferrari.cray.com (Charles Grassl)
Message-Id: <9403041715.AA02121@magnet>
Subject: STREAM on T3D
To: mccalpin (John D. McCalpin)
Date: Fri, 4 Mar 94 11:15:40 CST
In-Reply-To: ; from "John D. McCalpin" at Oct 29, 93 11:23 am
X-Mailer: ELM [version 2.3 PL11]
Status: RO

Dear John;

Would you please update the STREAM table on the anonymous ftp archive
with the data below for the CRAY T3D.

                  Bytes           Bandwidth (MB/s)            Clock
Machine           /word     Copy    Scale      Sum    Triad    MHz
----------------  -----  -------- -------- -------- --------  -----
CRAY T3D 256 PEs    8     66106.2  51583.5  33324.7  89105.0   150
CRAY T3D 128 PEs    8     32973.0  25793.4  16661.7  44665.0   150
CRAY T3D  64 PEs    8     16520.9  12896.0   8331.3  22338.4   150
CRAY T3D  32 PEs    8      8264.9   6448.3   4165.7  11167.5   150

The actual output from the T3D is listed below.
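
As a quick consistency check on the table above, the aggregate rates can be divided by the PE count to recover per-PE bandwidth, and the 32-PE and 256-PE rows can be compared to see how closely the machine scales. The sketch below is illustrative only: the numbers are copied from the table, and the names t3d, per_pe and scaling_efficiency are invented for this example and are not part of STREAM.

```python
# Aggregate STREAM bandwidths (MB/s) for the CRAY T3D, copied from the
# table above. Keys are PE counts. All names here are illustrative only.
t3d = {
    256: {"Copy": 66106.2, "Scale": 51583.5, "Sum": 33324.7, "Triad": 89105.0},
    128: {"Copy": 32973.0, "Scale": 25793.4, "Sum": 16661.7, "Triad": 44665.0},
    64: {"Copy": 16520.9, "Scale": 12896.0, "Sum": 8331.3, "Triad": 22338.4},
    32: {"Copy": 8264.9, "Scale": 6448.3, "Sum": 4165.7, "Triad": 11167.5},
}


def per_pe(pes):
    """Return the bandwidth per processing element for each kernel."""
    return {kernel: rate / pes for kernel, rate in t3d[pes].items()}


def scaling_efficiency(kernel, base_pes=32, pes=256):
    """Measured speedup from base_pes to pes, divided by the ideal speedup."""
    speedup = t3d[pes][kernel] / t3d[base_pes][kernel]
    return speedup / (pes / base_pes)


if __name__ == "__main__":
    for pes in sorted(t3d):
        rates = per_pe(pes)
        print(pes, " ".join(f"{k}={v:.1f}" for k, v in rates.items()))
    for kernel in ("Copy", "Scale", "Sum", "Triad"):
        print(kernel, f"{scaling_efficiency(kernel):.3f}")
```

Per-PE Copy comes out near 258 MB/s at every machine size, which agrees with the Rate / PE column in the raw T3D output.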

Also, I noticed in the sc spreadsheet that the C90 clock rate is listed
as 250 MHz. This is incorrect. The C90 clock rate is 240 MHz.

Thank you and Regards,
Charles Grassl
Cray Research, Inc.
Eagan, Minnesota

OUTPUT FROM CRAY T3D FOLLOWS:
-----------------------------

*** STREAM benchmark ***
Number of PEs: 32
Number of iterations: 5
Size of Arrays: 400000
Test #1 Failed = picalc=piexact
Apparently Single=Double Precision
Proceeding to Test #2
--------------------------------------
Single precision appears to have 16 digits of accuracy
Assuming 8 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 6.1997264331694169 hundredths of a second
Increase the size of the arrays if this is <30
 and your clock precision is =<1/100 second
---------------------------------------------------
Function  Rate (MB/s)   RMS time   Min time   Max time  Rate / PE
Assignment: 8264.9256     0.0248     0.0248     0.0250   258.2789
Scaling   : 6448.3460     0.0319     0.0318     0.0320   201.5108
Summing   : 4165.7771     0.0738     0.0737     0.0740   130.1805
SAXPYing  :11167.5113     0.0276     0.0275     0.0278   348.9847

*** STREAM benchmark ***
Number of PEs: 64
Number of iterations: 5
Size of Arrays: 400000
Test #1 Failed = picalc=piexact
Apparently Single=Double Precision
Proceeding to Test #2
--------------------------------------
Single precision appears to have 16 digits of accuracy
Assuming 8 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 6.1920564294268843 hundredths of a second
Increase the size of the arrays if this is <30
 and your clock precision is =<1/100 second
---------------------------------------------------
Function  Rate (MB/s)   RMS time   Min time   Max time  Rate / PE
Assignment:16520.9172     0.0249     0.0248     0.0249   258.1393
Scaling   :12896.0747     0.0318     0.0318     0.0319   201.5012
Summing   : 8331.3155     0.0739     0.0737     0.0740   130.1768
SAXPYing  :22338.4170     0.0277     0.0275     0.0278   349.0378

*** STREAM benchmark ***
Number of PEs: 128
Number of iterations: 5
Size of Arrays: 400000
Test #1 Failed = picalc=piexact
Apparently Single=Double Precision
Proceeding to Test #2
--------------------------------------
Single precision appears to have 16 digits of accuracy
Assuming 8 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 6.1994464331291965 hundredths of a second
Increase the size of the arrays if this is <30
 and your clock precision is =<1/100 second
---------------------------------------------------
Function  Rate (MB/s)   RMS time   Min time   Max time  Rate / PE
Assignment:32972.9965     0.0249     0.0248     0.0250   257.6015
Scaling   :25793.4001     0.0318     0.0318     0.0319   201.5109
Summing   :16661.7047     0.0739     0.0737     0.0739   130.1696
SAXPYing  :44665.0442     0.0276     0.0275     0.0277   348.9457

*** STREAM benchmark ***
Number of PEs: 256
Number of iterations: 5
Size of Arrays: 400000
Test #1 Failed = picalc=piexact
Apparently Single=Double Precision
Proceeding to Test #2
--------------------------------------
Single precision appears to have 16 digits of accuracy
Assuming 8 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 6.2008504338336934 hundredths of a second
Increase the size of the arrays if this is <30
 and your clock precision is =<1/100 second
---------------------------------------------------
Function  Rate (MB/s)   RMS time   Min time   Max time  Rate / PE
Assignment:66106.2260     0.0249     0.0248     0.0250   258.2274
Scaling   :51583.5518     0.0318     0.0318     0.0320   201.4982
Summing   :33324.7739     0.0739     0.0737     0.0740   130.1749
SAXPYing  :89104.9894     0.0277     0.0276     0.0278   348.0664

From castor@drizzle.Stanford.EDU Tue Mar 15 15:38:27 1994
Received: from drizzle.Stanford.EDU by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP) for mccalpin id AA09361; Tue, 15 Mar 94 15:38:27 -0500
Received: from grizzle.Stanford.EDU (grizzle.Stanford.EDU [36.59.0.80]) by drizzle.Stanford.EDU (8.6.4/8.6.4) with ESMTP id MAA26578 for ; Tue, 15 Mar 1994 12:39:00 -0800
From: Castor Fu
Received: from
    localhost (castor@localhost) by grizzle.Stanford.EDU (8.6.4/8.6.4) id MAA03337 for mccalpin@perelandra.cms.udel.edu; Tue, 15 Mar 1994 12:38:59 -0800
Message-Id: <199403152038.MAA03337@grizzle.Stanford.EDU>
Subject: stream benchmark results
To: mccalpin
Date: Tue, 15 Mar 1994 12:38:58 -0800 (PST)
X-Mailer: ELM [version 2.4 PL2]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Length: 969
Status: RO

On a DEC 3000/300 running OSF/1 v1.3, using
DESC='DEC Fortran V3.3 for OSF/1 AXP Systems'
I got the following results:

castor:hassle[59] f77 -O stream_d.f
castor:hassle[60] a.out
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 136.0544040799141 hundredths of a second
Increase the size of the arrays if this is <30
 and your clock precision is =<1/100 second
---------------------------------------------------
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:   33.3709     0.3942     0.3836     0.4451
Scaling   :   33.4561     0.3871     0.3826     0.3982
Summing   :   39.5818     0.4938     0.4851     0.5270
SAXPYing  :   38.8778     0.4979     0.4939     0.5183
17.713u 1.503s 0:41.72 46.0% 0+0+0k (0k) 2+0io 0pf+0w

-castor fu
castor@drizzle.stanford.edu

From mccalpin@perelandra.cms.udel.edu Wed Apr 20 07:00:07 1994
Received: from bach.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP) for mccalpin id AA29086; Tue, 19 Apr 94 16:36:46 -0400
Received: from marvin.cdf.utoronto.ca (root@marvin.cdf.toronto.edu [128.100.31.3]) by bach.udel.edu (8.6.8/8.6.6) with SMTP id QAA16489 for ; Tue, 19 Apr 1994 16:36:44 -0400
Received: by marvin.cdf.toronto.edu id <31303>; Tue, 19 Apr 1994 16:36:39 -0400
From: John DiMarco
To: mccalpin@bach.udel.edu
Subject: streams benchmarks results
Message-Id: <94Apr19.163639edt.31303@marvin.cdf.toronto.edu>
Date: Tue, 19 Apr 1994 16:36:33 -0400
Status: RO

One minor comment about your table: the Sun 670
numbers are not particularly helpful, since the 670 is available with
several different types of CPU.

I've run your streams benchmark on some of our newer suns with the new
SunPro 3.0 f77 compiler (no parallelizing). Here are the results.
(code was compiled using f77 -cg92 -O4 -fsimple -dalign -Bstatic)

All in all, the machines were not particularly impressive on this
benchmark.

Regards,

John
--
John DiMarco                                    jdd@cdf.toronto.edu
Computing Disciplines Facility Systems Manager  jdd@cdf.utoronto.ca
University of Toronto                           EA201B,(416)978-1928

-------- cut here --------

128MB SPARCcenter 1000 (model 1102: 2x50MHz SuperSPARCs, 1MB external cache):

Assuming 8 bytes per DOUBLEPRECISION word
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:   37.2453     0.8608     0.8592     0.8654
Scaling   :   35.2409     0.9095     0.9080     0.9131
Summing   :   37.9461     1.2661     1.2650     1.2674
SAXPYing  :   41.2282     1.1661     1.1643     1.1704

Assuming 4 bytes per default REAL word
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:   33.7530     0.1204     0.1185     0.1234
Scaling   :   30.4827     0.1334     0.1312     0.1384
Summing   :   34.2466     0.1778     0.1752     0.1851
SAXPYing  :   30.6143     0.1983     0.1960     0.2022

64MB SPARCstation 10 model 402 (2x40MHz SuperSPARCs, no external cache)

Assuming 8 bytes per DOUBLEPRECISION word
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:   53.8878     0.6701     0.5938     1.1235
Scaling   :   50.3085     0.6411     0.6361     0.6475
Summing   :   53.1078     0.9930     0.9038     1.5348
SAXPYing  :   49.4348     0.9809     0.9710     1.0099

Assuming 4 bytes per default REAL word
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:   41.7293     0.0965     0.0959     0.0980
Scaling   :   35.4505     0.1134     0.1128     0.1139
Summing   :   42.6789     0.1413     0.1406     0.1426
SAXPYing  :   35.4308     0.1701     0.1693     0.1704

128MB Sun SPARCstation 10 model 514 (4x50MHz SuperSPARCs, 1MB external cache)

Assuming 8 bytes per DOUBLEPRECISION word
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:   44.0597     0.7290     0.7263     0.7343
Scaling   :   40.4820     0.7946     0.7905     0.7993
Summing   :   45.6326     1.0608     1.0519     1.0724
SAXPYing  :   44.3635     1.0892     1.0820     1.0943

Assuming 4 bytes per default REAL word
Function  Rate (MB/s)   RMS time   Min time   Max time
Assignment:   37.8953     0.1062     0.1056     0.1075
Scaling   :   32.6593     0.1230     0.1225     0.1238
Summing   :   39.7357     0.1518     0.1510     0.1545
SAXPYing  :   30.1364     0.1998     0.1991     0.2006

From mccalpin@perelandra.cms.udel.edu Fri May 6 15:16:12 1994
Received: from sgigate.SGI.COM by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP) for mccalpin id AA24449; Fri, 6 May 94 15:02:39 -0400
Received: from relay.sgi.com (relay.sgi.com [192.26.51.36]) by sgigate.sgi.com (8.6.4/8.6.4) with SMTP id MAA12523; Fri, 6 May 1994 12:03:28 -0700
Received: from phil.trevose.sgi.com by relay.sgi.com via SMTP (920330.SGI/920502.SGI) for @sgigate.sgi.com:mccalpin@perelandra.cms.udel.edu id AA09179; Fri, 6 May 94 12:03:22 -0700
Received: from manic.trevose.sgi.com by phil.trevose.sgi.com via SMTP (931110.SGI/930416.SGI) for @relay.sgi.com:mccalpin@perelandra.cms.udel.edu id AA16243; Fri, 6 May 94 15:08:22 -0400
Received: by manic.trevose.sgi.com (931110.SGI/930416.SGI) for @phil.trevose.sgi.com:mccalpin@perelandra.cms.udel.edu id AA18121; Fri, 6 May 94 15:08:20 -0400
From: "Rusty Benson"
Message-Id: <9405061508.ZM18119@manic.trevose.sgi.com>
Date: Fri, 6 May 1994 15:08:20 -0400
X-Mailer: Z-Mail-SGI (3.0S.1026 26oct93 MediaMail)
To: mccalpin
Subject: stream benchmark numbers
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0
Status: RO

John,

Here are the numbers I obtained for the memory bandwidth test.
The machine for the single job run is configured as follows:

4 150 MHZ IP19 Processors
CPU: MIPS R4400 Processor Chip Revision: 5.0
FPU: MIPS R4010 Floating Point Chip Revision: 0.0
Data cache size: 16 Kbytes
Instruction cache size: 16 Kbytes
Secondary unified instruction/data cache size: 1 Mbyte
Main memory size: 128 Mbytes, 2-way interleaved
I/O board, Ebus slot 3: IO4 revision 1
Integral EPC serial ports: 4
RealityEngineII Graphics Pipe 0 at IO Slot 3 Physical Adapter 2 (Fchip rev 2)
Integral Ethernet controller: et0, Ebus slot 3
Integral SCSI controller 4: Version WD33C95A
Integral SCSI controller 3: Version WD33C95A
Integral SCSI controller 2: Version WD33C95A
Integral SCSI controller 1: Version WD33C95A
Disk drive: unit 5 on SCSI controller 1
Disk drive: unit 4 on SCSI controller 1
Disk drive: unit 3 on SCSI controller 1
Disk drive: unit 2 on SCSI controller 1
Disk drive: unit 1 on SCSI controller 1
Integral SCSI controller 0: Version WD33C95A
Tape drive: unit 5 on SCSI controller 0: DAT
Disk drive: unit 1 on SCSI controller 0
Integral EPC parallel port: Ebus slot 3
VME bus: adapter 0 mapped to adapter 13
VME bus: adapter 13

The compiler options used are: -O3 -mips2 -non_shared -sopt

************************1 proc*************************************

The results are as follows:

re 20> cat P1.out
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
CPU time ; time = 11 hundredths of a second
Wall Time ; time = 106.9167 hundredths of a second
Increase the size of the arrays if this is <100
 and your clock precision is =<1/100 second
---------------------------------------------------
Function     Max MB/s   RMS time   Min time   Max time
Assignment:   67.5474     0.3085     0.3079     0.3097
Scaling   :   68.1404     0.3058     0.3053     0.3063
Summing   :   68.5743     0.4554     0.4550     0.4560
SAXPYing  :   68.5138     0.4557     0.4554     0.4559

***********************4 procs****************************************

The following results for "Parallel_jobs" were obtained on a similarly
configured machine with 1024 MB of main memory.

re 21> cat P4.out
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
CPU time ; time = 14 hundredths of a second
Wall Time ; time = 147.5833 hundredths of a second
Increase the size of the arrays if this is <100
 and your clock precision is =<1/100 second
---------------------------------------------------
Function     Max MB/s   RMS time   Min time   Max time
Assignment:   64.7151     0.3225     0.3214     0.3235
Scaling   :   62.4518     0.3337     0.3331     0.3345
Summing   :   65.2987     0.4791     0.4778     0.4800
SAXPYing  :   63.8196     0.4947     0.4889     0.4964
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
CPU time ; time = 12 hundredths of a second
Wall Time ; time = 148.2500 hundredths of a second
Increase the size of the arrays if this is <100
 and your clock precision is =<1/100 second
---------------------------------------------------
Function     Max MB/s   RMS time   Min time   Max time
Assignment:   65.7558     0.3176     0.3163     0.3195
Scaling   :   63.3231     0.3294     0.3285     0.3307
Summing   :   66.9963     0.4671     0.4657     0.4680
SAXPYing  :   64.8561     0.4841     0.4811     0.4901
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
CPU time ; time = 13 hundredths of a second
Wall Time ; time = 146.1667 hundredths of a second
Increase the size of the arrays if this is <100
 and your clock precision is =<1/100 second
---------------------------------------------------
Function     Max MB/s   RMS time   Min time   Max time
Assignment:   66.7203     0.3122     0.3117     0.3127
Scaling   :   64.0799     0.3250     0.3246     0.3253
Summing   :   67.6132     0.4617     0.4614     0.4622
SAXPYing  :   65.4464     0.4771     0.4767     0.4777
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
CPU time ; time = 12 hundredths of a second
Wall Time ; time = 141.1667 hundredths of a second
Increase the size of the arrays if this is <100
 and your clock precision is =<1/100 second
---------------------------------------------------
Function     Max MB/s   RMS time   Min time   Max time
Assignment:   66.5551     0.3133     0.3125     0.3139
Scaling   :   63.9300     0.3261     0.3254     0.3267
Summing   :   67.3581     0.4635     0.4632     0.4639
SAXPYing  :   65.2433     0.4788     0.4782     0.4796

**********************8 procs*****************************************

re 22> cat P8.out
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
CPU time ; time = 12 hundredths of a second
Wall Time ; time = 230.5000 hundredths of a second
Increase the size of the arrays if this is <100
 and your clock precision is =<1/100 second
---------------------------------------------------
Function     Max MB/s   RMS time   Min time   Max time
Assignment:   58.1825     0.3622     0.3575     0.3762
Scaling   :   58.6926     0.3595     0.3544     0.3703
Summing   :   61.6608     0.5163     0.5060     0.5195
SAXPYing  :   62.0106     0.5260     0.5031     0.5331
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
CPU time ; time = 13 hundredths of a second
Wall Time ; time = 233.5833 hundredths of a second
Increase the size of the arrays if this is <100
 and your clock precision is =<1/100 second
---------------------------------------------------
Function     Max MB/s   RMS time   Min time   Max time
Assignment:   58.7745     0.3608     0.3539     0.3628
Scaling   :   58.9172     0.3573     0.3530     0.3608
Summing   :   61.4912     0.5163     0.5074     0.5193
SAXPYing  :   61.7547     0.5253     0.5052     0.5296
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
CPU time ; time = 13 hundredths of a second
Wall Time ; time = 245.3333 hundredths of a second
Increase the size of the arrays if this is <100
 and your clock precision is =<1/100 second
---------------------------------------------------
Function     Max MB/s   RMS time   Min time   Max time
Assignment:   59.6023     0.3977     0.3490     0.4125
Scaling   :   62.2259     0.4016     0.3343     0.4285
Summing   :   66.5438     0.5453     0.4689     0.5584
SAXPYing  :   65.0047     0.5498     0.4800     0.5697
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
CPU time ; time = 12 hundredths of a second
Wall Time ; time = 240.2500 hundredths of a second
Increase the size of the arrays if this is <100
 and your clock precision is =<1/100 second
---------------------------------------------------
Function     Max MB/s   RMS time   Min time   Max time
Assignment:   58.5795     0.3986     0.3551     0.4150
Scaling   :   60.1227     0.3960     0.3460     0.4167
Summing   :   65.9272     0.5413     0.4732     0.5524
SAXPYing  :   64.7308     0.5483     0.4820     0.5612
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
CPU time ; time = 15 hundredths of a second
Wall Time ; time = 251.4167 hundredths of a second
Increase the size of the arrays if this is <100
 and your clock precision is =<1/100 second
---------------------------------------------------
Function     Max MB/s   RMS time   Min time   Max time
Assignment:   58.9810     0.3637     0.3527     0.3674
Scaling   :   58.3560     0.3593     0.3564     0.3610
Summing   :   60.9280     0.5209     0.5121     0.5242
SAXPYing  :   61.3016     0.5281     0.5090     0.5332
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
CPU time ; time = 12 hundredths of a second
Wall Time ; time = 233.6667 hundredths of a second
Increase the size of the arrays if this is <100
 and your clock precision is =<1/100 second
---------------------------------------------------
Function     Max MB/s   RMS time   Min time   Max time
Assignment:   59.2954     0.4080     0.3508     0.4291
Scaling   :   59.2147     0.3975     0.3513     0.4126
Summing   :   65.2718     0.5471     0.4780     0.5628
SAXPYing  :   64.5495     0.5475     0.4833     0.5602
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
CPU time ; time = 12 hundredths of a second
Wall Time ; time = 239.1667 hundredths of a second
Increase the size of the arrays if this is <100
 and your clock precision is =<1/100 second
---------------------------------------------------
Function     Max MB/s   RMS time   Min time   Max time
Assignment:   61.3640     0.3614     0.3390     0.3661
Scaling   :   59.2903     0.3587     0.3508     0.3616
Summing   :   59.8287     0.5247     0.5215     0.5270
SAXPYing  :   60.7323     0.5264     0.5137     0.5291
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
CPU time ; time = 13 hundredths of a second
Wall Time ; time = 197.3333 hundredths of a second
Increase the size of the arrays if this is <100
 and your clock precision is =<1/100 second
---------------------------------------------------
Function     Max MB/s   RMS time   Min time   Max time
Assignment:   64.1660     0.3605     0.3242     0.3677
Scaling   :   62.0536     0.3568     0.3352     0.3656
Summing   :   61.5227     0.5227     0.5071     0.5262
SAXPYing  :   59.0152     0.5316     0.5287     0.5343

Rusty

--
-----------------------------------------------------------------------
Rusty Benson - Systems Engineer         benson@sgi.com
Silicon Graphics Computer Systems
8 Neshaminy Interplex, Suite 115
Trevose PA 19053
(215) 638-3707
-----------------------------------------------------------------------
Somewhere in a Royal Hotel room there's a guy who's starting to realize
that eternal fate has turned its back on him ... It's 2AM.
                                                        Golden Earring
-----------------------------------------------------------------------

From mccalpin@perelandra.cms.udel.edu Fri May 6 16:50:08 1994
Received: from sgigate.SGI.COM by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP) for mccalpin id AA24818; Fri, 6 May 94 15:45:16 -0400
Received: from relay.sgi.com (relay.sgi.com [192.26.51.36]) by sgigate.sgi.com (8.6.4/8.6.4) with SMTP id MAA15067; Fri, 6 May 1994 12:46:05 -0700
Received: from phil.trevose.sgi.com by relay.sgi.com via SMTP (920330.SGI/920502.SGI) for @sgigate.sgi.com:mccalpin@perelandra.cms.udel.edu id AA10791; Fri, 6 May 94 12:46:00 -0700
Received: from manic.trevose.sgi.com by phil.trevose.sgi.com via SMTP (931110.SGI/930416.SGI) for @relay.sgi.com:mccalpin@perelandra.cms.udel.edu id AA16422; Fri, 6 May 94 15:51:02 -0400
Received: by manic.trevose.sgi.com (931110.SGI/930416.SGI) for @phil.trevose.sgi.com:mccalpin@perelandra.cms.udel.edu id AA18213; Fri, 6 May 94 15:51:01 -0400
From: "Rusty Benson"
Message-Id: <9405061551.ZM18211@manic.trevose.sgi.com>
Date: Fri, 6 May 1994 15:51:00 -0400
In-Reply-To: "John D. McCalpin" "Re: stream benchmark numbers" (May 6, 3:28pm)
References:
X-Mailer: Z-Mail-SGI (3.0S.1026 26oct93 MediaMail)
To: "John D. McCalpin"
Subject: Re: stream benchmark numbers
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0
Status: RO

John,

I forgot to mention one thing: the cases that were > 1 cpu were run on
a Challenge XL with 32 processors and 1 GB of memory. The machine also
was not dedicated; there was a GAUSSIAN and a CHARMM job running.

I will send the results of the qgbox stuff sometime next week (if we're
lucky maybe this weekend). At that time, I will submit the paperwork
for the runs on TFP.
Rusty

--
-----------------------------------------------------------------------
Rusty Benson - Systems Engineer                         benson@sgi.com
Silicon Graphics Computer Systems
8 Neshaminy Interplex, Suite 115
Trevose PA 19053
(215) 638-3707
-----------------------------------------------------------------------
Somewhere in a Royal Hotel room there's a guy who's starting to realize
that eternal fate has turned its back on him ... It's 2AM.
                                                        Golden Earring
-----------------------------------------------------------------------

From mccalpin@perelandra.cms.udel.edu Wed May 25 09:13:47 1994
Received: from timbuk.cray.com by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
    for mccalpin id AA12664; Tue, 24 May 94 14:30:43 -0400
Received: from lotus04.benchmark (lotus04.cray.com) by cray.com (Bob mailer 1.2)
    id AA04374; Tue, 24 May 94 13:32:22 CDT
Received: by lotus04.benchmark (5.0/SMI-SVR4)
    id AA09648; Tue, 24 May 1994 13:31:43 +0600
From: cmg@lotus04.cray.com (Charles Grassl)
Message-Id: <9405241831.AA09648@lotus04.benchmark>
Subject: CRAY T3D
To: mccalpin (John D. McCalpin)
Date: Tue, 24 May 1994 13:31:42 -0500 (CDT)
In-Reply-To: from "John D. McCalpin" at Oct 29, 93 11:35:51 am
X-Mailer: ELM [version 2.4 PL23]
Content-Type: text
Content-Length: 713
Status: RO

John;

The STREAM benchmark SAXPY result for the CRAY T3D was generated with
the library version of the LINPACK SAXPY. Is this OK as far as
comparison with other systems? Do you specify that libraries not be
used?

Regards,
--
Charles Grassl
Cray Research, Inc.
cmg@cray.com
(612) 683-3531

P.S. I am now a father, too. You have mentioned your child (daughter?)
several times and I know how proud you are, so I thought you might be
interested. My little daughter came as too much of a surprise: she is
three months premature. We have our problems cut out for us, but she is
in a very good children's hospital in Minneapolis. We hope that she will
be alright and can come home in two or three months.
From David.Daniel@meiko.com Mon Jun 6 15:54:18 1994
Received: from marge.meiko.com by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
    for mccalpin id AA00627; Mon, 6 Jun 94 15:54:18 -0400
Received: from abe76.meiko.com (abe.meiko.com) by marge.meiko.com with SMTP
    id AA07422 (5.65c/IDA-1.4.4 for ); Mon, 6 Jun 1994 15:56:35 -0400
Date: Mon, 6 Jun 1994 15:56:35 -0400
From: David Daniel
Message-Id: <199406061956.AA07422@marge.meiko.com>
Received: by abe76.meiko.com (5.0/SMI-SVR4)
    id AA07631; Mon, 6 Jun 94 15:59:10 EDT
To: mccalpin
Subject: memory bandwith results
Cc: marc@access.digex.net, Charles.Wilson@meiko.com
Content-Length: 2791
Status: RO
X-Status:

John,

Attached below are the results of your stream memory bandwidth tests
for the CS-2 vector node. Let me know if you need more info on this,
and I'll be in touch again with the qgbench results shortly.

David Daniel
----------------------------------------------------------------------
Meiko                            Email: David.Daniel@meiko.com
130C Baker Avenue Ext.           Tel:   508-371-0088
Concord, MA 01742, USA           Fax:   508-371-7516
----------------------------------------------------------------------

Hardware
********

MK403 compute board incorporating:
  66 MHz Pinnacle SPARC.
  45 MHz MK534 vector unit (2 VPUs + cache coherency hardware).
  16 bank, 3 port, 128MB memory system.

Each VPU can request 1 double precision word per cycle, so peak
theoretical memory bandwidth in vector code is
2 * 45 MHz * 8 bytes = 720 MB/s. We achieve 85% of this for your
benchmark. This is typical of the code generated by the vectorizing
compilers -- about 15% goes to stripmine overhead.

Software
********

Solaris 2.1 + Meiko support for VPUs.
Portland Group vectorizing f77 and cc compilers.

Changes to code
***************

I replaced the second() timer function with a call to the Solaris
gettimeofday(), which I believe gives sufficient resolution. No other
changes were made.
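The peak-bandwidth arithmetic quoted under "Hardware" above can be sketched in C. This is only an illustration of that calculation, not part of the original message: the function names `peak_bandwidth_mb_s`, `fraction_of_peak` and `report_peak` are hypothetical, and the sketch assumes 1 MB = 10^6 bytes and one word transferred per unit per clock cycle, as the message states for the VPUs.

```c
#include <stdio.h>

/* Hypothetical helper: peak bandwidth in MB/s (1 MB = 10^6 bytes),
 * assuming every unit transfers one word per clock cycle. */
double peak_bandwidth_mb_s(int units, double clock_mhz, int bytes_per_word)
{
    return units * clock_mhz * bytes_per_word;
}

/* Hypothetical helper: a measured rate expressed as a fraction of peak. */
double fraction_of_peak(double measured_mb_s, double peak_mb_s)
{
    return measured_mb_s / peak_mb_s;
}

/* Print the peak and the rate implied by a given efficiency (0.0 to 1.0). */
void report_peak(int units, double clock_mhz, int bytes_per_word,
                 double efficiency)
{
    double peak = peak_bandwidth_mb_s(units, clock_mhz, bytes_per_word);

    printf("peak = %.1f MB/s, %.0f%% of peak = %.1f MB/s\n",
           peak, efficiency * 100.0, efficiency * peak);
}
```

Calling `report_peak(2, 45.0, 8, 0.85)` from a `main()` reproduces the figures in the message: a peak of 720 MB/s for two VPUs at 45 MHz with 8-byte words, of which 85% is about 612 MB/s.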
========================================================================
$ cat second.c
#include <sys/time.h>

/* Returns wall-clock time in seconds, with microsecond resolution. */
double second_ (void)
{
    static struct timeval tp;

    /* One-argument call kept as sent; POSIX gettimeofday() also takes
       a timezone pointer, for which NULL can be passed. */
    gettimeofday (&tp);
    return (double) tp.tv_sec + (double) tp.tv_usec * 1.0e-6;
}
========================================================================

Log of session
**************

========================================================================
$ date
Mon Jun 6 13:03:55 EDT 1994
$ uname -a
SunOS abe1 5.1 Callisto_Development dino1 sparc
$ cd ~/Delaware/stream
$ make
pgcc -I /opt/MEIKOcs2/include -c second.c
pgf77 -O4 -r8 -Mvect -o stream_d stream_d.f second.o
Linking:
$ size stream_d
149439 + 67580 + 48005400 = 48222419
$ stream_d
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 20.64710855484009 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:    621.2748      0.0516     0.0515     0.0516
Scaling   :    621.3108      0.0517     0.0515     0.0535
Summing   :    610.2757      0.0787     0.0787     0.0788
SAXPYing  :    610.3460      0.0787     0.0786     0.0787
========================================================================

From mccalpin@axposf.pa.dec.com Fri Jun 17 07:57:39 1994
Received: from axposf.pa.dec.com by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
    for mccalpin id AA27386; Fri, 17 Jun 94 07:57:39 -0400
Received: by axposf.pa.dec.com; (5.65/1.1.8.2/21Apr94-0333PM)
    id AA06121; Fri, 17 Jun 1994 05:00:17 -0700
Date: Fri, 17 Jun 1994 05:00:17 -0700
From: John D.
McCalpin Message-Id: <9406171200.AA06121@axposf.pa.dec.com> To: mccalpin Subject: DEC 4000/710 stream_d Status: RO axposf.pa.dec.com> f77 -O stream_d.f -o stream_d axposf.pa.dec.com> uptime ; stream_d ; uptime 04:58 up 15 days, 9:33, 13 users, load average: 1.00, 1.03, 1.07 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- Timing calibration ; time = 130.784004181623 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 76.4263 0.4230 0.4187 0.4480 Scaling : 74.6854 0.4323 0.4285 0.4363 Summing : 72.7520 0.6649 0.6598 0.6715 SAXPYing : 69.3657 0.6948 0.6920 0.6969 04:59 up 15 days, 9:33, 13 users, load average: 2.00, 1.80, 1.57 Used Fortran etime() for timing. From mccalpin@malacandra Sun Jun 19 08:59:30 1994 Received: from malacandra.cms.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP) for mccalpin id AA01072; Sun, 19 Jun 94 08:59:30 -0400 Received: by malacandra.cms.udel.edu (AIX 3.2/UCB 5.64/4.03) id AA16072; Sun, 19 Jun 1994 08:58:51 -0400 Date: Sun, 19 Jun 1994 08:58:51 -0400 From: mccalpin@malacandra Message-Id: <9406191258.AA16072@malacandra.cms.udel.edu> To: mccalpin Subject: Stream on RS/6000-250 Status: RO stream_d -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- Timing calibration ; time = 55.0000011920928955 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 71.1112 .1871 .1800 .1900 Scaling : 67.3686 .1910 .1900 .2000 Summing : 71.1111 
.2700 .2700 .2700 SAXPYing : 71.1112 .2710 .2700 .2800 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- Timing calibration ; time = 55.9999993070960045 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 71.1112 .1880 .1800 .1900 Scaling : 67.3686 .1910 .1900 .2000 Summing : 73.8461 .2690 .2600 .2700 SAXPYing : 73.8462 .2690 .2600 .2700 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- Timing calibration ; time = 55.0000002607703209 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 67.3686 .1900 .1900 .1900 Scaling : 67.3686 .1900 .1900 .1900 Summing : 73.8461 .2690 .2600 .2700 SAXPYing : 73.8462 .2680 .2600 .2700 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- Timing calibration ; time = 56.0000002384185791 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 71.1112 .1871 .1800 .1900 Scaling : 67.3686 .1920 .1900 .2000 Summing : 73.8462 .2691 .2600 .2800 SAXPYing : 71.1112 .2700 .2700 .2700 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- Timing calibration ; 
time = 55.0000011920928955 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 71.1112 .1890 .1800 .1900 Scaling : 71.1111 .1880 .1800 .1900 Summing : 73.8462 .2690 .2600 .2700 SAXPYing : 73.8462 .2680 .2600 .2700 stream_s -------------------------------------- Single precision appears to have 7 digits of accuracy Assuming 4 bytes per default REAL word -------------------------------------- Timing calibration ; time = 67.00000000 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 58.1820 .2200 .2200 .2200 Scaling : 51.2000 .2681 .2500 .2700 Summing : 53.3333 .3690 .3600 .3700 SAXPYing : 53.3333 .3690 .3600 .3700 -------------------------------------- Single precision appears to have 7 digits of accuracy Assuming 4 bytes per default REAL word -------------------------------------- Timing calibration ; time = 67.00000000 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 58.1820 .2200 .2200 .2200 Scaling : 49.2308 .2650 .2600 .2700 Summing : 51.8919 .3720 .3700 .3800 SAXPYing : 51.8919 .3730 .3700 .3800 -------------------------------------- Single precision appears to have 7 digits of accuracy Assuming 4 bytes per default REAL word -------------------------------------- Timing calibration ; time = 67.00000000 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 58.1818 .2270 .2200 .2300 
Scaling : 51.2000 .2590 .2500 .2600 Summing : 51.8919 .3700 .3700 .3700 SAXPYing : 51.8919 .3720 .3700 .3800 -------------------------------------- Single precision appears to have 7 digits of accuracy Assuming 4 bytes per default REAL word -------------------------------------- Timing calibration ; time = 67.00000000 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 58.1820 .2280 .2200 .2300 Scaling : 49.2308 .2610 .2600 .2700 Summing : 53.3332 .3690 .3600 .3700 SAXPYing : 53.3334 .3680 .3600 .3700 -------------------------------------- Single precision appears to have 7 digits of accuracy Assuming 4 bytes per default REAL word -------------------------------------- Timing calibration ; time = 67.00000000 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 58.1820 .2200 .2200 .2200 Scaling : 49.2307 .2690 .2600 .2700 Summing : 51.8919 .3700 .3700 .3700 SAXPYing : 51.8919 .3700 .3700 .3700 stream_wall -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 56.0000002384185791 hundredths of a second Wall Time ; time = 55.7306051254272461 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 68.1130 .1884 .1879 .1897 Scaling : 67.2633 .1907 .1903 .1911 Summing : 71.3460 .2697 .2691 .2700 SAXPYing : 71.4296 .2694 .2688 .2702 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 
8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 56.0000002384185791 hundredths of a second Wall Time ; time = 55.5277109146118164 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 68.1642 .1881 .1878 .1886 Scaling : 67.2792 .1907 .1903 .1912 Summing : 71.3667 .2712 .2690 .2853 SAXPYing : 71.4054 .2695 .2689 .2702 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 56.0000002384185791 hundredths of a second Wall Time ; time = 55.4084062576293945 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 68.1656 .1883 .1878 .1889 Scaling : 67.2597 .1906 .1903 .1911 Summing : 71.3381 .2697 .2691 .2705 SAXPYing : 71.4349 .2693 .2688 .2698 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 56.9999992847442627 hundredths of a second Wall Time ; time = 56.4731001853942871 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 68.1587 .1884 .1878 .1890 Scaling : 67.2894 .1906 .1902 .1913 Summing : 71.3312 .2700 .2692 .2717 SAXPYing : 71.4063 .2695 .2689 .2703 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 
56.0000002384185791 hundredths of a second Wall Time ; time = 55.5517911911010742 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 68.2627 .1882 .1875 .1888 Scaling : 67.2597 .1909 .1903 .1927 Summing : 71.3081 .2730 .2693 .3031 SAXPYing : 71.3938 .2695 .2689 .2700 From mccalpin@mozart.udel.edu Thu Jun 30 08:58:09 1994 Received: from mozart.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP) for mccalpin id AA22570; Thu, 30 Jun 94 08:58:09 -0400 Received: by mozart.udel.edu (AIX 3.2/UCB 5.64/4.03) id AA16762; Thu, 30 Jun 1994 08:54:24 -0400 Date: Thu, 30 Jun 1994 08:54:24 -0400 From: mccalpin@mozart.udel.edu (John McCalpin) Message-Id: <9406301254.AA16762@mozart.udel.edu> To: mccalpin Subject: stream on 990 Status: RO -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 81.0000002384185791 hundredths of a second Wall Time ; time = 81.1749935150146484 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 652.6021 .0992 .0981 .1033 Scaling : 532.5524 .1208 .1202 .1225 Summing : 710.0853 .1361 .1352 .1376 SAXPYing : 709.0419 .1361 .1354 .1371 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 77.9999971389770508 hundredths of a second Wall Time ; time = 78.6354064941406250 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second 
--------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 662.6702 .0982 .0966 .1018 Scaling : 533.1336 .1218 .1200 .1296 Summing : 714.5461 .1364 .1344 .1484 SAXPYing : 713.8241 .1377 .1345 .1495 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 79.0000021457672119 hundredths of a second Wall Time ; time = 78.3833980560302734 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 663.4457 .0998 .0965 .1041 Scaling : 533.3911 .1231 .1200 .1252 Summing : 711.7856 .1386 .1349 .1421 SAXPYing : 712.7482 .1395 .1347 .1427 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 79.0000012144446373 hundredths of a second Wall Time ; time = 80.7206034660339355 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 654.3703 .0991 .0978 .1005 Scaling : 524.3792 .1237 .1220 .1255 Summing : 709.5923 .1379 .1353 .1451 SAXPYing : 709.2823 .1381 .1353 .1420 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 79.0000021457672119 hundredths of a second Wall Time ; time = 78.2535076141357422 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time 
Max time Assignment: 661.0343 .2489 .0968 .6950 Scaling : 532.8537 .1647 .1201 .2477 Summing : 711.4328 .2343 .1349 .4728 SAXPYing : 712.0921 .1912 .1348 .2992 From cherkus@UniMaster.COM Fri Jul 15 22:48:44 1994 Received: from fastball.UniMaster.COM by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP) for mccalpin id AA03309; Fri, 15 Jul 94 22:48:44 -0400 Received: (cherkus@localhost) by fastball.unimaster.com (8.6.7/8.5) id WAA03915; Fri, 15 Jul 1994 22:47:47 -0400 From: Dave Cherkus Message-Id: <199407160247.WAA03915@fastball.unimaster.com> Subject: STREAM results (C versions...) To: mccalpin (John D. McCalpin) Date: Fri, 15 Jul 1994 22:47:42 -0400 (EDT) In-Reply-To: <9407160140.AA03205@perelandra.cms.udel.edu> from "John D. McCalpin" at Jul 15, 94 09:40:38 pm Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Content-Length: 3580 Status: RO I was interested in seeing what the difference in memory performance between PCs and workstations were, so I ran your benchmark on a Dell 486DX2/66 running SVR4 Unix. I ran each 3 times and the output is below. [necsrv] (cherkus): ./cstream Timing calibration ; time = 2550.00 usec. Increase the size of the arrays if this is < 300 and your clock precision is =< 1/100 second. --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 27.935 642.729 600.000 700.000 Scaling : 20.951 833.217 800.000 870.000 Summing : 23.944 1119.821 1050.000 1260.000 SAXPYing : 21.306 1223.491 1180.000 1310.000 (/local/home/cherkus/bench) [necsrv] (cherkus): ./cstream Timing calibration ; time = 2470.00 usec. Increase the size of the arrays if this is < 300 and your clock precision is =< 1/100 second. 
--------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 27.034 638.326 620.000 680.000 Scaling : 20.440 836.062 820.000 850.000 Summing : 23.718 1083.157 1060.000 1130.000 SAXPYing : 21.488 1190.109 1170.000 1220.000 (/local/home/cherkus/bench) [necsrv] (cherkus): ./cstream Timing calibration ; time = 2590.00 usec. Increase the size of the arrays if this is < 300 and your clock precision is =< 1/100 second. --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 27.477 624.356 610.000 670.000 Scaling : 20.440 846.298 820.000 890.000 Summing : 23.718 1106.644 1060.000 1190.000 SAXPYing : 21.488 1193.168 1170.000 1230.000 (/local/home/cherkus/bench) [necsrv] (cherkus): ./stream_d Timing calibration ; time = 3510000.000000 usec. Increase the size of the arrays if this is < 300000 and your clock precision is =< 1/100 second. --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 33.333 1236499.125 480000.000 1610000.000 Scaling : 15.385 1500983.000 1040000.000 2110000.000 Summing : 22.018 1441853.000 1090000.000 2190000.000 SAXPYing : 18.321 1753890.000 1310000.000 2410000.000 (/local/home/cherkus/bench) [necsrv] (cherkus): ./stream_d Timing calibration ; time = 3630000.000000 usec. Increase the size of the arrays if this is < 300000 and your clock precision is =< 1/100 second. --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 33.333 1070373.750 480000.000 1650000.000 Scaling : 15.534 1195926.375 1030000.000 1950000.000 Summing : 21.818 1453344.500 1100000.000 2230000.000 SAXPYing : 18.750 1945517.875 1280000.000 2450000.000 (/local/home/cherkus/bench) [necsrv] (cherkus): ./stream_d Timing calibration ; time = 2660000.000000 usec. Increase the size of the arrays if this is < 300000 and your clock precision is =< 1/100 second. 
---------------------------------------------------
Function      Rate (MB/s)      RMS time       Min time       Max time
Assignment:      33.333     1437685.625     480000.000    1550000.000
Scaling   :      16.495     1559054.250     970000.000    1980000.000
Summing   :      21.818     1438527.125    1100000.000    2230000.000
SAXPYing  :      18.750     1483600.375    1280000.000    2350000.000

--
Dave Cherkus
UniMaster, Inc.
cherkus@unimaster.com

From robertl@cwi.nl Tue Jul 19 09:09:08 1994
Received: from [192.16.184.142] by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP)
    for mccalpin id AA13453; Tue, 19 Jul 94 09:09:08 -0400
Received: from paling.cwi.nl by charon.cwi.nl with SMTP
    id ; Tue, 19 Jul 1994 15:07:53 +0200
Received: by paling.cwi.nl with SMTP
    id AA03173 (931110.SGI/3.8/CWI-Amsterdam); Tue, 19 Jul 94 15:07:52 +0200
Message-Id: <9407191307.AA03173=robertl@paling.cwi.nl>
To: mccalpin
Cc: robertl@cwi.nl
Subject: streams on Challange
Date: Tue, 19 Jul 1994 15:07:51 +0200
From: Robert van Liere
Status: RO

John,

[For what it's worth...]

Here are some results of your streams program on a multi-processor
Challenge @ 150 MHz running IRIX 5.2. Included are three cases:
   - sequential code
   - parallel code
        -- compiled with -pfa, the POWER FORTRAN accelerator.
        -- on a 2 CPU Challenge
        -- on a 6 CPU Challenge

The -pfa does all sorts of things like loop unrolling, some data
dependency analysis, etc. (you probably know more about this than I).
I have the impression that (some) improvement can be made by finding
better compiler options.

Cheers,
-- Robert van Liere

Included for each of the three cases:
   - compiler options
   - uname -a
   - hardware inventory | grep 150
   - a.out output

1.
-------------------------------------------------------------------- ; f77 -mips2 -O3 -non_shared stream_d.f ; uname -a IRIX artemis 5.2 02282015 IP19 mips ; a.out -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- Timing calibration ; time = 159.0000024065375 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 62.7451 0.5231 0.5100 0.5400 Scaling : 58.1819 0.5520 0.5500 0.5600 Summing : 63.1580 0.7700 0.7600 0.7800 SAXPYing : 70.5882 0.6920 0.6800 0.7000 2. -------------------------------------------------------------------- ; f77 -pfa -mips2 -O3 -non_shared stream_d.f ; hinv | grep 150 2 150 MHZ IP19 Processors ; uname -a IRIX artemis 5.2 02282015 IP19 mips ; a.out -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- Timing calibration ; time = 178.0000081285834 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 133.3337 0.2461 0.2400 0.2600 Scaling : 139.1304 0.2441 0.2300 0.2500 Summing : 133.3338 0.3721 0.3600 0.3800 SAXPYing : 141.1764 0.3551 0.3400 0.3600 3. 
-------------------------------------------------------------------- ; f77 -pfa -mips2 -O3 -non_shared stream_d.f ; hinv | grep 150 6 150 MHZ IP19 Processors ; uname -a IRIX zeus 5.2 02282015 IP19 mips ; a.out -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- Timing calibration ; time = 80.99999930709600 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 355.5568 0.1018 0.0900 0.1300 Scaling : 355.5559 0.1015 0.0900 0.1200 Summing : 369.2308 0.1444 0.1300 0.1600 SAXPYing : 369.2304 0.1467 0.1300 0.1700 -------------------------------------------------- Robert van Liere robertl@cwi.nl From mccalpin@perelandra.cms.udel.edu Thu Jul 21 08:33:39 1994 Received: from bach.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP) for mccalpin id AA18752; Thu, 21 Jul 94 07:46:30 -0400 Received: from inet-gw-2.pa.dec.com (inet-gw-2.pa.dec.com [16.1.0.23]) by bach.udel.edu (8.6.8/8.6.6) with SMTP id HAA00174 for ; Thu, 21 Jul 1994 07:46:55 -0400 Received: from us1rmc.bb.dec.com by inet-gw-2.pa.dec.com (5.65/27May94) id AA17153; Thu, 21 Jul 94 02:52:25 -0700 Received: from i4get.enet by us1rmc.bb.dec.com (5.65/rmc-22feb94) id AA29681; Thu, 21 Jul 94 05:52:21 -0400 Message-Id: <9407210952.AA29681@us1rmc.bb.dec.com> Received: from i4get.enet; by us1rmc.enet; Thu, 21 Jul 94 05:52:22 EDT Date: Thu, 21 Jul 94 05:52:22 EDT From: "John, dtn 293-5506 21-Jul-1994 0552" To: mccalpin@bach.udel.edu Cc: me@i4get.enet.dec.com Apparently-To: mccalpin@bach.udel.edu Subject: Questions on STREAM Benchmark table Status: RO Dear Mr. McCalpin, Thank you for the memory information that you have published -- it is very interesting and educational. 
I have been looking at the copy of your STREAM table dated
Wed May 25 09:42:20 1994 and have two questions:

1) Is there a more recent version? I looked on comp.benchmarks, but
   didn't see one. If there is, an e-mail copy would be very kind.

2) Perhaps the copy I have is missing some information; but when I look
   at the table with "speedup ratios", it doesn't always seem to map
   back to the raw data in the first table. For Cray C90 there are
   speedups reported for 1,2,4,8,16 CPUs, and sure enough in the first
   table I can see the raw data for all of these. But for the SGI
   Challenge 150 MHz there is a report of 1,4,8 CPUs and I can't see
   the corresponding raw data. I suppose I could try to derive it, for
   example by multiplying the 8-CPU "SGI Challenge 150MHz" Copy speedup
   of 7.10 by the Copy bandwidth in the base table, but that's not easy
   since I'm not sure which base table entry to select:

       SGI Challenge 150 Mhz    8    58.2
       SGI Challenge 150 Mhz    4    66.7
       SGI Challenge 1 cpu      8    57.1
       SGI Challenge 1 cpu      4    53.3

   The Sun SparcCenter 2000 likewise has more CPU entries in the second
   table than in the first.
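The derivation described above, multiplying a reported speedup by a base-table bandwidth, can be sketched in C as follows. This is a rough illustration, not part of the original message: the helper names `implied_bandwidth` and `print_candidates` are hypothetical, and the sketch assumes the speedup ratio is simply the multi-CPU rate divided by some single-CPU base rate.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: bandwidth (MB/s) implied by a speedup ratio and
 * a base rate, assuming speedup = multi-CPU rate / base rate. */
double implied_bandwidth(double speedup, double base_mb_s)
{
    return speedup * base_mb_s;
}

/* Print the implied bandwidth for every candidate base entry, since the
 * message cannot tell which base row the speedup was computed from. */
void print_candidates(double speedup, const double *base_mb_s, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        printf("base %6.1f MB/s -> implied %7.2f MB/s\n",
               base_mb_s[i], implied_bandwidth(speedup, base_mb_s[i]));
}
```

With the four Copy rates listed in the message (`double base[] = {58.2, 66.7, 57.1, 53.3};`) and the quoted speedup, `print_candidates(7.10, base, 4)` gives four noticeably different answers, which is the ambiguity the question is about.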
Thanks very much for your help /John Henning CSG Performance Group Digital henning@msbcs.enet.dec.com From mccalpin@perelandra.cms.udel.edu Thu Jul 21 08:34:23 1994 Received: from strauss.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/931108.SGI.ANONFTP) for mccalpin id AA18852; Thu, 21 Jul 94 08:18:58 -0400 Received: (from mccalpin@localhost) by strauss.udel.edu (8.6.8/8.6.6) id IAA06830 for mccalpin@perelandra.cms.udel.edu; Thu, 21 Jul 1994 08:19:29 -0400 Date: Thu, 21 Jul 1994 08:19:29 -0400 From: John D McCalpin Message-Id: <199407211219.IAA06830@strauss.udel.edu> To: mccalpin Subject: Stream on SC/2000 - raw data Status: RO Raw results for a single copy (P1.1): -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 569 hundredths of a second Wall Time ; time = 570.333 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 33.3704 1.0088 0.9589 1.3007 Scaling : 35.2505 0.9369 0.9078 0.9763 Summing : 36.2480 1.4251 1.3242 1.9608 SAXPYing : 35.8991 1.3970 1.3371 1.5714 Results for two copies (P2.1 and P2.2) follow: -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 848 hundredths of a second Wall Time ; time = 864.167 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 33.2178 0.9844 0.9633 1.0188 Scaling : 34.1472 0.9532 0.9371 0.9791 Summing : 35.4462 1.3762 1.3542 1.4258 SAXPYing : 35.2128 1.3879 1.3631 1.4181 
-------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 733 hundredths of a second Wall Time ; time = 789.167 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 33.2579 0.9846 0.9622 1.0101 Scaling : 34.0676 0.9732 0.9393 1.0285 Summing : 35.6316 1.5219 1.3471 2.3657 SAXPYing : 34.7598 1.3987 1.3809 1.4217 Results for 4 copies (P4.out) follows: -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 1087 hundredths of a second Wall Time ; time = 1257.08 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 29.9418 1.3021 1.0687 1.7687 Scaling : 31.9710 1.3422 1.0009 2.1995 Summing : 30.4404 1.7988 1.5769 2.3239 SAXPYing : 32.0865 1.9042 1.4960 2.8321 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 1084 hundredths of a second Wall Time ; time = 1324.17 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 30.1089 1.3002 1.0628 2.0462 Scaling : 31.3150 1.3510 1.0219 1.9828 Summing : 32.6054 1.8798 1.4721 2.8029 SAXPYing : 32.8456 1.7848 1.4614 2.3825 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes 
per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 1066 hundredths of a second Wall Time ; time = 1232.25 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 30.9314 1.2966 1.0345 1.8223 Scaling : 31.7224 1.1815 1.0088 1.4985 Summing : 31.8746 1.9749 1.5059 3.2586 SAXPYing : 31.6472 1.8541 1.5167 2.7178 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 1079 hundredths of a second Wall Time ; time = 1287.42 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 31.3683 1.2432 1.0201 1.7260 Scaling : 30.4203 1.2905 1.0519 1.7533 Summing : 32.9644 1.8274 1.4561 2.6191 SAXPYing : 31.8586 1.9232 1.5067 2.8726 Results for 6 copies (P6.out) follows: -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 1024 hundredths of a second Wall Time ; time = 1410.67 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 26.0079 1.3295 1.2304 1.4418 Scaling : 28.2383 1.3288 1.1332 1.7055 Summing : 28.0898 1.9766 1.7088 2.2266 SAXPYing : 28.7659 1.9657 1.6686 2.3684 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 1010 hundredths of a second 
Wall Time ; time = 1405.08 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 27.0444 1.4051 1.1832 1.7970 Scaling : 27.9884 1.3240 1.1433 1.5381 Summing : 29.4722 1.8842 1.6287 2.1229 SAXPYing : 29.0939 1.9961 1.6498 2.4653 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 1012 hundredths of a second Wall Time ; time = 1440.75 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 27.6242 1.4153 1.1584 1.9580 Scaling : 27.1891 1.4089 1.1769 1.7027 Summing : 28.6485 1.8387 1.6755 2.1152 SAXPYing : 27.5635 1.9043 1.7414 2.2844 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 1022 hundredths of a second Wall Time ; time = 1481.00 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 28.4723 1.3335 1.1239 1.6311 Scaling : 27.2368 1.3393 1.1749 1.5346 Summing : 28.2188 1.9001 1.7010 2.1014 SAXPYing : 27.3734 1.8806 1.7535 2.0705 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 976 hundredths of a second Wall Time ; time = 1352.67 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second 
--------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 27.7207 1.4028 1.1544 1.7443 Scaling : 28.5197 1.3388 1.1220 1.6825 Summing : 28.6751 1.9987 1.6739 2.3020 SAXPYing : 28.3616 1.8661 1.6924 2.3033 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 945 hundredths of a second Wall Time ; time = 1314.33 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 27.6434 1.3173 1.1576 1.4869 Scaling : 27.2736 1.3789 1.1733 1.7669 Summing : 29.0857 1.9833 1.6503 2.3564 SAXPYing : 28.5905 1.8968 1.6789 2.1670 From mccalpin@perelandra.cms.udel.edu Tue Aug 2 07:38:36 1994 Received: from [144.110.16.11] by perelandra.cms.udel.edu via SMTP (931110.SGI/930416.SGI) for mccalpin id AA05385; Mon, 1 Aug 94 01:40:08 -0400 Received: by shark.mel.dit.csiro.au id AA12859 (5.65c/IDA-1.4.4/DIT-1.3 for mccalpin@perelandra.cms.udel.edu); Mon, 1 Aug 1994 15:40:49 +1000 From: Robert Bell Message-Id: <199408010540.AA12859@shark.mel.dit.csiro.au> Subject: Stream benchmark To: mccalpin Date: Mon, 1 Aug 1994 15:40:48 +1000 (EST) Cc: csrcb@mel.dit.csiro.au (Robert Bell) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Content-Length: 3580 Status: RO John, I have been following your stream benchmark, and your advocacy of the importance of memory speeds. I have some of my own tests which look particularly at strides, but I have not had time to develop them much. I have recently run your test on a Cray Y-MP 4E/464, under UNICOS 7.0.5.3, and found that the results were not as expected. 
(I implemented the timing code by removing 'second' from the external
declaration, and supplying no external routine, so the Cray intrinsic was
used.)

Here are the results with

cf77 -V -O vector3 -o stream stream.orig.f

Cray CF77   Version 6.0 (6.55)   08/01/94 15:36:00
Cray FPP    Version 6.0 (3.06F1) 08/01/94 15:36:00
cft77-42 cf77: Cray CFT77 Version 6.0.3.9 (008708) 08/01/94 15:36:01
cft77-2 cf77: COMPILE TIME .669 SECONDS
cft77-6 cf77: MAXIMUM FIELD LENGTH 421358 DECIMAL WORDS
cft77-3 cf77: 236 SOURCE LINES
cft77-4 cf77: 0 ERROR, 0 WARNING, 0 CAUTION, 0 COMMENT, 0 NOTE, 0 ANSI
cft77-5 cf77: CODE: 567 WORDS, DATA: 281 WORDS
--------------------------------------
Single precision appears to have 14 digits of accuracy
Assuming 8 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 0.1128762 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time   Max time
Assignment:**********      0.0000     0.0000     0.0000
Scaling   :**********      0.0000     0.0000     0.0000
Summing   :**********      0.0000     0.0000     0.0000
SAXPYing  :**********      0.0000     0.0000     0.0000

It appears that the compiler was smart enough to optimise away some of
your code. I changed some of the loops as follows

      do 30 j=1,N
C Original.   c(j) = 3.0e0*a(j)
          b(j) = 3.0e0*c(j)
   30 continue
      do 40 j=1,N
C Original.   c(j) = a(j)+b(j)
          a(j) = b(j)+c(j)
   40 continue

The results were then

--------------------------------------
Single precision appears to have 14 digits of accuracy
Assuming 8 bytes per default REAL word
--------------------------------------
Timing calibration ; time = 1.9692438 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time   Max time
Assignment:  2323.1070     0.0069     0.0069     0.0070
Scaling   :  2354.6871     0.0069     0.0068     0.0070
Summing   :  2365.9413     0.0105     0.0101     0.0107
SAXPYing  :  2365.2446     0.0105     0.0101     0.0108

These look correct for a 2-section Y-MP. (The results in your table are
for a 4-section Y-MP.)

Note that the setting of c can still be removed from the do 10 loop, and
that the values of c should be used after the do 50 loop, to prevent
optimisers from taking further liberties with your code.

Also, it is good to avoid the first value returned by some of the timing
routines on some systems - the first value includes some setup time.

-- Regards    Rob. Bell  ( email: csrcb@mel.dit.csiro.au )
--
 / Robert.Bell@mel.dit.csiro.au | CSIRO Supercomputing Facility Manager \
| CSIRO Division of Information Technology, Supercomputing Support Group |
| 723 Swanston Street | tel: {+61 3,03,} 282 2620 or {+61,0} 18 108 333  |
 \ Carlton VIC 3053 Australia | fax: {+61 3,03,} 282 2600               /

From gary@chpc.utexas.edu Tue Aug 2 13:11:02 1994
Received: from bach.udel.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/930416.SGI) for mccalpin id AA08667; Tue, 2 Aug 94 13:11:02 -0400
Received: from gonzales.chpc.utexas.edu (gonzales.chpc.utexas.edu [129.116.16.12]) by bach.udel.edu (8.6.8/8.6.6) with SMTP id NAA28021 for ; Tue, 2 Aug 1994 13:11:48 -0400
From: gary@chpc.utexas.edu
Received: by gonzales.chpc.utexas.edu (5.0/SMI-4.1) id AA13698; Tue, 2 Aug 1994 12:11:45 +0600
Message-Id: <9408021711.AA13698@gonzales.chpc.utexas.edu>
Subject: SP2 node results
To: mccalpin@bach.udel.edu
Date: Tue, 2 Aug 1994 12:11:42 -0500 (CDT)
X-Mailer: ELM [version 2.4 PL23]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 698
Status: RO

Dr. McCalpin,

Do you have your stream benchmark results for the IBM 390 "thin" node,
as well as the IBM 590 "wide" node option for their SP2 system? Since
the 590 path to memory is 256 bits wide, and that of the 390 is 64 bits,
I assume the 390 has 1/4 of the sustainable CPU<->Dcache<->DRAM bandpass
that the 590 has, since both are clocked at 66MHz, but I'd love to see
an actual measurement...

---Gary

Randolph Gary Smith                      Internet: gary@chpc.utexas.edu
Systems Group                            Phonenet: (512) 471-2411
Center for High Performance Computing    Snailnet: 10100 Burnet Road
The University of Texas System                     Austin, Texas 78758-4497

From mccalpin Tue Aug 2 13:39:27 1994
Received: by perelandra.cms.udel.edu (931110.SGI/930416.SGI) for mccalpin id AA08776; Tue, 2 Aug 94 13:39:27 -0400
Date: Tue, 2 Aug 94 13:39:27 -0400
From: mccalpin (John D.
McCalpin) Message-Id: <9408021739.AA08776@perelandra.cms.udel.edu> To: mccalpin Subject: Stream on grieg Status: RO ================================================================ Machine #1 ================================================================ xlf -O3 -qarch -qstrict stream_d.f -o stream_d measured on grieg.udel.edu, an IBM RS/6000-990 with 2 memory cards --- i.e. a 128-bit bus with 128kB Dcache. John D. McCalpin mccalpin@perelandra.cms.udel.edu Tue Aug 2 13:30:54 EDT 1994 --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 333.3333 .2684 .2400 .2900 Scaling : 320.0000 .2691 .2500 .2800 Summing : 342.8568 .3691 .3500 .3800 SAXPYing : 352.9410 .3573 .3400 .3800 --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 333.3337 .2755 .2400 .2900 Scaling : 333.3337 .2706 .2400 .3000 Summing : 363.6364 .3634 .3300 .3800 SAXPYing : 352.9410 .3622 .3400 .3800 --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 333.3337 .2533 .2400 .2800 Scaling : 320.0000 .2592 .2500 .2800 Summing : 363.6364 .3442 .3300 .3700 SAXPYing : 363.6364 .3553 .3300 .3700 --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 333.3333 .2592 .2400 .2700 Scaling : 320.0000 .2612 .2500 .2800 Summing : 363.6363 .3452 .3300 .3700 SAXPYing : 363.6364 .3522 .3300 .3700 --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 320.0000 .2662 .2500 .2800 Scaling : 333.3337 .2532 .2400 .2700 Summing : 352.9410 .3561 .3400 .3700 SAXPYing : 352.9415 .3553 .3400 .3800 --------------------------------------------------- ================================================================ Machine #1 ================================================================ xlf -O3 -qarch -qstrict stream_d.f -o stream_d measured on 
mozart.udel.edu, an IBM RS/6000-990 with 4 memory cards --- i.e. a 256-bit bus with 256kB Dcache. John D. McCalpin mccalpin@perelandra.cms.udel.edu Tue Aug 2 13:30:54 EDT 1994 --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 666.6673 .1332 .1200 .1400 Scaling : 533.3347 .1551 .1500 .1600 Summing : 705.8820 .1892 .1700 .2000 SAXPYing : 705.8820 .1851 .1700 .1900 --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 727.2726 .1318 .1100 .1600 Scaling : 615.3846 .1643 .1300 .2000 Summing : 749.9996 .1885 .1600 .2100 SAXPYing : 750.0007 .1866 .1600 .2100 --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 727.2718 .1275 .1100 .1400 Scaling : 615.3841 .1549 .1300 .1900 Summing : 749.9996 .1837 .1600 .2100 SAXPYing : 750.0007 .1786 .1600 .2100 --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 727.2718 .1296 .1100 .1500 Scaling : 615.3841 .1535 .1300 .1800 Summing : 799.9995 .1850 .1500 .2100 SAXPYing : 749.9996 .1834 .1600 .2000 --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 666.6673 .1363 .1200 .1500 Scaling : 571.4291 .1563 .1400 .1800 Summing : 750.0002 .1853 .1600 .2000 SAXPYing : 705.8820 .1852 .1700 .2000 --------------------------------------------------- -- John D. McCalpin mccalpin@perelandra.cms.udel.edu Assistant Professor mccalpin@brahms.udel.edu College of Marine Studies, U. Del. 
John.McCalpin@mvs.udel.edu From mccalpin@perelandra.cms.udel.edu Mon Aug 15 13:55:47 1994 Received: from inet-gw-1.pa.dec.com by perelandra.cms.udel.edu via SMTP (931110.SGI/930416.SGI) for mccalpin id AA03693; Mon, 15 Aug 94 12:15:47 -0400 Received: from us1rmc.bb.dec.com by inet-gw-1.pa.dec.com (5.65/10Aug94) id AA16282; Mon, 15 Aug 94 08:58:35 -0700 Received: from i4get.enet by us1rmc.bb.dec.com (5.65/rmc-22feb94) id AA12508; Mon, 15 Aug 94 11:58:39 -0400 Message-Id: <9408151558.AA12508@us1rmc.bb.dec.com> Received: from i4get.enet; by us1rmc.enet; Mon, 15 Aug 94 11:58:39 EDT Date: Mon, 15 Aug 94 11:58:39 EDT From: "John, dtn 293-5506 15-Aug-1994 1123" To: mccalpin Cc: me@i4get.enet.dec.com Apparently-To: mccalpin Subject: .gz ? Status: O Hello Professor McCalpin, I expect to be able to give you some data on additional digital systems in the near future; just need to get approvals before releasing the data. Meanwhile, I've continued to study the data you already have, but have run into a roadblock with the ".gz" files in /bench/stream_mail. Would you be willing to post them in ".Z" format instead? (I gather that .gz is gnu zip? and on my DEC OSF/1 system, neither "uncompress" nor "unpack" is able to read it.) Thanks very much! 
/john From mike@wavelet.apl.washington.edu Fri Aug 26 12:26:26 1994 Received: from wavelet.apl.washington.edu by perelandra via SMTP (931110.SGI/930416.SGI) for mccalpin id AA01678; Fri, 26 Aug 94 12:26:26 -0400 Received: by wavelet.apl.washington.edu (5.65c) id AA14712; Fri, 26 Aug 1994 09:27:11 -0700 From: Mike Kenney Message-Id: <199408261627.AA14712@wavelet.apl.washington.edu> Subject: stream benchmark results To: mccalpin Date: Fri, 26 Aug 1994 09:27:10 -0700 (PDT) X-Mailer: ELM [version 2.4 PL23] Content-Type: text Content-Length: 772 Status: RO System: Pentium/60 Intel Premiere PCI Motherboard 32Mb RAM OS: Linux v 1.0.9 Compiler: gcc v 2.5.8 Compiler flags: -O2 -m486 -fomit-frame-pointer -funroll-loops Note: This compiler does not optimize for the Pentium Timing calibration ; time = 2870.00 usec. Increase the size of the arrays if this is < 300 and your clock precision is =< 1/100 second. --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 37.246 477.473 450.000 560.000 Scaling : 62.077 283.390 270.000 320.000 Summing : 61.320 424.594 410.000 490.000 SAXPYing : 58.468 441.078 430.000 460.000 -- Mike Kenney UW Applied Physics Lab mikek@apl.washington.edu From mike@wavelet.apl.washington.edu Fri Aug 26 16:13:15 1994 Received: from wavelet.apl.washington.edu by perelandra via SMTP (931110.SGI/930416.SGI) for mccalpin id AA02413; Fri, 26 Aug 94 16:13:15 -0400 Received: by wavelet.apl.washington.edu (5.65c) id AA15248; Fri, 26 Aug 1994 13:14:02 -0700 From: Mike Kenney Message-Id: <199408262014.AA15248@wavelet.apl.washington.edu> Subject: Re: stream benchmark results To: mccalpin (John D. McCalpin) Date: Fri, 26 Aug 1994 13:14:02 -0700 (PDT) In-Reply-To: <9408261632.AA01705@perelandra> from "John D. McCalpin" at Aug 26, 94 12:32:09 pm X-Mailer: ELM [version 2.4 PL23] Content-Type: text Content-Length: 983 Status: RO John D. McCalpin writes: > > Thanks for the data. 
The rate for a plain copy is surprisingly slow, > but the other numbers are quite good for a machine in that price range. > -- > John D. McCalpin mccalpin@perelandra.cms.udel.edu > Assistant Professor mccalpin@brahms.udel.edu > College of Marine Studies, U. Del. John.McCalpin@mvs.udel.edu > > Looking at the assembler output from gcc, there are two 4-byte moves issued on each iteration of the plain-copy loop. The other loops use the floating-point instructions which can move 8-bytes at a time. I would expect an 8-byte move instruction would approximately double the first loop speed ... maybe a Pentium optimized gcc would do this. BTW, I tried substituting c[j] = 1.0*a[j] in the first loop but the compiler didn't fall for it ... it still issued the 4-byte move instructions rather than an fp multiply. -- Mike Kenney UW Applied Physics Lab mikek@apl.washington.edu From mccalpin@perelandra Fri Sep 9 13:01:47 1994 Received: from inet-gw-1.pa.dec.com by perelandra via SMTP (931110.SGI/930416.SGI) for mccalpin id AA14365; Fri, 9 Sep 94 11:35:28 -0400 Received: from us1rmc.bb.dec.com by inet-gw-1.pa.dec.com (5.65/10Aug94) id AA15199; Fri, 9 Sep 94 08:22:50 -0700 Received: from i4get.enet by us1rmc.bb.dec.com (5.65/rmc-22feb94) id AA25478; Fri, 9 Sep 94 11:22:59 -0400 Message-Id: <9409091522.AA25478@us1rmc.bb.dec.com> Received: from i4get.enet; by us1rmc.enet; Fri, 9 Sep 94 11:23:00 EDT Date: Fri, 9 Sep 94 11:23:00 EDT From: "John, dtn 293-5506 09-Sep-1994 1117" To: mccalpin Cc: shak@i4get.enet.dec.com, me@i4get.enet.dec.com Apparently-To: mccalpin Subject: STREAMS on 2 Alpha SMP systems Status: O Dear Professor McCalpin, The CSG Performance Group at Digital has recently run the Streams benchmark on two Alpha SMP systems. I look forward to seeing an update of your tables that includes these results. If there are any questions we can answer about these runs, please feel free to contact me. I hope you are doing well with the new academic year! 
Best regards
/John Henning
508 264 5506
henning@msbcs.enet.dec.com or henning@zko.mts.dec.com

                     Bytes         Bandwidth (MB/s)
Machine              /word     Copy    Scale      Sum    Triad
-------------------- ----- -------- -------- -------- --------
DEC 7000/640 4-CPU       8    261.9    257.4    261.1    265.9
DEC 7000/620 2-CPU       8    163.9    166.0    161.8    157.1
DEC 7000/610 1-CPU       8     91.6     91.6     88.8     86.9
DEC 2100 A500 2-CPU      8    133.3    133.6    129.6    127.7
DEC 2100 A500 1-CPU      8     79.2     80.8     79.1     79.3
-------------------- ----- -------- -------- -------- --------

                       num         Speedup ratio for
Machine               cpus     Copy    Scale      Sum    Triad
-------------------- ----- -------- -------- -------- --------
DEC 7000/640             4     2.86     2.81     2.94     3.06
DEC 7000/620             2     1.79     1.81     1.82     1.81
DEC 7000/610             1     1.00     1.00     1.00     1.00
DEC 2100 A500            2     1.68     1.65     1.64     1.61
DEC 2100 A500            1     1.00     1.00     1.00     1.00
-------------------- ----- -------- -------- -------- --------

2100_1cpu

--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
CPU time ; time = 0.000000000000000E+000 hundredths of a second
Wall Time ; time = 75.6000041961670 hundredths of a second
Increase the size of the arrays if this is <100
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Max MB/s   RMS time   Min time   Max time
Assignment:   79.1954     0.2038     0.2020     0.2143
Scaling   :   80.7559     0.1987     0.1981     0.1997
Summing   :   79.0681     0.3037     0.3035     0.3045
SAXPYing  :   79.3231     0.3034     0.3026     0.3035

2100_2cpu: Add 2 tables together ...
jobs running simultaneous -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 0.000000000000000E+000 hundredths of a second Wall Time ; time = 145.051205158234 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 66.6400 0.2404 0.2401 0.2411 Scaling : 66.6400 0.2403 0.2401 0.2411 Summing : 64.7110 0.3732 0.3709 0.3900 SAXPYing : 63.8705 0.3768 0.3758 0.3777 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 0.000000000000000E+000 hundredths of a second Wall Time ; time = 145.539200305939 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 66.6400 0.2406 0.2401 0.2420 Scaling : 66.9120 0.2400 0.2391 0.2407 Summing : 64.8817 0.3737 0.3699 0.3968 SAXPYing : 63.8706 0.3765 0.3758 0.3777 7600-1CPU -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 0.000000000000000E+000 hundredths of a second Wall Time ; time = 57.3487997055054 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 91.5835 0.1795 0.1747 0.2147 Scaling : 91.5835 0.1750 0.1747 0.1757 Summing : 88.7732 0.2705 0.2704 0.2709 SAXPYing : 86.8911 0.2766 0.2762 0.2778 7600-2CPU: Add 2 tables 
together ... jobs running simultaneous -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 0.000000000000000E+000 hundredths of a second Wall Time ; time = 74.3312001228333 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 81.9672 0.1964 0.1952 0.1981 Scaling : 82.7952 0.1943 0.1932 0.1952 Summing : 81.1557 0.2968 0.2957 0.2983 SAXPYing : 78.6658 0.3059 0.3051 0.3070 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 0.000000000000000E+000 hundredths of a second Wall Time ; time = 73.4527945518494 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 81.9672 0.1960 0.1952 0.2001 Scaling : 83.2155 0.1932 0.1923 0.1942 Summing : 80.6235 0.2990 0.2977 0.3006 SAXPYing : 78.4150 0.3066 0.3061 0.3070 7600-4CPU: Add 4 tables together ... 
jobs running simultaneous -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 0.000000000000000E+000 hundredths of a second Wall Time ; time = 104.782390594482 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 65.0533 0.2513 0.2460 0.2895 Scaling : 63.5405 0.2525 0.2518 0.2534 Summing : 65.3994 0.3683 0.3670 0.3695 SAXPYing : 70.8650 0.3704 0.3387 0.3754 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 0.000000000000000E+000 hundredths of a second Wall Time ; time = 104.587197303772 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 65.4193 0.2517 0.2446 0.3051 Scaling : 64.2880 0.2498 0.2489 0.2504 Summing : 65.5738 0.3672 0.3660 0.3685 SAXPYing : 65.6455 0.3716 0.3656 0.3734 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 0.000000000000000E+000 hundredths of a second Wall Time ; time = 100.878405570984 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 65.3125 0.2558 0.2450 0.3305 Scaling : 64.2880 0.2505 0.2489 0.2514 Summing : 65.3994 0.3681 0.3670 0.3689 SAXPYing : 65.1240 0.3729 0.3685 0.3744 -------------------------------------- 
Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- CPU time ; time = 0.000000000000000E+000 hundredths of a second Wall Time ; time = 96.2911963462830 hundredths of a second Increase the size of the arrays if this is <100 and your clock precision is =<1/100 second --------------------------------------------------- Function Max MB/s RMS time Min time Max time Assignment: 66.1026 0.2565 0.2420 0.3461 Scaling : 65.3125 0.2461 0.2450 0.2508 Summing : 64.7110 0.3720 0.3709 0.3728 SAXPYing : 64.2729 0.3745 0.3734 0.3763 From henning@i4get.enet.dec.com Thu Oct 13 16:05:57 1994 Received: from inet-gw-1.pa.dec.com by perelandra via SMTP (931110.SGI/930416.SGI) for mccalpin id AA07734; Thu, 13 Oct 94 16:05:57 -0400 Received: from us1rmc.bb.dec.com by inet-gw-1.pa.dec.com (5.65/10Aug94) id AA22959; Thu, 13 Oct 94 12:45:31 -0700 Received: from i4get.enet by us1rmc.bb.dec.com (5.65/rmc-22feb94) id AA11379; Thu, 13 Oct 94 15:45:51 -0400 Message-Id: <9410131945.AA11379@us1rmc.bb.dec.com> Received: from i4get.enet; by us1rmc.enet; Thu, 13 Oct 94 15:45:57 EDT Date: Thu, 13 Oct 94 15:45:57 EDT From: "John, dtn 293-5506 13-Oct-1994 1540" To: mccalpin Cc: me@i4get.enet.dec.com Apparently-To: mccalpin Subject: The Digital data... Status: RO Hi Professor McCalpin, Thank you for including the data I submitted not too long ago in your tables. However, in the interest of treating Digital the same as other vendors, may I request that you include the 2-cpu and 4-cpu data in the base table? 
I notice that you have only:

DEC 7000/610     1 cpu   8      91.6      91.6      88.8      86.9
DEC 2100 A500    1 cpu   8      79.2      80.8      79.1      79.3

in table.print, but you have multiple cpu data in table.print for

Cray YMP/C90     2 cpu   8   13866.0   13905.5   18233.2   18246.3
Cray EL-98       2 cpu   8     826.7     833.8    1049.0    1078.2
Convex C3420     2 cpu   8     289.1     273.1     302.7     300.3
Convex C3420     2 cpu   4     242.7     238.8     263.7     260.0
Convex C3220     2 cpu   8     334.3     333.0     337.3     336.5
Convex C3220     2 cpu   4     178.7     177.5     211.0     208.8
Sun SS10/41      2 cpu   8      48.0      48.0      48.0      48.0
Stardent P3      3 cpu   8     228.6     246.2     246.2     200.0
Alliant VFX80    3 cpu   8      41.7      41.6      43.1      43.0
Alliant VFX80    3 cpu   4      23.8      23.8      25.2      25.3
Cray YMP/C90     4 cpu   8   27610.3   27789.6   34633.3   35044.1
Cray Y/MP        4 cpu   8    9685.8    9678.9   13781.4   13851.2
Cray EL-98       4 cpu   8    1564.9    1569.8    1933.8    1955.5
FPS MCP728       7 cpu   8    1040.5    1036.8     890.4     887.6
FPS MCP728       7 cpu   4    1046.2     478.0     393.0     394.8
Cray YMP/C90     8 cpu   8   55071.9   55391.8   60843.3   63229.6
Cray Y/MP        8 cpu   8   19291.6   19294.2   26588.9   26802.2
Cray EL-98       8 cpu   8    2362.8    2310.5    2373.7    2363.8
Alliant FX/80    8 cpu   8      72.8      71.6      76.3      76.5
Cray YMP/C90    16 cpu   8  105497.4  104656.4  101736.1  103812.8

Thanks very much for considering this request!

/john henning

From kupec@agouron.com Fri Oct 28 16:03:05 1994
Received: from tbone.agouron.com by perelandra.cms.udel.edu via SMTP (931110.SGI/930416.SGI) for mccalpin id AA06272; Fri, 28 Oct 94 16:03:05 -0400
Received: from blowhole.agouron.com by agouron.com via SMTP (940715.SGI.52/930901.SGI) for mccalpin@perelandra.cms.udel.edu id AA12039; Fri, 28 Oct 94 13:07:51 -0700
Received: by blowhole.agouron.com (940715.SGI.52/930416.SGI) for @agouron.com:mccalpin@perelandra.cms.udel.edu id AA00884; Fri, 28 Oct 94 13:08:30 -0700
From: "John W. Kupec"
Message-Id: <9410281308.ZM882@blowhole.agouron.com>
Date: Fri, 28 Oct 1994 13:08:29 -0700
In-Reply-To: mccalpin@perelandra.cms.udel.edu (John D.
McCalpin) "Re: stream" (Oct 28, 3:32pm) References: <9410281905.AA24411@irisb> <9410281532.ZM6196@perelandra.cms.udel.edu> X-Mailer: Z-Mail (3.1.0 22feb94 MediaMail) To: mccalpin (John D. McCalpin) Subject: Re: stream Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 Status: RO On Oct 28, 3:32pm, John D. McCalpin wrote: > Subject: Re: stream > On Oct 28, 12:05pm, John W. Kupec wrote: > > Having problems here. I've tried various f77 incantations & > > get "Segmentation fault" under IRIX 6.0 and IRIX 5.2 (R4400). Sure enough, stacksize was 64Mb. Here's the results after compiling "f77 -mips4 -O3 stream_d.f -o stream_d" on Power Challenge-L, IRIX 6.0, 1GB, 2-way interleaved memory: -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- Timing calibration ; time = 118.6243979260325 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 130.9830 0.4904 0.4886 0.4991 Scaling : 131.8875 0.4871 0.4853 0.4921 Summing : 129.2836 0.7464 0.7426 0.7659 SAXPYing : 129.0843 0.7448 0.7437 0.7462 Sum of a is : 2.3066015625918735E+18 Sum of b is : 4.6132031248564384E+17 Sum of c is : 6.1509375001412557E+17 real 27.26 user 25.37 sys 1.26 There is no -O4 under IRIX 6.0. 
Here are the results I obtained with "f77 -mips2 -O3 -non_shared" on an
Indigo 2, IRIX 5.2, 150 MHz R4400 CPU, 128MB memory:

--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 309.9999904632568 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time   Max time
Assignment:    59.8131     1.1270     1.0700     1.3300
Scaling   :    57.1427     1.2077     1.1200     1.4000
Summing   :    61.1465     1.7146     1.5700     2.0100
SAXPYing  :    67.1329     1.5382     1.4300     1.8700
Sum of a is : 2.3066015623519370E+18
Sum of b is : 4.6132031251484826E+17
Sum of c is : 6.1509374999898381E+17

real 1:02.47
user 54.58
sys 5.66

Would you mind providing comparisons to other machines?

Thanks

From sgf@cfm.brown.edu Fri Oct 28 16:34:57 1994
Received: from cfm.brown.edu by perelandra.cms.udel.edu via SMTP (931110.SGI/930416.SGI) for mccalpin id AA06436; Fri, 28 Oct 94 16:34:57 -0400
Received: by cfm.brown.edu (4.1/1.00) id AA13480; Fri, 28 Oct 94 16:36:03 EDT
Date: Fri, 28 Oct 94 16:36:03 EDT
From: sgf@cfm.brown.edu (Sam Fulcomer)
Message-Id: <9410282036.AA13480@cfm.brown.edu>
To: mccalpin
Subject: Re: New MIPS R10000? [actually, more on R8000]
Status: RO

Well..., here are the results and such... Let me know what you think. I'm
now going to get our stuff running and attempt to do some optimization.

-s

odin 61% hinv
15 75 MHZ IP21 Processors       ?? ...hhmmm.... one down already....
CPU: MIPS R8000 Processor Chip Revision: 2.2
FPU: MIPS R8010 Floating Point Chip Revision: 0.1
Data cache size: 16 Kbytes
Instruction cache size: 16 Kbytes
Secondary unified instruction/data cache size: 4 Mbytes
Main memory size: 2048 Mbytes, 8-way interleaved

odin 54% f77 -mips4 -O3 -o stream_df stream_d.f
WARNING 85: definition of qlogb in /usr/lib64/mips4/libm.so preempts
        that definition in /usr/lib64/mips4/libc.so.
odin 55% cc -mips4 -O3 -o stream_dc stream_d.c
"stream_d.c", line 72: warning(1552): variable "c" was set but never used
  static real a[N],b[N],c[N];
                        ^
odin 56% stream_df
--------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLEPRECISION word
--------------------------------------
Timing calibration ; time = 54.80809807777405 hundredths of a second
Increase the size of the arrays if this is <30
and your clock precision is =<1/100 second
---------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time   Max time
Assignment:   134.9150     0.2374     0.2372     0.2378
Scaling   :   134.8364     0.2376     0.2373     0.2388
Summing   :   134.8351     0.3562     0.3560     0.3565
SAXPYing  :   134.7553     0.3564     0.3562     0.3567
Sum of a is : 1.1533007812800417E+18
Sum of b is : 2.3066015625365702E+17
Sum of c is : 3.0754687500612557E+17

odin 58% stream_dc
Timing calibration ; time = 274158.000000 usec.
Increase the size of the arrays if this is < 300000
and your clock precision is =< 1/100 second.
---------------------------------------------------
Function      Rate (MB/s)    RMS time     Min time     Max time
Assignment:     135.681    510998.938   117924.000  1117907.000
Scaling   :     135.926    118220.000   117711.000   120189.000
Summing   :     139.726    546249.375   171765.000  1171895.000
SAXPYing  :     136.220    549960.500   176185.000  1178720.000
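
[Editorial note] The stream_dc table above reports times in microseconds, and for three of the four kernels the RMS time is several times the minimum, with maxima above one second. The rate column nevertheless appears to follow the minimum time. The Python sketch below checks that reading. It rests on assumptions that the message does not state: an array length of 1,000,000 elements (inferred from the numbers, not quoted from the source), 8 bytes per element, 1 MB = 10^6 bytes, and the usual STREAM convention that Assignment and Scaling touch two arrays per iteration while Summing and SAXPYing touch three. The names N, ARRAYS_TOUCHED and rate_mb_per_s are made up for this illustration.

```python
# Sketch: reproduce "Rate (MB/s)" from "Min time" for the stream_dc run.
# ASSUMPTIONS (not stated in the message): N = 1,000,000 elements,
# 8 bytes per element, and 1 MB = 1e6 bytes.
N = 1_000_000
BYTES_PER_WORD = 8

# Arrays read or written per loop iteration (usual STREAM counting):
# Assignment and Scaling read one array and write one;
# Summing and SAXPYing read two arrays and write one.
ARRAYS_TOUCHED = {"Assignment": 2, "Scaling": 2, "Summing": 3, "SAXPYing": 3}

# "Min time" column in microseconds, copied from the table above.
MIN_TIME_USEC = {
    "Assignment": 117924.0,
    "Scaling": 117711.0,
    "Summing": 171765.0,
    "SAXPYing": 176185.0,
}


def rate_mb_per_s(kernel):
    """Bytes moved per pass divided by the fastest pass.

    Bytes per microsecond is numerically the same as 1e6 bytes per
    second, so no further unit conversion is needed.
    """
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * N
    return bytes_moved / MIN_TIME_USEC[kernel]


for kernel in MIN_TIME_USEC:
    print(f"{kernel:<11} {rate_mb_per_s(kernel):9.3f} MB/s")
```

Under those assumptions the computed rates agree with the reported column to three decimal places, which suggests the slow trials behind the large RMS and maximum times do not affect the quoted bandwidth.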