From henning@tle.ENET.dec.com Tue Jan 6 09:18:12 1998 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA09464 for ; Tue, 6 Jan 1998 09:18:08 -0800 Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id JAA23302 for ; Tue, 6 Jan 1998 09:18:02 -0800 Received: from mail11.digital.com (mail11.digital.com [192.208.46.10]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id JAA09465 for ; Tue, 6 Jan 1998 09:18:00 -0800 env-from (henning@tle.ENET.dec.com) Received: from us1rmc.bb.dec.com (us1rmc.bb.dec.com [16.57.16.6]) by mail11.digital.com (8.7.5/UNX 1.5/1.0/WV) with SMTP id MAA13214 for ; Tue, 6 Jan 1998 12:04:15 -0500 (EST) Received: from tle.enet by us1rmc.bb.dec.com (5.65/rmc-17Jan97) id AA28180; Tue, 6 Jan 98 12:04:12 -0500 Message-Id: <9801061704.AA28180@us1rmc.bb.dec.com> Received: from tle.enet; by us1rmc.enet; Tue, 6 Jan 98 12:04:14 EST Date: Tue, 6 Jan 98 12:04:14 EST From: "John Henning, dtn 264-0378 06-Jan-1998 1159 -0500" To: mccalpin@sgi.com Cc: me@tle.ENET.dec.com Apparently-To: mccalpin@sgi.com Subject: (re-send) Stream submission: Alpha PC164SX Status: RO From: TLE::HENNING "John Henning, dtn 264-0378 18-Dec-1997 1343 -0500" 18-DEC-1997 13:47:09.76 To: US1RMC::"mccalpin@sgi.com" CC: ME Subj: Stream submission: Alpha PC164SX Hi John, Here's a STREAM run for the Alpha PC164SX Hope your son still likes trains (so you have an excuse to buy them for Christmas!) /john C:\mccalpin\play>f77 -tune:ev5 -optimize:5 -fast tmp2.f DIGITAL Visual Fortran Optimizing Compiler Version: V5.0 Copyright (c) 1997 Digital Equipment Corp. All rights reserved. tmp2.f Microsoft (R) 32-Bit Executable Linker Version 5.01.7076 Copyright (C) Microsoft Corp 1992-1997. All rights reserved. 
/subsystem:console /entry:mainCRTStartup /ignore:505 /debugtype:cv /debug:minimal /pdb:none C:\TEMP\obj21.tmp dfor.lib libc.lib dfconsol.lib kernel32.lib /out:tmp2.exe C:\mccalpin\play>tmp2 -------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLEPRECISION word -------------------------------------- Timing calibration ; time = 29.6875000000000 hundredths of a second Increase the size of the arrays if this is <30 and your clock precision is =<1/100 second --------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 273.0671 0.1242 0.1172 0.1250 Scaling : 256.0004 0.1266 0.1250 0.1328 Summing : 292.5719 0.1719 0.1641 0.1797 SAXPYing : 292.5719 0.1719 0.1641 0.1797 Sum of a is : 1.153302511199256E+018 Sum of b is : 2.306605022415484E+017 Sum of c is : 3.075473363245637E+017 So what is tmp2.f? It is a slight tweak to stream_d.f : ftp> cd [.sources_from_ftp_25aug95] 250 CWD command succesful. ftp> get stream_d.f 200 PORT command successful. 150 Opening data connection for stream_d.f (16.31.80.231,1296) 226 Transfer complete. 7509 bytes received in 0.09 seconds (88.34 Kbytes/sec) ftp> bye 221 Goodbye. C:\mccalpin\play>fc stream_d.f tmp2.f Comparing files stream_d.f and TMP2.F ***** stream_d.f INTEGER n,ntimes PARAMETER (n=2 000 000,ntimes=10) C .. ***** TMP2.F INTEGER n,ntimes PARAMETER (n = 2000003 ,ntimes=10) C .. ***** ***** stream_d.f C .. Local Scalars .. DOUBLE PRECISION t,t0 INTEGER j,k,nbpw ***** TMP2.F C .. Local Scalars .. DOUBLE PRECISION t INTEGER j,k,nbpw ***** ***** stream_d.f t = second(t0) t = second(t0) DO 10 j = 1,n ***** TMP2.F t = second(t0) DO 10 j = 1,n ***** ***** stream_d.f * This code works on Sun and Silicon Graphics machines. REAL function second(t0) double precision t0 real dummy(2) second = etime(dummy) end ***** TMP2.F * This code works on Sun and Silicon Graphics machines. 
      REAL function second
      second = secnds(0.0)
      end
*****

From bill@proto.math.ucdavis.edu Tue Jan 6 13:55:44 1998
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA09988 for ; Tue, 6 Jan 1998 13:55:39 -0800
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA16815 for ; Tue, 6 Jan 1998 13:55:38 -0800
Received: from proto.math.ucdavis.edu (proto.math.ucdavis.edu [128.120.22.112]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via ESMTP id NAA05157 for ; Tue, 6 Jan 1998 13:55:37 -0800 env-from (bill@proto.math.ucdavis.edu)
Received: (from bill@localhost) by proto.math.ucdavis.edu (8.8.5/8.8.6) id NAA03861; Tue, 6 Jan 1998 13:55:29 -0800
From: Bill Broadley
Message-Id: <199801062155.NAA03861@proto.math.ucdavis.edu>
Subject: Re: TEST_FPU results
To: mccalpin@cthulhu
Date: Tue, 6 Jan 1998 13:55:29 -0800 (PST)
In-Reply-To: <199801061507.HAA09211@frakir.engr.sgi.com> from "John McCalpin" at Jan 6, 98 07:07:23 am
Content-Type: text
Status: RO

>
> Hi Bill,
>
> Did you send me any STREAM results lately? I just trashed

I don't remember; maybe the below:

Dell PII-300, Linux, gcc.
Total memory required = 91.6 MB.
(= 27 clock ticks)
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         188.2353       0.3460       0.3400       0.3500
Scale:        172.9730       0.3750       0.3700       0.3800
Add:          213.3333       0.4590       0.4500       0.4600
Triad:        188.2353       0.5190       0.5100       0.5300
17.49user 0.82system 0:18.31elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (66major+23445minor)pagefaults 0swaps

Pretty much finished the Java version; unfortunately, under ms-windows/win95 the clock is very inaccurate (i.e. 1/18th of a second). I need to add code to fix that (i.e. repeat more times to get more timer accuracy). Frustrating, since I'm using the Java millisecond call.

We received the 16-cpu Origin system; haven't gotten to play with it yet.
I hope to run some multithreaded code and see how using more than 1 cpu works.

> my system mail folder and lost about 4 weeks worth of mail
> that I had not dealt with yet :-(

Bummer.

> (I then discovered that it is company policy not to back
> up e-mail! Aaarggh!)

Yeah, becoming more common; you can't get sued/summoned for email if you don't have a backup policy backing it up.

--
Bill Broadley    Bill@math.ucdavis.edu    UCD Math Sys-Admin
Linux is great.  http://math.ucdavis.edu/~bill    PGP-ok

From jual@isr027.isr.kfa-juelich.de Thu Feb 12 05:19:08 1998
Received: from giraffe.asd.sgi.com (giraffe.asd.sgi.com [192.26.72.158]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA16453 for ; Thu, 12 Feb 1998 05:18:59 -0800
Received: from sgi.sgi.com by giraffe.asd.sgi.com via ESMTP (951211.SGI.8.6.12.PATCH1502/951211.SGI) for id FAA12843; Thu, 12 Feb 1998 05:18:33 -0800
Received: from mail.virginia.edu (mail.Virginia.EDU [128.143.2.9]) by sgi.sgi.com (950413.SGI.8.6.12/970507) via SMTP id FAA05967 for ; Thu, 12 Feb 1998 05:18:31 -0800 env-from (jual@isr027.isr.kfa-juelich.de)
Received: from ares.cs.virginia.edu by mail.virginia.edu id aa13927; 12 Feb 98 8:18 EST
Received: from fz-juelich.de (zam149.zam.kfa-juelich.de [134.94.80.3]) by ares.cs.Virginia.EDU (8.8.5/8.8.5) with SMTP id IAA16332 for ; Thu, 12 Feb 1998 08:18:26 -0500 (EST)
Received: from isr027 by fz-juelich.de (5.65v3.2/DEC-AXP) id AA05825; Thu, 12 Feb 1998 14:18:23 +0100
Received: by isr027.isr.kfa-juelich.de; id AA15657; Thu, 12 Feb 1998 14:18:22 +0100
Reply-To: J.Altes@fz-juelich.de
From: "J.Altes"
Message-Id: <980212141821.ZM32064@isr027>
Date: Thu, 12 Feb 1998 14:18:21 +0100
X-Mailer: Z-Mail (4.0.1 13Jan97)
To: j.altes@fz-juelich.de, mccalpin@cs.virginia.edu
Subject: STREAM
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="PART-BOUNDARY=.1980212141821.ZM32064.isr027"
Status: RO

--
--PART-BOUNDARY=.1980212141821.ZM32064.isr027
Content-Type: text/plain; charset=us-ascii

Dear
Mr.McCalpin, enclosed please find some results of 'stream' on our DEC PWS 600au, 600 MHz, 1 GB Memory, OSF1 V4.0 564.32 alpha and DEC 3000 Model 800, 200 MHz, 256 MB Memory, OSF1 V3.2 214 alpha. Best regards. -- ********************************************************************** * Dr.J.Altes, Institute for Safety Research and Reactor Technology * * Forschungszentrum Juelich, 52425 Juelich, Germany * * e-mail: J.Altes@fz-juelich.de * * http://www.kfa-juelich.de/isr/isr_e.html * * Tel.No.: +49 2461 61 4651 Fax.: +49 2461 61 3133 * *********!*********!*********!****!****!*********!*********!*********! --PART-BOUNDARY=.1980212141821.ZM32064.isr027 Content-Type: text/plain ; name="stream.dec181" ; charset=us-ascii Content-Disposition: attachment ; filename="stream.dec181" X-Zm-Content-Name: stream.dec181 DEC PWS 600au stream_d.f with f90 -O5 stream_d.f -llapack ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 2000000 Offset = 0 The total memory requirement is 45 MB You are running each test 10 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity/precision appears to be 976 microseconds The tests below will each take a time on the order of 90768 microseconds (= 93 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
---------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 227.6868 0.1419 0.1405 0.1444 Scale: 223.0401 0.1444 0.1435 0.1454 Add: 243.4674 0.1981 0.1972 0.1991 Triad: 243.4668 0.1980 0.1972 0.1991 Sum of a is = 2.306601562591874E+018 Sum of b is = 4.613203124856438E+017 Sum of c is = 6.150937500141256E+017 --------------------------------------------------------------- stream_d.c with cc -O stream_d.c second.c -o stream_d -lm ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size = 1000000, Offset = 0 Total memory required = 22.9 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 975 microseconds. Each test below will take on the order of 42968 microseconds. (= 44 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 215.5810 0.0745 0.0742 0.0771 Scale: 215.5810 0.0747 0.0742 0.0752 Add: 223.4199 0.1076 0.1074 0.1084 Triad: 248.2418 0.0973 0.0967 0.0977 DEC 3000 Model 800 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size = 5000000, Offset = 0 Total memory required = 114.4 MB. Each test is run 10 times, but only the *best* time for each is used. 
------------------------------------------------------------- Your clock granularity/precision appears to be 16666 microseconds. Each test below will take on the order of 766666 microseconds. (= 46 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 90.5660 0.9438 0.8833 1.1667 Scale: 94.1176 0.9110 0.8500 1.1667 Add: 97.2973 1.2776 1.2333 1.4000 Triad: 100.0000 1.2322 1.2000 1.3000 --PART-BOUNDARY=.1980212141821.ZM32064.isr027-- From mjr19@cus.cam.ac.uk Thu Mar 12 07:32:18 1998 Received: from giraffe.asd.sgi.com (giraffe.asd.sgi.com [192.26.72.158]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id HAA15913 for ; Thu, 12 Mar 1998 07:32:14 -0800 Received: from sgi.sgi.com by giraffe.asd.sgi.com via ESMTP (951211.SGI.8.6.12.PATCH1502/951211.SGI) for id HAA20205; Thu, 12 Mar 1998 07:32:08 -0800 Received: from mail.virginia.edu (mail.Virginia.EDU [128.143.2.9]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id HAA02864 for ; Thu, 12 Mar 1998 07:32:08 -0800 (PST) mail_from (mjr19@cus.cam.ac.uk) Received: from ares.cs.virginia.edu by mail.virginia.edu id aa27041; 12 Mar 98 10:32 EST Received: from taurus.cus.cam.ac.uk (cusexim@taurus.cus.cam.ac.uk [131.111.8.48]) by ares.cs.Virginia.EDU (8.8.5/8.8.5) with ESMTP id KAA20103 for ; Thu, 12 Mar 1998 10:31:59 -0500 (EST) Received: from mjr19 (helo=localhost) by taurus.cus.cam.ac.uk with local-smtp (Exim 1.890 #3) for mccalpin@cs.virginia.edu id 0yD9xO-0004Kf-00; Thu, 12 Mar 1998 15:31:42 +0000 Date: Thu, 12 Mar 1998 15:31:42 +0000 (GMT) From: "M.J. 
Rutter"
To: mccalpin@cs.virginia.edu
Subject: Stream for SR2201
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Status: RO

John,

Many thanks for releasing "stream" to the world - it has done much to make vendors aware of a very important issue in large-scale scientific computing.

I enclose some results from a slightly unusual system, a Hitachi SR2201 (single node). It is a RISC based MPP with some pseudovectorisation features which enhance the memory bandwidth and hide latency.

If you want any more details about the system, do ask, or see http://www.hpcf.cam.ac.uk/tech.html.

Michael Rutter

----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 2 microseconds
The tests below will each take a time on the order of 40948 microseconds
(= 20474 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         775.6824       0.0413       0.0413       0.0413
Scale:        736.0721       0.0435       0.0435       0.0436
Add:          829.4023       0.0580       0.0579       0.0584
Triad:        826.9161       0.0581       0.0580       0.0581
Sum of a is = 0.230660156249616742E+019
Sum of b is = 461320312502628096.
Sum of c is = 615093750008502400.
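
As a rough cross-check of the table above: STREAM derives each rate from the bytes moved divided by the best (minimum) time, counting two arrays per element for Copy and Scale and three for Add and Triad, with 1 MB taken as 10^6 bytes. A minimal Python sketch, assuming that counting convention and using the array size and the rounded Min times printed above (the name `stream_rate_mb_s` is made up for illustration and is not part of STREAM):

```python
# Hypothetical helper: recompute STREAM rates from the printed Min times.
BYTES_PER_WORD = 8          # "Assuming 8 bytes per DOUBLE PRECISION word"
ARRAY_SIZE = 2_000_000      # "Array size = 2000000"

# Number of arrays read or written per loop iteration by each kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, min_time_s, n=ARRAY_SIZE, word=BYTES_PER_WORD):
    """Return MB/s (1 MB = 1e6 bytes) for one kernel given its best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * word * n
    return 1.0e-6 * bytes_moved / min_time_s


# (Min time in seconds as printed, reported rate in MB/s) from the table.
reported = {
    "Copy":  (0.0413, 775.6824),
    "Scale": (0.0435, 736.0721),
    "Add":   (0.0579, 829.4023),
    "Triad": (0.0580, 826.9161),
}

for kernel, (min_time, rate) in reported.items():
    est = stream_rate_mb_s(kernel, min_time)
    print(f"{kernel:6s} recomputed {est:8.1f} MB/s, reported {rate:8.1f} MB/s")
```

The recomputed values differ slightly from the reported ones because the Min times are printed to only four decimal places.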
[Compiled with: xf90 -W0,'OPT(O(SS))' stream_d.f second_cpu.f

second_cpu.f was completely rewritten, stream_d.f was untouched, revision 4.1, June 4, 1996]

From mapleson@thegate.gamers.org Thu Mar 19 05:03:51 1998
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id FAA05010 for ; Thu, 19 Mar 1998 05:03:49 -0800
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id FAA1282916 for ; Thu, 19 Mar 1998 05:03:48 -0800 (PST)
Received: from thegate.gamers.org (thegate.gamers.org [128.205.37.150]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id FAA27211 for ; Thu, 19 Mar 1998 05:03:46 -0800 (PST) mail_from (mapleson@thegate.gamers.org)
Received: (from mapleson@localhost) by thegate.gamers.org (8.8.5/8.8.7) id IAA18411 for mccalpin@engr.sgi.com; Thu, 19 Mar 1998 08:03:44 -0500
Date: Thu, 19 Mar 1998 08:03:44 -0500
From: Ian Mapleson
Message-Id: <199803191303.IAA18411@thegate.gamers.org>
To: mccalpin@cthulhu
Subject: STREAM question (again :)
Status: RO

Hi there! :)

Yet another question about STREAM... I asked you in the far-gone distant past:

"Is STREAM CPU-dependent? For example, I'm familiar with the existing STREAM result for dual-CPU Octane (515); would/will this number go up significantly if one were using two 250MHz R10000s or 300MHz R12000s?"

to which you replied:

"STREAM performance depends on three factors. The most obvious is the peak carrying capacity of the system memory interface. The ability of a system to use that bandwidth depends on the relation between the main memory latency and the ability of the cpu to tolerate that latency.
Latency tolerance is usually quantified by:

                      (# outstanding misses) * (cache line size)
  latency tolerance = ------------------------------------------
                          burst bandwidth on the interface"

From this I inferred that faster CPUs of the same chip design wouldn't affect the results that much. However, I observed these numbers on the STREAM results table:

                       COPY   SCALE     ADD   TRIAD
SGI_Octane_195      1  309.0   309.2   346.5   351.8
SGI_Octane_175      1  277.4   277.9   311.2   316.6
SGI_Indy_R4400_175  1   72.7    68.1    71.6    70.6
SGI_Indy_R4400_100  1   45.7    41.0    44.4    50.0

For these cases, if my inference was accurate, what has caused the jump in the numbers? I ask because a customer wants me to test a 180MHz R5000SC O2, i.e. will I see higher numbers with R5K/200SC?

Btw, I just ran the default STREAM on my office Indy (R4400SC/200MHz, IRIX 6.2, MIPS Pro 7.1) and obtained:

Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          61.5657       0.2614       0.2599       0.2684
Scale:         60.1434       0.2665       0.2660       0.2680
Add:           65.4992       0.3671       0.3664       0.3679
Triad:         74.3992       0.3238       0.3226       0.3258

Is that pretty typical? NB: running the program several times, the numbers never varied by more than 0.1MB/sec. When I then compiled with:

cc -O -n32 stream_d.c second_wall.c -o stream_d -lm

the results changed somewhat to:

Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          69.1109       0.2495       0.2315       0.3199
Scale:         67.9333       0.2508       0.2355       0.3022
Add:           72.3061       0.3343       0.3319       0.3466
Triad:         70.5965       0.3430       0.3400       0.3589

which is rather odd since the compilation with -n32 gave this warning:

ld32: WARNING 84: /usr/lib32/mips3/libm.so is not used for resolving any symbol.

Naturally, I saw the same effect with -mips3.
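
The latency-tolerance relation quoted earlier in this message can be put into numbers. A small Python sketch, assuming made-up example values (none of them are measurements of any machine in this thread); the second function applies Little's law, which is an addition here rather than part of the quoted reply, and both function names are hypothetical:

```python
def latency_tolerance_s(outstanding_misses, line_bytes, burst_bw_bytes_per_s):
    """Time the interface needs to transfer all outstanding cache lines."""
    return outstanding_misses * line_bytes / burst_bw_bytes_per_s


def sustained_bw_bytes_per_s(outstanding_misses, line_bytes,
                             burst_bw_bytes_per_s, mem_latency_s):
    """Little's law bound: bytes in flight / latency, capped by burst rate."""
    in_flight = outstanding_misses * line_bytes
    return min(burst_bw_bytes_per_s, in_flight / mem_latency_s)


# Hypothetical values chosen only for illustration.
misses = 4          # outstanding cache misses the CPU can sustain
line = 128          # cache line size in bytes
burst = 800e6       # burst bandwidth of the memory interface, bytes/s
latency = 1.0e-6    # main memory latency in seconds

tol = latency_tolerance_s(misses, line, burst)
bw = sustained_bw_bytes_per_s(misses, line, burst, latency)
print(f"latency tolerance: {tol * 1e9:.0f} ns")
print(f"sustained bandwidth bound: {bw / 1e6:.0f} MB/s")
```

When the memory latency exceeds the tolerance (1000 ns versus 640 ns in this made-up case), the bound drops below the burst rate; when the latency is within the tolerance, the burst rate is the limit.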
I obtained the best results, after some fun experimentation, with:

cc -O3 -n32 -IPA -LNO:opt=1 -TARG:platform=ip22_4k stream_d.c second_wall.c -o stream_d -lm

That gave:

Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          75.0595       0.2166       0.2132       0.2325
Scale:         71.4723       0.2244       0.2239       0.2252
Add:           74.7831       0.3218       0.3209       0.3257
Triad:         76.0398       0.3164       0.3156       0.3171

Want to add them to the results table? (full output given below). Note that I didn't modify the source files before execution - are they ok as is for a system like mine with 1MB L2?

Btw, the numbers jump by 1% if I use CC instead of cc! :D

It's a pity nobody has ever submitted Indy R5K results. I'd like to see those.

I'm hoping to run the test on an O2 soon (R5KSC/180). Just need a SCSI cable now to port over the stuff by DAT (awaiting delivery of a cable from SGI). I think I mentioned before that SGI US is loaning me an O2 for a few months, so I'll be able to test an O2 with R5K/200 too. They *may* swap it for an R10K/195 model half way through the loan if one turns up, in which case I'll test that too.

Now to test that Indigo2 I bought. I think I need to change the source file somewhere because the I2 has 2MB L2, yes? If so, what should I change?

Cheers! :)

Ian.

SGI Network Admin, University of Central Lancashire, Preston, England, PR1 2HE.
mapleson@gamers.org | Tel: (+44 -0) 1772 893297, Fax: (+44 -0) 1772 892913
"There is no magic, only stuff." - Nakor, "The King's Buccaneer" (R.E. Feist)
Doom Help Service (DHS): http://doomgate.gamers.org/dhs/
SGI/N64/Future Technology: http://www.geocities.com/ResearchTriangle/2321/
BSc Dissertation: http://doomgate.gamers.org/dhs/diss/
CyberSurvival: http://www.007eleven.com/

********************************************************************************

MILAMBER 31# cc -O3 -n32 -IPA -LNO:opt=1 -TARG:platform=ip22_4k stream_d.c second_wall.c -o stream_d -lm
stream_d.c:
second_wall.c:
ld32: WARNING 84: /usr/lib32/mips3/libm.so is not used for resolving any symbol.
MILAMBER 30# ./stream_d ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size = 1000000, Offset = 0 Total memory required = 22.9 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 25 microseconds. Each test below will take on the order of 141404 microseconds. (= 5656 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 75.0595 0.2166 0.2132 0.2325 Scale: 71.4723 0.2244 0.2239 0.2252 Add: 74.7831 0.3218 0.3209 0.3257 Triad: 76.0398 0.3164 0.3156 0.3171 From tage@tagesknb.cs.uit.no Tue Mar 24 16:14:11 1998 Received: from giraffe.asd.sgi.com (giraffe.asd.sgi.com [192.26.72.158]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id QAA15901 for ; Tue, 24 Mar 1998 16:14:10 -0800 Received: from sgi.sgi.com by giraffe.asd.sgi.com via ESMTP (951211.SGI.8.6.12.PATCH1502/951211.SGI) for id QAA27443; Tue, 24 Mar 1998 16:14:05 -0800 Received: from mail.virginia.edu (mail.Virginia.EDU [128.143.2.9]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id QAA02589 for ; Tue, 24 Mar 1998 16:13:59 -0800 (PST) mail_from (tage@tagesknb.cs.uit.no) Received: from ares.cs.virginia.edu by mail.virginia.edu id aa08627; 24 Mar 98 19:13 EST Received: from tagesknb.cs.uit.no (tage-ni.cs.UiT.No [129.242.17.37]) by ares.cs.Virginia.EDU (8.8.5/8.8.5) with ESMTP id TAA08388 for ; Tue, 24 Mar 1998 19:13:53 -0500 (EST) Received: from 
tagesknb.cs.uit.no (localhost [[UNIX: localhost]]) by tagesknb.cs.uit.no (8.8.8/8.8.8/ACM) with ESMTP id QAA00340 for ; Tue, 24 Mar 1998 16:34:14 +0100 (CET) Message-Id: <199803241534.QAA00340@tagesknb.cs.uit.no> To: mccalpin@cs.virginia.edu Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8BIT Reply-To: tage@acm.org Subject: Result of run Date: Tue, 24 Mar 1998 16:34:13 +0100 From: Tage Stabell-Kulo Status: RO Sir, Please find below the results of running your streams package on: HW: HP Kayak XU (dual Pentium Pro/300MHz). Timecounter "i8254" frequency 1193182 Hz cost 3070 ns CPU: Pentium Pro (686-class CPU) Origin = "GenuineIntel" Id = 0x634 Stepping=4 Features=0x80fbff OS: FreeBSD 3.0-980304-SNAP Configured with SMP support Flags: gcc -o stream_d stream_d.c second_cpu.c -malign-double -O6 -lm gcc version 2.7.2.1 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size = 3000000, Offset = 0 Total memory required = 68.7 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 7812 microseconds. Each test below will take on the order of 195312 microseconds. (= 25 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 175.5429 0.2825 0.2734 0.3281 Scale: 180.7059 0.2802 0.2656 0.3281 Add: 214.3256 0.3422 0.3359 0.3438 Triad: 188.0816 0.3926 0.3828 0.4453 //// Tage Stabell-Kuloe | e-mail: tage at ACM.org (at=@)//// /// Department of Computer Science/IMR | Phone : +47-776-44032 /// // 9037 University of Tromsoe, Norway | Fax : +47-776-44580 // / "'oe' is '\o' in TeX" | URL:http://www.cs.uit.no/~tage/ From fjohn@us.ibm.com Fri Apr 17 13:44:35 1998 Received: from giraffe.asd.sgi.com (giraffe.asd.sgi.com [192.26.72.158]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA06752 for ; Fri, 17 Apr 1998 13:44:31 -0700 Received: from sgi.sgi.com by giraffe.asd.sgi.com via ESMTP (951211.SGI.8.6.12.PATCH1502/951211.SGI) for id NAA01000; Fri, 17 Apr 1998 13:44:22 -0700 Received: from mail.virginia.edu (mail.Virginia.EDU [128.143.2.9]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id NAA08695 for ; Fri, 17 Apr 1998 13:44:20 -0700 (PDT) mail_from (fjohn@us.ibm.com) Received: from ares.cs.virginia.edu by mail.virginia.edu id aa15830; 17 Apr 98 16:44 EDT Received: from smtp3.ny.us.ibm.com (smtp3.ny.us.ibm.com [198.133.22.42]) by ares.cs.Virginia.EDU (8.8.5/8.8.5) with ESMTP id QAA05148 for ; Fri, 17 Apr 1998 16:41:31 -0400 (EDT) Received: from relay1.server.ibm.com (relay1.server.ibm.com [9.14.2.98]) by smtp3.ny.us.ibm.com (8.8.7/8.8.7) with ESMTP id QAA07526 for ; Fri, 17 Apr 1998 16:34:27 -0400 Received: from US.IBM.COM (d01lms04.pok.ibm.com [9.117.30.25]) by relay1.server.ibm.com (8.8.7/8.8.7) with SMTP id QAA94122 for ; Fri, 17 Apr 1998 16:39:13 -0400 Received: by US.IBM.COM (Soft-Switch LMS 2.0) with snapi via D01AU004 id 5010400021731598; Fri, 17 Apr 1998 16:45:32 -0400 From: Frank Johnston To: mccalpin@cs.virginia.edu MMDF-Warning: Parse error in original version of preceding line at mail.virginia.edu Cc: "Frank O'Connell" 
Message-ID: <5010400021731598000002L082*@MHS>
Date: Fri, 17 Apr 1998 16:45:32 -0400
MIME-Version: 1.0
Content-Type: text/plain
Status: RO

John,

I would like to submit STREAM benchmark results for an IBM RS/6000 model 397. The entry in the table would look something like this:

                         ncpus    COPY   SCALE     ADD   TRIAD
IBM_RS6000-397               1   778.8   777.5   883.1   882.4

The table entry labeled

IBM-SP_P2SC-thin             1   690.6   684.6   787.2   786.8

should probably be changed to

IBM-SP_P2SC_120MHz-thin      1   690.6   684.6   787.2   786.8

since there is a 160 MHz P2SC thin SP node which is equivalent to the RS/6000 397.

Thanks,
Frank Johnston

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

P.S. Here is the output from Stream on the IBM RS/6000 model 397. The compiler used was xlf 5.1.0.0 with options -O3 -qarch=pwr2

----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 28828 microseconds
(= 28828 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
---------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 778.7826 .0415 .0411 .0419 Scale: 777.5306 .0414 .0412 .0420 Add: 883.0675 .0554 .0544 .0626 Triad: 882.3979 .0545 .0544 .0547 Sum of a is = 0.230660156259187354E+19 Sum of b is = 0.461320312485643840E+18 Sum of c is = 0.615093750014125568E+18 From chris@1reality.org Thu Apr 9 21:23:26 1998 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id VAA00949 for ; Thu, 9 Apr 1998 21:23:26 -0700 Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id VAA9968549 for ; Thu, 9 Apr 1998 21:23:21 -0700 (PDT) Received: from copland.udel.edu (copland.udel.edu [128.175.13.92]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id VAA17660 for ; Thu, 9 Apr 1998 21:23:20 -0700 (PDT) mail_from (chris@1reality.org) Received: from cokemachine.evcl.com (cm20813844215.cableco-op.com [208.138.44.215]) by copland.udel.edu (8.8.8/8.8.8) with ESMTP id AAA02748 for ; Fri, 10 Apr 1998 00:23:14 -0400 (EDT) Received: from [192.168.1.17] (chris2.evcl.com [192.168.1.17]) by cokemachine.evcl.com (8.8.7/8.8.7) with ESMTP id UAA04244 for ; Thu, 9 Apr 1998 20:43:29 -0800 X-Sender: reality@shell7.ba.best.com Message-Id: Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Date: Thu, 9 Apr 1998 21:24:15 -0800 To: mccalpin@UDel.Edu From: Chris Schaefer Status: RO Macintosh G3 PPC 750 Processor at 266 Mhz MacOS 8.1 Codewarrior Compiler generating 604 optimized code with all optimizations set for maximum speed. Memory is SDRAM ( i'm not sure of the speed ) ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size = 2000000, Offset = 0 Total memory required = 45.8 MB. 
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 16666 microseconds.
Each test below will take on the order of 150000 microseconds.
(= 9 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING: The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Assignment:   137.1429       0.2484       0.2333       0.2500
Scaling   :   128.0000       0.2500       0.2500       0.2500
Summing   :   137.1429       0.3584       0.3500       0.3667
SAXPYing  :   137.1429       0.3584       0.3500       0.3667

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Chris Schaefer    Email: chris@1reality.org
Professional Bit Twiddler and student of reality.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= From larry.albertelli@lmco.com Fri Apr 24 13:50:41 1998 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA09003 for ; Fri, 24 Apr 1998 13:50:40 -0700 Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id NAA15736738 for ; Fri, 24 Apr 1998 13:50:39 -0700 (PDT) Received: from inetgw.fs.lmco.com (inetgw.fs.lmco.com [204.177.125.1]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via ESMTP id NAA24062 for ; Fri, 24 Apr 1998 13:50:37 -0700 (PDT) mail_from (larry.albertelli@lmco.com) Received: (from mail@localhost) by inetgw.fs.lmco.com (AIX4.2/UCB 8.7/8.7) id QAA57638 for ; Fri, 24 Apr 1998 16:50:35 -0400 (EDT) Received: from lalbert.owg.fs.lmco.com(158.187.68.214) by inetgw.fs.lmco.com via smap (V2.0) id xma034994; Fri, 24 Apr 98 16:49:52 -0400 Message-ID: <3540FAEE.64692297@lmco.com> Date: Fri, 24 Apr 1998 16:49:50 -0400 From: Larry Albertelli X-Mailer: Mozilla 4.03 [en] (Win95; U) MIME-Version: 1.0 To: John McCalpin Subject: Re: STREAM results for 440BX References: <353F95CC.868CF749@lmco.com> <9804241344.ZM8989@frakir.engr.sgi.com> Content-Type: multipart/mixed; boundary="------------9CBB6CBFF927207DADFF247A" Status: RO This is a multi-part message in MIME format. --------------9CBB6CBFF927207DADFF247A Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit John McCalpin wrote: > On Apr 23, 3:26pm, Larry Albertelli wrote: > > I meant to send this earlier. These are results using your DOS > > executable of STREAM. The new 440BX chipset with 100MHz bandwidth now > > has the same performance as our SGI O200's with one cpu. All data is in > > txt files with summary in xls file. Thanks for the benchmark. I hope > > this is useful. 
> >-- End of excerpt from Larry Albertelli
>
> Oops -- no attachments showed up with this message!
>
> --
> --
> John D. McCalpin, Ph.D. Server System Architect
> Server Platform Engineering http://reality.sgi.com/mccalpin/
> Silicon Graphics, Inc. mccalpin@sgi.com 650-933-7407

OK here it is again. Let me know if you don't get it. It's called
STREAM In-house Benchmarks.zip.

--------------9CBB6CBFF927207DADFF247A
Content-Type: application/x-zip-compressed; name="STREAM In-house Benchmarks.zip"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="STREAM In-house Benchmarks.zip"

[base64-encoded attachment data omitted]
--------------9CBB6CBFF927207DADFF247A--

From sthaug@nethelp.no Thu Apr 30 13:28:35 1998
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id NAA20536 for ; Thu, 30 Apr 1998 13:28:34 -0700
Received: from giraffe.asd.sgi.com (giraffe.asd.sgi.com [192.26.72.158]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via SMTP id NAA18709253 for ; Thu, 30 Apr 1998 13:28:33 -0700 (PDT)
Received: from sgi.sgi.com by giraffe.asd.sgi.com via ESMTP (951211.SGI.8.6.12.PATCH1502/951211.SGI) for id NAA09597; Thu, 30 Apr 1998 13:28:20 -0700
Received: from verdi.nethelp.no (verdi.nethelp.no [195.1.171.130]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id NAA15481 for ; Thu, 30 Apr 1998 13:28:15 -0700 (PDT) mail_from (sthaug@nethelp.no)
From: sthaug@nethelp.no
Received: (qmail 1383 invoked by uid 1001); 30 Apr 1998 20:28:12 +0000 (GMT)
To: mccalpin@cthulhu
Cc: havard.eidnes@runit.sintef.no, jarle.greipsland@runit.sintef.no
Subject: STREAM results for Pentium II 400 MHz with 440BX chipset & 100 MHz bus
X-Mailer: Mew version 1.05+ on Emacs 19.28.2
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Date: Thu, 30 Apr 1998 22:28:12 +0200
Message-ID: <1381.893968092@verdi.nethelp.no>
Status: RO

Just got my hands on a Pentium II 400 MHz system with 440BX chipset & 100 MHz bus, SDRAM.
Quite nice:

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 2 microseconds.
Each test below will take on the order of 43152 microseconds.
   (= 21576 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:          304.0377      0.0528     0.0526     0.0542
Scale:         307.9172      0.0521     0.0520     0.0523
Add:           363.3005      0.0661     0.0661     0.0661
Triad:         315.3577      0.0762     0.0761     0.0762

The system is running FreeBSD 3.0-980404-SNAP. Compiling with -O and -O6
gave the same results. Haven't tried changing the OFFSET in stream_d.c.
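
For readers checking the figures in the report above, here is a minimal Python sketch of the arithmetic they appear to follow. It is not taken from stream_d.c. The function names are hypothetical, and it assumes that Copy and Scale move two arrays per pass while Add and Triad move three, that "MB/s" means 10**6 bytes per second, and that the memory total is counted in units of 2**20 bytes. The printed numbers above are consistent with those assumptions.

```python
# Sketch only: re-derives the figures in the STREAM report above.
# Assumptions (not stated in the report itself): Copy/Scale touch 2 arrays,
# Add/Triad touch 3; rates use 10**6 bytes; the memory total uses 2**20 bytes.

BYTES_PER_WORD = 8  # "8 bytes per DOUBLE PRECISION word"
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def total_memory_mb(n, bytes_per_word=BYTES_PER_WORD):
    """Memory for the three arrays a, b and c, in units of 2**20 bytes."""
    return 3 * n * bytes_per_word / 2**20


def rate_mb_per_s(kernel, n, min_time_s, bytes_per_word=BYTES_PER_WORD):
    """Bandwidth from the best (minimum) time, in 10**6 bytes per second."""
    return ARRAYS_TOUCHED[kernel] * n * bytes_per_word / min_time_s / 1e6


def clock_ticks(test_time_us, granularity_us):
    """Whole timer ticks covered by one test of the given duration."""
    return test_time_us // granularity_us


if __name__ == "__main__":
    n = 1_000_000  # "Array size = 1000000"
    print(f"memory: {total_memory_mb(n):.1f} MB")
    min_times = {"Copy": 0.0526, "Scale": 0.0520, "Add": 0.0661, "Triad": 0.0761}
    for kernel, t in min_times.items():
        print(f"{kernel}: {rate_mb_per_s(kernel, n, t):.1f} MB/s")
    print("ticks:", clock_ticks(43152, 2))
```

The recomputed rates differ from the reported ones in the first decimal place or so, since the report prints the minimum time rounded to four digits while the rate was presumably computed from the unrounded time.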
Steinar Haug, Nethelp consulting, sthaug@nethelp.no

From jon@minotaur.com Wed May 6 12:23:00 1998
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id MAA04773 for ; Wed, 6 May 1998 12:22:58 -0700
Received: from odin.corp.sgi.com (odin.corp.sgi.com [192.26.51.194]) by cthulhu.engr.sgi.com (980205.SGI.8.8.8/970903.SGI.AUTOCF) via SMTP id MAA21272534 for ; Wed, 6 May 1998 12:22:56 -0700 (PDT)
Received: from sgi.sgi.com by odin.corp.sgi.com via ESMTP (951211.SGI.8.6.12.PATCH1502/951211.SGI) for id MAA09971; Wed, 6 May 1998 12:22:48 -0700
Received: from mail.virginia.edu (mail.Virginia.EDU [128.143.2.9]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam) via SMTP id MAA25872 for ; Wed, 6 May 1998 12:22:44 -0700 (PDT) mail_from (jon@minotaur.com)
Received: from ares.cs.virginia.edu by mail.virginia.edu id aa18424; 6 May 98 15:22 EDT
Received: from minotaur.com (www.minotaur.com [209.70.17.2]) by ares.cs.Virginia.EDU (8.8.5/8.8.5) with SMTP id PAA02796 for ; Wed, 6 May 1998 15:21:51 -0400 (EDT)
Received: (qmail 5682 invoked from network); 6 May 1998 19:23:49 -0000
Received: from roaming.minotaur.com (HELO roaming) (209.70.17.100) by www.minotaur.com with SMTP; 6 May 1998 19:23:49 -0000
From: "Jon E. Mitchiner"
To: mccalpin@cs.virginia.edu
MMDF-Warning: Parse error in original version of preceding line at mail.virginia.edu
Subject: Stream Results on a PII-350 system
Date: Wed, 6 May 1998 15:22:40 -0400
Message-ID: <002401bd7924$56418540$641146d1@roaming.minotaur.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook 8.5, Build 4.71.2173.0
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.2106.4
Status: RO

a) Intel SE440BX Motherboard with integrated ISA Sound Card running FreeBSD 3.0 Current with 128MB PC100 Corsair Memory on a production/semi-busy system.

b) cc -O stream_d.c second_cpu.c -o stream_d -lm

c) ww# ./stream_d
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 4500000, Offset = 0
Total memory required = 103.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 7812 microseconds.
Each test below will take on the order of 250000 microseconds.
   (= 32 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Test #1:
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:          271.0588      0.2711     0.2656     0.2734
Scale:         271.0588      0.2672     0.2656     0.2734
Add:           321.4884      0.3422     0.3359     0.3438
Triad:         271.0588      0.3984     0.3984     0.3984

Test #2:
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:          271.0588      0.2747     0.2656     0.3203
Scale:         279.2727      0.2708     0.2578     0.3203
Add:           321.4884      0.3446     0.3359     0.3594
Triad:         271.0588      0.4008     0.3984     0.4141

Test #3:
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:          279.2727      0.2664     0.2578     0.2734
Scale:         271.0588      0.2672     0.2656     0.2734
Add:           329.1429      0.3438     0.3281     0.3516
Triad:         271.0588      0.4040     0.3984     0.4141

Test #4:
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:          279.2727      0.2688     0.2578     0.2812
Scale:         271.0588      0.2680     0.2656     0.2734
Add:           321.4884      0.3438     0.3359     0.3516
Triad:         271.0588      0.4016     0.3984     0.4062

Test #5:
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:          271.0588      0.2762     0.2656     0.3203
Scale:         279.2727      0.2740     0.2578     0.3203
Add:           321.4884      0.3422     0.3359     0.3516
Triad:         271.0588      0.4024     0.3984     0.4141

Jon
______________________________________________________________________
Jon E. Mitchiner - jon@minotaur.com
Minotaur Technologies, LLC - http://www.minotaur.com - (703) 560-0683 (FAX)

From mjr19@cus.cam.ac.uk Mon May 18 02:30:43 1998
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA17803 for ; Mon, 18 May 1998 02:30:43 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id CAA63611 for ; Mon, 18 May 1998 02:30:43 -0700 (PDT) mail_from (mjr19@cus.cam.ac.uk)
Received: from ursa.cus.cam.ac.uk (ursa.cus.cam.ac.uk [131.111.8.6]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.)
via ESMTP id CAA11333 for ; Mon, 18 May 1998 02:30:30 -0700 (PDT) mail_from (mjr19@cus.cam.ac.uk)
Received: from mjr19 (helo=localhost) by ursa.cus.cam.ac.uk with local-smtp (Exim 1.91 #1) for mccalpin@frakir.engr.sgi.com id 0ybMFW-0003fN-00; Mon, 18 May 1998 10:30:26 +0100
Date: Mon, 18 May 1998 10:30:26 +0100 (BST)
From: "M.J. Rutter"
Reply-To: "M.J. Rutter"
To: John McCalpin
Subject: Stream for S3600
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Status: RO

Dear John,

Here are some stream results for a Hitachi S3600. It may be (almost)
obsolete, but it is still faster than the average desktop...

The results are one run with array size of 4,000,000, then array size
20,000,000 offset 0 (twice) offset 8 and offset 10. The triad is the only
test which is markedly changed by the offset, and I have not investigated
other offsets.

If nothing else, this should double the number of Hitachi machines for
which you have data!

Michael

Compiled with "f77 -W0,'OPT(O(S)),HAP' stream_d.f t_second.f"
Array sizes increased as I don't trust the timing routine very far.
Run on an empty machine as an unprivileged user, except offset != 0 runs,
which were on a busy machine.

----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 4000000
Offset = 0
The total memory requirement is 91 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 28 microseconds
The tests below will each take a time on the order of 7461 microseconds
   (= 266 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:        12407.9100      0.0052     0.0052     0.0052
Scale:        6500.1016      0.0100     0.0098     0.0102
Add:         11320.7547      0.0087     0.0085     0.0091
Triad:        7548.3567      0.0137     0.0127     0.0142
Sum of a is = 6075000000000.00000
Sum of b is = 1215000000000.00000
Sum of c is = 1620000000000.00000

----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 20000000
Offset = 0
The total memory requirement is 457 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 28 microseconds
The tests below will each take a time on the order of 37276 microseconds
   (= 1331 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:        12782.6156      0.0258     0.0250     0.0261
Scale:        6346.3102      0.0507     0.0504     0.0514
Add:         11202.1284      0.0433     0.0428     0.0435
Triad:        7031.2157      0.0697     0.0683     0.0706
Sum of a is = 30375000000000.0000
Sum of b is = 6075000000000.00000
Sum of c is = 8100000000000.00000

----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 20000000
Offset = 0
The total memory requirement is 457 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 27 microseconds
The tests below will each take a time on the order of 37321 microseconds
   (= 1382 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:        12700.4286      0.0258     0.0252     0.0262
Scale:        6346.0585      0.0507     0.0504     0.0513
Add:         11183.3368      0.0434     0.0429     0.0440
Triad:        6980.5997      0.0696     0.0688     0.0705
Sum of a is = 30375000000000.0000
Sum of b is = 6075000000000.00000
Sum of c is = 8100000000000.00000

----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 20000000
Offset = 8
The total memory requirement is 457 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 28 microseconds
The tests below will each take a time on the order of 37263 microseconds
   (= 1331 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:        12235.6900      0.0269     0.0262     0.0274
Scale:        6338.0142      0.0507     0.0505     0.0508
Add:         11437.5581      0.0422     0.0420     0.0424
Triad:       10168.6298      0.0504     0.0472     0.0599
Sum of a is = 30375000000000.0000
Sum of b is = 6075000000000.00000
Sum of c is = 8100000000000.00000

----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 20000000
Offset = 10
The total memory requirement is 457 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 28 microseconds
The tests below will each take a time on the order of 37404 microseconds
   (= 1336 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:        12062.7262      0.0273     0.0265     0.0288
Scale:        6351.4747      0.0509     0.0504     0.0512
Add:         11291.9921      0.0426     0.0425     0.0428
Triad:       10819.8273      0.0445     0.0444     0.0449
Sum of a is = 30375000000000.0000
Sum of b is = 6075000000000.00000
Sum of c is = 8100000000000.00000

From loriann@cray.com Tue Jun 30 10:33:21 1998
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id KAA20123 for ; Tue, 30 Jun 1998 10:33:20 -0700
Received: from timbuk-fddi.cray.com (timbuk-fddi.cray.com [128.162.8.102]) by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id KAA07676 for ; Tue, 30 Jun 1998 10:33:19 -0700 (PDT) mail_from (loriann@cray.com)
Received: from ironwood-e185.cray.com (root@ironwood-e185.cray.com [128.162.185.212]) by timbuk-fddi.cray.com (8.8.8/CRI-gate-news-1.3) with ESMTP id MAA03895 for ; Tue, 30 Jun 1998 12:33:18 -0500 (CDT)
Received: from fir (loriann@fir-fddi [128.162.21.7]) by ironwood-e185.cray.com (8.8.4/CRI-ironwood-news-1.0) with SMTP id MAA05048 for ; Tue, 30 Jun 1998 12:33:16 -0500 (CDT)
From: Lori Gilbertson
Received: by fir (SMI-8.6/btd-b3) id MAA04998; Tue, 30 Jun 1998 12:33:15 -0500
Message-Id: <199806301733.MAA04998@fir.cray.com>
Date: Tue, 30 Jun 1998 12:33:15 -0500
To: mccalpin@cthulhu
Subject: 225mhz o200 stream test
Status: RO

Here's the log file from the run - please let me know if it looks okay.
The machine is bsw-1.cray.com - root passwd is monday33 if you need to
login there to see anything. I have the stream stuff in /tmp/stream.
I ran it twice - once in single-user mode and once in multi but the
numbers look comparable to me.
Here is the multi-cpu log: Running on 1 cpus 3 trials with N=2e6 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 2000000 Offset = 0 The total memory requirement is 45 MB You are running each test 10 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds The tests below will each take a time on the order of 99483 microseconds (= 99483 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ---------------------------------------------------- Function Rate (MB/s) Min time Max time Mean time RMS time Median Copy: 294.45 0.1087 0.1225 0.1198 0.1198 0.1208 Scale: 302.08 0.1059 0.1167 0.1156 0.1156 0.1167 Add: 317.36 0.1512 0.1515 0.1514 0.1514 0.1514 Triad: 313.72 0.1530 0.1540 0.1537 0.1537 0.1537 ----------------------------------------------------------------------------- All times are 0.1087 0.1059 0.1514 0.1537 0.1208 0.1164 0.1514 0.1540 0.1207 0.1166 0.1515 0.1537 0.1208 0.1167 0.1512 0.1539 0.1209 0.1167 0.1514 0.1538 0.1207 0.1167 0.1515 0.1537 0.1210 0.1166 0.1513 0.1536 0.1208 0.1167 0.1513 0.1540 0.1208 0.1167 0.1513 0.1530 0.1225 0.1167 0.1515 0.1539 ----------------------------------------------------------------------------- Sum of a is = 115330078125.0000 Sum of b is = 23066015625.00000 Sum of c is = 30754687500.00000 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 2000000 Offset = 0 The total memory 
requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 106125 microseconds
(= 106125 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          298.12    0.1073    0.1205     0.1190    0.1190  0.1201
Scale:         301.36    0.1062    0.1168     0.1155    0.1156  0.1166
Add:           317.51    0.1512    0.1517     0.1513    0.1513  0.1512
Triad:         314.03    0.1529    0.1532     0.1530    0.1530  0.1531
-----------------------------------------------------------------------------
All times are
0.1073  0.1062  0.1512  0.1529
0.1205  0.1165  0.1517  0.1532
0.1202  0.1164  0.1513  0.1532
0.1203  0.1165  0.1515  0.1530
0.1200  0.1165  0.1512  0.1531
0.1202  0.1167  0.1512  0.1530
0.1202  0.1165  0.1515  0.1529
0.1204  0.1166  0.1512  0.1530
0.1202  0.1168  0.1513  0.1530
0.1203  0.1165  0.1513  0.1530
-----------------------------------------------------------------------------
Sum of a is = 115330078125.0000
Sum of b is = 23066015625.00000
Sum of c is = 30754687500.00000
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 103388 microseconds
(= 103388 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          299.79    0.1067    0.1202     0.1186    0.1187  0.1200
Scale:         307.58    0.1040    0.1162     0.1148    0.1149  0.1160
Add:           316.79    0.1515    0.1518     0.1517    0.1517  0.1516
Triad:         315.79    0.1520    0.1525     0.1522    0.1522  0.1524
-----------------------------------------------------------------------------
All times are
0.1067  0.1040  0.1518  0.1521
0.1202  0.1159  0.1516  0.1521
0.1198  0.1161  0.1516  0.1525
0.1200  0.1160  0.1517  0.1520
0.1202  0.1159  0.1516  0.1524
0.1199  0.1162  0.1515  0.1524
0.1198  0.1160  0.1518  0.1521
0.1201  0.1159  0.1518  0.1523
0.1198  0.1162  0.1517  0.1523
0.1201  0.1161  0.1518  0.1521
-----------------------------------------------------------------------------
Sum of a is = 115330078125.0000
Sum of b is = 23066015625.00000
Sum of c is = 30754687500.00000

3 trials with N=5e6
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 5242880
Offset = 255
The total memory requirement is 120 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 288140 microseconds
(= 288140 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          279.58    0.3000    0.3379     0.3302    0.3306  0.3378
Scale:         288.93    0.2903    0.3223     0.3158    0.3160  0.3221
Add:           312.10    0.4032    0.4042     0.4034    0.4034  0.4032
Triad:         272.97    0.4610    0.4618     0.4612    0.4612  0.4618
-----------------------------------------------------------------------------
All times are
0.3000  0.2903  0.4042  0.4610
0.3377  0.3223  0.4033  0.4610
0.3378  0.3221  0.4032  0.4618
0.3379  0.3223  0.4032  0.4611
0.3379  0.3219  0.4033  0.4610
-----------------------------------------------------------------------------
Sum of a is = 303750.0000000000
Sum of b is = 60750.00000000000
Sum of c is = 81000.00000000000
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 5242880
Offset = 255
The total memory requirement is 120 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 273045 microseconds
(= 273045 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          281.40    0.2981    0.3358     0.3282    0.3286  0.3357
Scale:         292.31    0.2870    0.3196     0.3130    0.3133  0.3196
Add:           314.48    0.4001    0.4009     0.4003    0.4003  0.4003
Triad:         275.14    0.4573    0.4576     0.4574    0.4574  0.4573
-----------------------------------------------------------------------------
All times are
0.2981  0.2870  0.4009  0.4574
0.3358  0.3196  0.4002  0.4576
0.3357  0.3196  0.4003  0.4573
0.3357  0.3196  0.4003  0.4574
0.3358  0.3195  0.4001  0.4574
-----------------------------------------------------------------------------
Sum of a is = 303750.0000000000
Sum of b is = 60750.00000000000
Sum of c is = 81000.00000000000
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 5242880
Offset = 255
The total memory requirement is 120 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 283174 microseconds
(= 283174 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          278.61    0.3011    0.3386     0.3310    0.3313  0.3383
Scale:         287.68    0.2916    0.3231     0.3167    0.3170  0.3231
Add:           312.61    0.4025    0.4036     0.4028    0.4028  0.4027
Triad:         272.80    0.4613    0.4616     0.4614    0.4614  0.4615
-----------------------------------------------------------------------------
All times are
0.3011  0.2916  0.4036  0.4613
0.3386  0.3229  0.4027  0.4614
0.3383  0.3231  0.4027  0.4615
0.3386  0.3230  0.4025  0.4614
0.3383  0.3230  0.4025  0.4616
-----------------------------------------------------------------------------
Sum of a is = 303750.0000000000
Sum of b is = 60750.00000000000
Sum of c is = 81000.00000000000

Running on 2 cpus
3 trials with N=2e6
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 67243 microseconds
(= 67243 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          322.23    0.0993    0.1050     0.1043    0.1043  0.1050
Scale:         329.39    0.0971    0.1061     0.1045    0.1045  0.1043
Add:           356.41    0.1347    0.1355     0.1352    0.1352  0.1350
Triad:         355.75    0.1349    0.1360     0.1351    0.1351  0.1355
-----------------------------------------------------------------------------
All times are
0.0993  0.0971  0.1354  0.1351
0.1049  0.1056  0.1355  0.1349
0.1049  0.1057  0.1354  0.1350
0.1049  0.1054  0.1353  0.1349
0.1049  0.1025  0.1347  0.1360
0.1050  0.1061  0.1352  0.1349
0.1048  0.1057  0.1352  0.1350
0.1048  0.1055  0.1352  0.1350
0.1049  0.1054  0.1352  0.1349
0.1048  0.1057  0.1351  0.1350
-----------------------------------------------------------------------------
Sum of a is = 57665039062.50000
Sum of b is = 11533007812.50000
Sum of c is = 15377343750.00000
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 67626 microseconds
(= 67626 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          321.64    0.0995    0.1054     0.1047    0.1047  0.1052
Scale:         327.52    0.0977    0.1058     0.1048    0.1048  0.1056
Add:           354.48    0.1354    0.1359     0.1357    0.1357  0.1357
Triad:         357.62    0.1342    0.1348     0.1345    0.1345  0.1344
-----------------------------------------------------------------------------
All times are
0.0995  0.0977  0.1359  0.1347
0.1052  0.1055  0.1359  0.1344
0.1052  0.1056  0.1355  0.1346
0.1052  0.1055  0.1357  0.1343
0.1052  0.1057  0.1356  0.1347
0.1052  0.1055  0.1358  0.1342
0.1052  0.1057  0.1355  0.1348
0.1053  0.1054  0.1358  0.1342
0.1053  0.1058  0.1354  0.1346
0.1054  0.1055  0.1358  0.1347
-----------------------------------------------------------------------------
Sum of a is = 57665039062.50000
Sum of b is = 11533007812.50000
Sum of c is = 15377343750.00000
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 63427 microseconds
(= 63427 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          326.64    0.0980    0.1050     0.1042    0.1043  0.1049
Scale:         332.20    0.0963    0.1056     0.1046    0.1046  0.1055
Add:           354.67    0.1353    0.1357     0.1355    0.1355  0.1355
Triad:         356.94    0.1345    0.1348     0.1346    0.1346  0.1346
-----------------------------------------------------------------------------
All times are
0.0980  0.0963  0.1357  0.1348
0.1049  0.1055  0.1355  0.1346
0.1049  0.1055  0.1357  0.1346
0.1050  0.1056  0.1354  0.1346
0.1049  0.1054  0.1356  0.1345
0.1049  0.1056  0.1353  0.1346
0.1050  0.1054  0.1357  0.1345
0.1048  0.1056  0.1354  0.1345
0.1049  0.1055  0.1354  0.1346
0.1050  0.1055  0.1356  0.1345
-----------------------------------------------------------------------------
Sum of a is = 57665039062.50000
Sum of b is = 11533007812.50000
Sum of c is = 15377343750.00000

3 trials with N=5e6
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 5242880
Offset = 255
The total memory requirement is 120 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 179947 microseconds
(= 179947 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          331.26    0.2532    0.2752     0.2706    0.2708  0.2750
Scale:         332.00    0.2527    0.2776     0.2719    0.2721  0.2765
Add:           356.45    0.3530    0.3544     0.3538    0.3538  0.3544
Triad:         360.48    0.3491    0.3500     0.3496    0.3496  0.3491
-----------------------------------------------------------------------------
All times are
0.2532  0.2527  0.3536  0.3492
0.2748  0.2764  0.3542  0.3500
0.2750  0.2765  0.3544  0.3491
0.2752  0.2765  0.3530  0.3499
0.2749  0.2776  0.3538  0.3496
-----------------------------------------------------------------------------
Sum of a is = 151875.0000000000
Sum of b is = 30375.00000000000
Sum of c is = 40500.00000000000
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 5242880
Offset = 255
The total memory requirement is 120 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 173858 microseconds
(= 173858 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          330.73    0.2536    0.2746     0.2703    0.2704  0.2744
Scale:         334.70    0.2506    0.2762     0.2709    0.2711  0.2760
Add:           355.75    0.3537    0.3541     0.3539    0.3539  0.3537
Triad:         360.47    0.3491    0.3492     0.3492    0.3492  0.3491
-----------------------------------------------------------------------------
All times are
0.2536  0.2506  0.3538  0.3492
0.2744  0.2760  0.3540  0.3491
0.2744  0.2760  0.3537  0.3491
0.2746  0.2758  0.3541  0.3492
0.2744  0.2762  0.3540  0.3492
-----------------------------------------------------------------------------
Sum of a is = 151875.0000000000
Sum of b is = 30375.00000000000
Sum of c is = 40500.00000000000
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 5242880
Offset = 255
The total memory requirement is 120 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 176200 microseconds
(= 176200 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          332.77    0.2521    0.2747     0.2699    0.2700  0.2747
Scale:         333.43    0.2516    0.2765     0.2714    0.2716  0.2765
Add:           355.72    0.3537    0.3540     0.3539    0.3539  0.3537
Triad:         369.26    0.3408    0.3497     0.3477    0.3478  0.3497
-----------------------------------------------------------------------------
All times are
0.2521  0.2516  0.3540  0.3494
0.2741  0.2763  0.3540  0.3494
0.2747  0.2765  0.3537  0.3497
0.2743  0.2765  0.3539  0.3494
0.2742  0.2762  0.3539  0.3408
-----------------------------------------------------------------------------
Sum of a is = 151875.0000000000
Sum of b is = 30375.00000000000
Sum of c is = 40500.00000000000

Running on 4 cpus
3 trials with N=2e6
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 37289 microseconds
(= 37289 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          609.11    0.0525    0.0707     0.0563    0.0567  0.0527
Scale:         624.52    0.0512    0.0702     0.0569    0.0572  0.0537
Add:           676.15    0.0710    0.0916     0.0752    0.0757  0.0713
Triad:         703.82    0.0682    0.0892     0.0727    0.0731  0.0687
-----------------------------------------------------------------------------
All times are
0.0545  0.0512  0.0712  0.0683
0.0527  0.0557  0.0710  0.0688
0.0526  0.0539  0.0713  0.0686
0.0527  0.0538  0.0710  0.0682
0.0528  0.0537  0.0710  0.0686
0.0526  0.0538  0.0716  0.0687
0.0526  0.0539  0.0711  0.0686
0.0525  0.0538  0.0712  0.0685
0.0694  0.0688  0.0916  0.0892
0.0707  0.0702  0.0915  0.0892
-----------------------------------------------------------------------------
Sum of a is = 38443378596.67969
Sum of b is = 7688675719.335938
Sum of c is = 10251567625.78125
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 36693 microseconds
(= 36693 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          590.64    0.0542    0.0708     0.0564    0.0566  0.0544
Scale:         629.56    0.0508    0.0714     0.0575    0.0578  0.0554
Add:           654.40    0.0733    0.0940     0.0775    0.0778  0.0735
Triad:         675.10    0.0711    0.0940     0.0755    0.0759  0.0715
-----------------------------------------------------------------------------
All times are
0.0552  0.0508  0.0737  0.0712
0.0546  0.0555  0.0733  0.0717
0.0564  0.0553  0.0735  0.0712
0.0544  0.0553  0.0742  0.0711
0.0544  0.0555  0.0735  0.0717
0.0543  0.0554  0.0734  0.0712
0.0542  0.0554  0.0738  0.0711
0.0547  0.0560  0.0735  0.0716
0.0547  0.0647  0.0940  0.0940
0.0708  0.0714  0.0916  0.0898
-----------------------------------------------------------------------------
Sum of a is = 38443378596.67969
Sum of b is = 7688675719.335938
Sum of c is = 10251567625.78125
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 36638 microseconds
(= 36638 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          581.55    0.0550    0.0709     0.0583    0.0585  0.0551
Scale:         613.44    0.0522    0.0707     0.0590    0.0592  0.0592
Add:           648.45    0.0740    0.0928     0.0780    0.0783  0.0741
Triad:         678.33    0.0708    0.0917     0.0751    0.0755  0.0711
-----------------------------------------------------------------------------
All times are
0.0565  0.0522  0.0749  0.0710
0.0553  0.0574  0.0742  0.0710
0.0558  0.0566  0.0748  0.0715
0.0552  0.0562  0.0748  0.0709
0.0550  0.0603  0.0740  0.0710
0.0553  0.0582  0.0742  0.0713
0.0579  0.0562  0.0746  0.0708
0.0559  0.0565  0.0741  0.0724
0.0651  0.0658  0.0928  0.0917
0.0709  0.0707  0.0915  0.0892
-----------------------------------------------------------------------------
Sum of a is = 38443378596.67969
Sum of b is = 7688675719.335938
Sum of c is = 10251567625.78125

3 trials with N=5e6
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 5242880
Offset = 255
The total memory requirement is 120 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 90695 microseconds
(= 90695 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          605.41    0.1386    0.1856     0.1592    0.1606  0.1434
Scale:         641.87    0.1307    0.1853     0.1560    0.1575  0.1425
Add:           660.28    0.1906    0.2410     0.2205    0.2218  0.2410
Triad:         691.04    0.1821    0.2340     0.2128    0.2143  0.2340
-----------------------------------------------------------------------------
All times are
0.1386  0.1307  0.1907  0.1821
0.1428  0.1438  0.1906  0.1821
0.1434  0.1425  0.2410  0.2340
0.1856  0.1778  0.2409  0.2330
0.1856  0.1853  0.2393  0.2328
-----------------------------------------------------------------------------
Sum of a is = 101250.0193119049
Sum of b is = 20250.00386238098
Sum of c is = 27000.00514984131
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 5242880
Offset = 255
The total memory requirement is 120 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 89745 microseconds
(= 89745 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          632.41    0.1326    0.1867     0.1477    0.1491  0.1399
Scale:         662.86    0.1266    0.1860     0.1457    0.1471  0.1390
Add:           695.31    0.1810    0.2402     0.1931    0.1945  0.1812
Triad:         721.73    0.1743    0.2351     0.1980    0.2001  0.1743
-----------------------------------------------------------------------------
All times are
0.1326  0.1266  0.1810  0.1750
0.1399  0.1387  0.1813  0.1748
0.1399  0.1390  0.1812  0.1743
0.1396  0.1381  0.1818  0.2308
0.1867  0.1860  0.2402  0.2351
-----------------------------------------------------------------------------
Sum of a is = 101250.0193119049
Sum of b is = 20250.00386238098
Sum of c is = 27000.00514984131
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 5242880
Offset = 255
The total memory requirement is 120 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 99430 microseconds
(= 99430 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          609.53    0.1376    0.1850     0.1579    0.1592  0.1428
Scale:         612.42    0.1370    0.1862     0.1599    0.1612  0.1471
Add:           651.75    0.1931    0.2515     0.2143    0.2158  0.1931
Triad:         681.27    0.1847    0.2362     0.2055    0.2070  0.1862
-----------------------------------------------------------------------------
All times are
0.1376  0.1370  0.1936  0.1851
0.1437  0.1471  0.1931  0.1847
0.1428  0.1471  0.1931  0.1862
0.1804  0.1821  0.2515  0.2362
0.1850  0.1862  0.2401  0.2353
-----------------------------------------------------------------------------
Sum of a is = 101250.0193119049
Sum of b is = 20250.00386238098
Sum of c is = 27000.00514984131

Now running 2 threads -- 1 per node
3 trials with N=2e6
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 37724 microseconds
(= 37724 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          563.74    0.0568    0.0602     0.0598    0.0598  0.0602
Scale:         569.12    0.0562    0.0618     0.0610    0.0611  0.0615
Add:           591.83    0.0811    0.0833     0.0814    0.0814  0.0812
Triad:         588.59    0.0816    0.0819     0.0817    0.0817  0.0817
-----------------------------------------------------------------------------
All times are
0.0568  0.0562  0.0812  0.0818
0.0602  0.0615  0.0812  0.0817
0.0601  0.0618  0.0812  0.0819
0.0601  0.0616  0.0812  0.0817
0.0602  0.0616  0.0812  0.0817
0.0602  0.0615  0.0813  0.0817
0.0602  0.0615  0.0811  0.0819
0.0602  0.0615  0.0811  0.0816
0.0601  0.0615  0.0815  0.0816
0.0602  0.0615  0.0833  0.0817
-----------------------------------------------------------------------------
Sum of a is = 57665039062.50000
Sum of b is = 11533007812.50000
Sum of c is = 15377343750.00000
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 40005 microseconds
(= 40005 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          572.21    0.0559    0.0603     0.0598    0.0598  0.0602
Scale:         577.41    0.0554    0.0612     0.0605    0.0606  0.0611
Add:           592.53    0.0810    0.0814     0.0811    0.0811  0.0811
Triad:         598.38    0.0802    0.0805     0.0803    0.0803  0.0803
-----------------------------------------------------------------------------
All times are
0.0559  0.0554  0.0810  0.0804
0.0603  0.0611  0.0811  0.0803
0.0602  0.0611  0.0813  0.0803
0.0601  0.0612  0.0811  0.0805
0.0602  0.0612  0.0811  0.0803
0.0602  0.0611  0.0811  0.0803
0.0602  0.0611  0.0814  0.0804
0.0603  0.0611  0.0812  0.0802
0.0602  0.0611  0.0810  0.0804
0.0603  0.0610  0.0811  0.0803
-----------------------------------------------------------------------------
Sum of a is = 57665039062.50000
Sum of b is = 11533007812.50000
Sum of c is = 15377343750.00000
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 36931 microseconds
(= 36931 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:          548.72    0.0583    0.0604     0.0600    0.0600  0.0603
Scale:         582.02    0.0550    0.0616     0.0608    0.0608  0.0614
Add:           591.78    0.0811    0.0815     0.0812    0.0812  0.0814
Triad:         596.88    0.0804    0.0806     0.0805    0.0805  0.0804
-----------------------------------------------------------------------------
All times are
0.0583  0.0550  0.0812  0.0805
0.0601  0.0615  0.0811  0.0805
0.0601  0.0616  0.0812  0.0805
0.0601  0.0614  0.0812  0.0805
0.0601  0.0614  0.0814  0.0804
0.0604  0.0613  0.0815  0.0805
0.0604  0.0613  0.0811  0.0805
0.0601  0.0615  0.0811  0.0805
0.0601  0.0614  0.0812  0.0806
0.0603  0.0614  0.0812  0.0804
-----------------------------------------------------------------------------
Sum of a is = 57665039062.50000
Sum of b is = 11533007812.50000
Sum of c is = 15377343750.00000

3 trials with N=5e6
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 5242880
Offset = 255
The total memory requirement is 120 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 132988 microseconds
(= 132988 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
---------------------------------------------------- Function Rate (MB/s) Min time Max time Mean time RMS time Median Copy: 589.98 0.1422 0.1556 0.1527 0.1528 0.1553 Scale: 578.73 0.1449 0.1595 0.1565 0.1566 0.1595 Add: 603.20 0.2086 0.2093 0.2091 0.2091 0.2093 Triad: 600.18 0.2097 0.2099 0.2098 0.2098 0.2099 ----------------------------------------------------------------------------- All times are 0.1422 0.1449 0.2086 0.2099 0.1552 0.1595 0.2093 0.2098 0.1553 0.1595 0.2093 0.2099 0.1552 0.1593 0.2093 0.2099 0.1556 0.1591 0.2092 0.2097 ----------------------------------------------------------------------------- Sum of a is = 151875.0000000000 Sum of b is = 30375.00000000000 Sum of c is = 40500.00000000000 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 5242880 Offset = 255 The total memory requirement is 120 MB You are running each test 5 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds The tests below will each take a time on the order of 129742 microseconds (= 129742 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
---------------------------------------------------- Function Rate (MB/s) Min time Max time Mean time RMS time Median Copy: 584.43 0.1435 0.3380 0.2620 0.2776 0.3345 Scale: 592.58 0.1416 0.3224 0.2519 0.2654 0.3148 Add: 612.03 0.2056 0.4013 0.2851 0.3004 0.4009 Triad: 607.11 0.2073 0.4597 0.3101 0.3332 0.4597 ----------------------------------------------------------------------------- All times are 0.1435 0.1416 0.2056 0.2073 0.1561 0.1583 0.2057 0.2091 0.3345 0.3148 0.4009 0.4597 0.3380 0.3223 0.4013 0.4591 0.3377 0.3224 0.2121 0.2152 ----------------------------------------------------------------------------- Sum of a is = 151875.0000000000 Sum of b is = 30375.00000000000 Sum of c is = 40500.00000000000 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 5242880 Offset = 255 The total memory requirement is 120 MB You are running each test 5 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds The tests below will each take a time on the order of 134463 microseconds (= 134463 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
---------------------------------------------------- Function Rate (MB/s) Min time Max time Mean time RMS time Median Copy: 594.46 0.1411 0.1557 0.1527 0.1528 0.1555 Scale: 588.73 0.1425 0.1596 0.1559 0.1561 0.1594 Add: 604.24 0.2082 0.2092 0.2085 0.2085 0.2084 Triad: 601.58 0.2092 0.2100 0.2095 0.2095 0.2095 ----------------------------------------------------------------------------- All times are 0.1411 0.1425 0.2084 0.2092 0.1557 0.1596 0.2092 0.2100 0.1555 0.1594 0.2084 0.2095 0.1556 0.1590 0.2084 0.2093 0.1554 0.1590 0.2082 0.2094 ----------------------------------------------------------------------------- Sum of a is = 151875.0000000000 Sum of b is = 30375.00000000000 Sum of c is = 40500.00000000000 From mccalpin Thu Jul 9 15:57:48 1998 Received: (from mccalpin@localhost) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id PAA03282 for mccalpin; Thu, 9 Jul 1998 15:57:48 -0700 From: "John McCalpin" Message-Id: <9807091557.ZM3280@frakir.engr.sgi.com> Date: Thu, 9 Jul 1998 15:57:47 -0700 X-Mailer: Z-Mail (3.2.3 08feb96 MediaMail) To: mccalpin Subject: STREAM on 225QC - 1 cpu Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Status: RO bsw-1 7# ./stream 100149D8 10F56DD8 11E991D8 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 2000000 Offset = 0 The total memory requirement is 45 MB You are running each test 10 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity/precision appears to be 4 microseconds The tests below will each take a time on the order of 90731 microseconds (= 22683 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. 
For best results, please be sure you know the precision of your system timer. ---------------------------------------------------- Function Rate (MB/s) Min time Max time Mean time RMS time Median Copy: 290.99 0.1100 0.1130 0.1125 0.1125 0.1128 Scale: 304.66 0.1050 0.1173 0.1160 0.1161 0.1172 Add: 320.37 0.1498 0.1501 0.1500 0.1500 0.1499 Triad: 319.05 0.1504 0.1506 0.1506 0.1506 0.1505 ----------------------------------------------------------------------------- Sum of a is = 115330078125.0000 Sum of b is = 23066015625.00000 Sum of c is = 30754687500.00000 a(1),a(n) = 1153300781250.000 1153300781250.000 b(1),b(n) = 230660156250.0000 230660156250.0000 c(1),c(n) = 307546875000.0000 307546875000.0000 -- -- John D. McCalpin, Ph.D. Server System Architect Server Platform Engineering http://reality.sgi.com/mccalpin/ Silicon Graphics, Inc. mccalpin@sgi.com 650-933-7407 From mccalpin Thu Jul 9 16:16:54 1998 Received: (from mccalpin@localhost) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA03359 for mccalpin; Thu, 9 Jul 1998 16:16:54 -0700 From: "John McCalpin" Message-Id: <9807091616.ZM3357@frakir.engr.sgi.com> Date: Thu, 9 Jul 1998 16:16:54 -0700 X-Mailer: Z-Mail (3.2.3 08feb96 MediaMail) To: mccalpin Subject: 225QC/2-cpu Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Status: RO bsw-1 2# setenv MP_SET_NUMTHREADS 2 bsw-1 3# ./stream.mp.4e6 10036960 11EBB160 13D3FD28 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 4000000 Offset = 0 The total memory requirement is 91 MB You are running each test 10 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity/precision appears to be 4 microseconds The tests below will each take a time on the order of 134646 microseconds (= 33662 clock ticks) Increase the size of the arrays if 
this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ---------------------------------------------------- Function Rate (MB/s) Min time Max time Mean time RMS time Median Copy: 331.95 0.1928 0.2065 0.2039 0.2040 0.2062 Scale: 336.37 0.1903 0.2085 0.2054 0.2055 0.2077 Add: 373.28 0.2572 0.2609 0.2592 0.2592 0.2592 Triad: 364.67 0.2633 0.2649 0.2637 0.2637 0.2634 ----------------------------------------------------------------------------- Sum of a is = 57665039062.50000 Sum of b is = 11533007812.50000 Sum of c is = 15377343750.00000 a(1),a(n) = 1153300781250.000 1153300781250.000 b(1),b(n) = 230660156250.0000 230660156250.0000 c(1),c(n) = 307546875000.0000 307546875000.0000 bsw-1 4# !! ./stream.mp.4e6 10036960 11EBB160 13D3FD28 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 4000000 Offset = 0 The total memory requirement is 91 MB You are running each test 10 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity/precision appears to be 4 microseconds The tests below will each take a time on the order of 134403 microseconds (= 33601 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
---------------------------------------------------- Function Rate (MB/s) Min time Max time Mean time RMS time Median Copy: 333.35 0.1920 0.2063 0.2048 0.2048 0.2062 Scale: 337.55 0.1896 0.2078 0.2053 0.2053 0.2069 Add: 373.45 0.2571 0.2591 0.2588 0.2588 0.2590 Triad: 364.63 0.2633 0.2635 0.2633 0.2633 0.2633 ----------------------------------------------------------------------------- Sum of a is = 57665039062.50000 Sum of b is = 11533007812.50000 Sum of c is = 15377343750.00000 a(1),a(n) = 1153300781250.000 1153300781250.000 b(1),b(n) = 230660156250.0000 230660156250.0000 c(1),c(n) = 307546875000.0000 307546875000.0000 -- -- John D. McCalpin, Ph.D. Server System Architect Server Platform Engineering http://reality.sgi.com/mccalpin/ Silicon Graphics, Inc. mccalpin@sgi.com 650-933-7407 From mccalpin Thu Jul 9 16:20:45 1998 Received: (from mccalpin@localhost) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) id QAA03383 for mccalpin; Thu, 9 Jul 1998 16:20:45 -0700 From: "John McCalpin" Message-Id: <9807091620.ZM3381@frakir.engr.sgi.com> Date: Thu, 9 Jul 1998 16:20:45 -0700 X-Mailer: Z-Mail (3.2.3 08feb96 MediaMail) To: mccalpin Subject: 225QC in L2 Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Status: RO bsw-1 6# f77 -n32 -mips4 -Ofast=ip27 stream.f second.o -o stream.small bsw-1 7# ./stream.small 100149D8 1004F358 10089CD8 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 30000 Offset = 0 The total memory requirement is 0 MB You are running each test 20 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity/precision appears to be 4 microseconds The tests below will each take a time on the order of 306 microseconds (= 77 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock 
ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ---------------------------------------------------- Function Rate (MB/s) Min time Max time Mean time RMS time Median Copy: 1030.04 0.0005 0.0005 0.0005 0.0005 0.0005 Scale: 1085.97 0.0004 0.0005 0.0004 0.0004 0.0004 Add: 1089.26 0.0007 0.0007 0.0007 0.0007 0.0007 Triad: 1182.27 0.0006 0.0006 0.0006 0.0006 0.0006 ----------------------------------------------------------------------------- Sum of a is = 3.3252567300810404E+22 Sum of b is = 6.6505134601593369E+21 Sum of c is = 8.8673512802117495E+21 a(1),a(n) = 6.6505134601593012E+23 6.6505134601593012E+23 b(1),b(n) = 1.3301026920318602E+23 1.3301026920318602E+23 c(1),c(n) = 1.7734702560424803E+23 1.7734702560424803E+23 bsw-1 8# ^P ^P - Command not found bsw-1 9# ./stream.small 100149D8 1004F358 10089CD8 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 30000 Offset = 0 The total memory requirement is 0 MB You are running each test 20 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity/precision appears to be 4 microseconds The tests below will each take a time on the order of 299 microseconds (= 75 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
---------------------------------------------------- Function Rate (MB/s) Min time Max time Mean time RMS time Median Copy: 1025.64 0.0005 0.0005 0.0005 0.0005 0.0005 Scale: 1085.97 0.0004 0.0005 0.0004 0.0004 0.0004 Add: 1089.26 0.0007 0.0007 0.0007 0.0007 0.0007 Triad: 1184.21 0.0006 0.0006 0.0006 0.0006 0.0006 ----------------------------------------------------------------------------- Sum of a is = 3.3252567300810404E+22 Sum of b is = 6.6505134601593369E+21 Sum of c is = 8.8673512802117495E+21 a(1),a(n) = 6.6505134601593012E+23 6.6505134601593012E+23 b(1),b(n) = 1.3301026920318602E+23 1.3301026920318602E+23 c(1),c(n) = 1.7734702560424803E+23 1.7734702560424803E+23 bsw-1 10# !! ./stream.small 100149D8 1004F358 10089CD8 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 30000 Offset = 0 The total memory requirement is 0 MB You are running each test 20 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity/precision appears to be 4 microseconds The tests below will each take a time on the order of 304 microseconds (= 76 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
---------------------------------------------------- Function Rate (MB/s) Min time Max time Mean time RMS time Median Copy: 1032.26 0.0005 0.0005 0.0005 0.0005 0.0005 Scale: 1088.44 0.0004 0.0005 0.0004 0.0004 0.0004 Add: 1089.26 0.0007 0.0007 0.0007 0.0007 0.0007 Triad: 1182.27 0.0006 0.0006 0.0006 0.0006 0.0006 ----------------------------------------------------------------------------- Sum of a is = 3.3252567300810404E+22 Sum of b is = 6.6505134601593369E+21 Sum of c is = 8.8673512802117495E+21 a(1),a(n) = 6.6505134601593012E+23 6.6505134601593012E+23 b(1),b(n) = 1.3301026920318602E+23 1.3301026920318602E+23 c(1),c(n) = 1.7734702560424803E+23 1.7734702560424803E+23 -- -- John D. McCalpin, Ph.D. Server System Architect Server Platform Engineering http://reality.sgi.com/mccalpin/ Silicon Graphics, Inc. mccalpin@sgi.com 650-933-7407 From ian@masuma.com Fri Jul 3 02:03:22 1998 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA04833 for ; Fri, 3 Jul 1998 02:03:21 -0700 Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id CAA03887 for ; Fri, 3 Jul 1998 02:03:20 -0700 (PDT) mail_from (ian@masuma.com) Received: from mail.virginia.edu (mail.Virginia.EDU [128.143.2.9]) by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) 
    via SMTP id CAA05567 for ; Fri, 3 Jul 1998 02:03:14 -0700 (PDT) mail_from (ian@masuma.com)
From: ian@masuma.com
Received: from ares.cs.virginia.edu by mail.virginia.edu id aa29448; 3 Jul 98 5:03 EDT
Received: from wianui (masuma.demon.co.uk [193.237.211.86])
    by ares.cs.Virginia.EDU (8.8.5/8.8.5) with SMTP id FAA10405 for ; Fri, 3 Jul 1998 05:03:11 -0400 (EDT)
Received: by wianui (SMI-8.6/SMI-SVR4) id KAA01978; Fri, 3 Jul 1998 10:00:44 +0100
Date: Fri, 3 Jul 1998 10:00:44 +0100
Message-Id: <199807030900.KAA01978@wianui>
To: mccalpin@cs.virginia.edu
Subject: Streams results
X-Sun-Charset: US-ASCII
Status: RO

Here are some test results (stream_d) for a Sun SparcEngine ATx UltraIIi
system at 300Mhz:

Built with cc -fast (SC4.2)
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 58495 microseconds.
   (= 58495 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         192.1043       0.0848       0.0833       0.0918
Scale:        192.0584       0.0856       0.0833       0.0954
Add:          213.8258       0.1130       0.1122       0.1136
Triad:        213.2880       0.1136       0.1125       0.1163

Built with gcc -O2 (2.7.3)
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 59152 microseconds.
   (= 59152 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         184.7724       0.0884       0.0866       0.0951
Scale:        185.1294       0.0876       0.0864       0.0933
Add:          200.2236       0.1209       0.1199       0.1234
Triad:        198.5029       0.1225       0.1209       0.1248

Worse than I would have expected, compared to an Ultra2.

Ian Collins.

From paull-dl@ee.uwa.edu.au Tue Aug 4 02:56:23 1998
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2])
    by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id CAA03476 for ; Tue, 4 Aug 1998 02:56:22 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
    by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id CAA18668 for ; Tue, 4 Aug 1998 02:56:20 -0700 (PDT) mail_from (paull-dl@ee.uwa.edu.au)
Received: from mail.virginia.edu (mail.Virginia.EDU [128.143.2.9])
    by sgi.sgi.com (980309.SGI.8.8.8-aspam-6.2/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.)
    via SMTP id CAA03607 for ; Tue, 4 Aug 1998 02:56:18 -0700 (PDT) mail_from (paull-dl@ee.uwa.edu.au)
Received: from mail.cs.virginia.edu by mail.virginia.edu id aa21584; 4 Aug 98 5:56 EDT
Received: from swanee.ee.uwa.edu.au (swanee.ee.uwa.edu.au [130.95.208.1])
    by ares.cs.Virginia.EDU (8.8.5/8.8.5) with ESMTP id FAA24102 for ; Tue, 4 Aug 1998 05:56:04 -0400 (EDT)
Received: from janus.ee.uwa.edu.au (paull-dl@janus.ee.uwa.edu.au [130.95.72.65])
    by swanee.ee.uwa.edu.au (8.8.8/8.8.8) with ESMTP id SAA03662 for ; Tue, 4 Aug 1998 18:00:40 +0800 (WST)
Received: from localhost by janus.ee.uwa.edu.au (8.8.5/CIIPS-1.0) id RAA23480; Tue, 4 Aug 1998 17:55:59 +0800
Date: Tue, 4 Aug 1998 17:55:59 +0800 (WST)
From: Daniel Loyd Paull
To: mccalpin@cs.virginia.edu
Subject: Stream benchmark
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Status: RO

John,

In case no-one else has commented on using Stream under Linux, I can
assure you that it compiled out of the box. As this was done on an
Intel based system, I ran the DOS version to confirm the results
(included below).

Note: I bumped up the size of the array under Linux to get more than
20 clock ticks per test.

Compiled with:

$ cc -o stream stream_d.c second_cpu.c -lm

The system:

P120 PC Partner Mother Board (I dunno what release... lost the manual)
64MB EDORAM (60ns I think)
256KB Cache

As you can see, the results are almost identical.

Cheers,

Daniel Paull
4th Year BE(IT)
University of Western Australia

The DOS version output:

STREAM for DOS v2 by Dennis Lee
===============================

1 MB = 1000000 Bytes in the following measurements.
For accurate results, this benchmark should be executed in
a true DOS session, and not a DOS shell under another OS.
Time      Operation  Mem Speed    Error
----      ---------  ---------    -----
7.41 sec  COPY32      86.37 MB/s  0.8%
7.42 sec  COPY64      86.25 MB/s  0.8%
6.37 sec  SCALE      100.47 MB/s  1.0%
8.68 sec  ADD        110.60 MB/s  0.7%
8.73 sec  TRIAD      109.97 MB/s  0.7%

These results are comparable with those on the STREAM website.
See for info on STREAM.
------------------------------------------------------------------------

When compiled under linux:
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 320000 microseconds.
   (= 32 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          86.4865       0.3770       0.3700       0.3800
Scale:        100.0000       0.3210       0.3200       0.3300
Add:          111.6279       0.4390       0.4300       0.4400
Triad:        109.0909       0.4450       0.4400       0.4500
-----------------------------------------------------------------------

From tyuan@beta.wsl.sinica.edu.tw Thu Sep 3 20:38:11 1998
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2])
    by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via ESMTP id UAA05491 for ; Thu, 3 Sep 1998 20:38:11 -0700
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
    by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id UAA68580 for ; Thu, 3 Sep 1998 20:38:10 -0700 (PDT) mail_from (tyuan@beta.wsl.sinica.edu.tw)
Received: from ares.cs.Virginia.EDU (ares.cs.Virginia.EDU [128.143.137.19])
    by sgi.sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.)
    via ESMTP id UAA08698 for ; Thu, 3 Sep 1998 20:38:02 -0700 (PDT) mail_from (tyuan@beta.wsl.sinica.edu.tw)
Received: from beta.wsl.sinica.edu.tw (beta.wsl.sinica.edu.tw [140.109.7.2])
    by ares.cs.Virginia.EDU (8.8.5/8.8.5) with SMTP id XAA14058 for ; Thu, 3 Sep 1998 23:37:39 -0400 (EDT)
Received: from chi.wsl.sinica.edu.tw by beta.wsl.sinica.edu.tw with SMTP (1.37.109.8/16.2) id AA09612; Fri, 4 Sep 1998 11:22:45 +0800
Message-Id: <35EF606E.CE4469ED@beta.wsl.sinica.edu.tw>
Date: Fri, 04 Sep 1998 11:37:18 +0800
From: Tein Horng Yuan
Organization: Academia Sinica Computing Centre
X-Mailer: Mozilla 4.05 [en] (Win95; I)
Mime-Version: 1.0
To: mccalpin@cs.virginia.edu
Subject: STREAM benchmark result
Content-Type: multipart/mixed; boundary="------------9A2A14AD434E0B8737C42DE0"
Status: RO

This is a multi-part message in MIME format.
--------------9A2A14AD434E0B8737C42DE0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi, Dr. McCalpin,

I have done stream benchmark on many PC and Mac systems.
Please add the results to your list. Thanks.

--
Tein
http://www.pcf.sinica.edu.tw/benchmark/system/stream/index.html

--------------9A2A14AD434E0B8737C42DE0
Content-Type: text/html; charset=us-ascii; name="index.html"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="index.html"
Content-Base: "http://www.pcf.sinica.edu.tw/benchmark/system/stream/index.html"

STREAM benchmark result

STREAM Benchmark Results

All results were obtained with N=1000000 (the default value) in stream_d.c, unless noted otherwise.
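
As a rough cross-check of what N implies, the sketch below estimates the memory footprint of a run. It assumes three arrays of 8-byte double-precision words, which is what the STREAM output reports elsewhere in this archive, and that "MB" in the footprint line means 2**20 bytes (an assumption that happens to reproduce the "22.9 MB" printed for N=1000000). The function name is ours and is not part of stream_d.c.

```python
def stream_footprint_mb(n, bytes_per_word=8, arrays=3):
    """Approximate memory used by the STREAM arrays, in units of 2**20 bytes.

    Assumes three arrays (a, b, c) of n double-precision words each.
    """
    return arrays * bytes_per_word * n / 2**20


# N values that appear in this archive
for n in (500_000, 1_000_000, 2_000_000):
    print(f"N={n}: about {stream_footprint_mb(n):.1f} MB")
```

Under these assumptions N=500000 needs about 11.4 MB, which may be why N was reduced for the 32MB notebook in the list; the page does not say so explicitly.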

Essentially, stream measures only the sustained double-precision memory throughput over the system bus.  As far as we know, the best achievable stream result is determined by the system bus speed.  For example, all the Intel PII CPUs on a 66MHz bus (233, 266, 300) give almost the same result.  Besides the system bus speed, the compiler, floating-point performance, RAM, and even the RAM module slot can cause the result to vary.
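
To put the bus-speed claim in numbers, the sketch below compares two Triad figures reported further down this page with a nominal peak for the bus. It assumes a 64-bit (8-byte) data bus, which this page does not state, and it uses the nominal 66 and 100 MHz clock values; the names are ours.

```python
# Assumption: the PII system bus transfers 8 bytes (64 bits) per clock.
BUS_WIDTH_BYTES = 8


def peak_mb_per_s(bus_mhz):
    """Nominal peak bandwidth in MB/s, with 1 MB = 10**6 bytes."""
    return bus_mhz * BUS_WIDTH_BYTES


# Triad rates (MB/s) copied from the PII-233 (66MHz) and PII 400 (100MHz) rows.
triad_mb_per_s = {66: 240.0, 100: 400.0}

for bus_mhz, rate in triad_mb_per_s.items():
    peak = peak_mb_per_s(bus_mhz)
    share = 100 * rate / peak
    print(f"{bus_mhz}MHz bus: peak {peak} MB/s, Triad {rate} MB/s, {share:.0f}% of peak")
```

On these assumptions the reported Triad figures sit at roughly half of the nominal bus peak on both boards, which is consistent with the result tracking bus speed rather than CPU clock.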

We currently use 'stream' as our standard tool for calibrating the memory subsystem.  For further information, please see the overclock example.



ASUS P2B(100MHz), PII 400, Redhat 5.1, Kernel 2.0.34, 128MB SDRAM
egcs-1.0.2, gcc -O6 -funroll-loops
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
320.0000 320.0000 400.0000 400.0000


ASUS P2L97(66MHz), PII-233, Redhat 5.1, Kernel 2.1.106, 128MB SDRAM
egcs-1.0.2, gcc -O6 -funroll-loops
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
200.0000 200.0000 240.0000 240.0000

ASUS P2L97(66MHz), PPro 200 (256KB L2) overclocked to 233, Redhat 5.1, Kernel 2.0.34
128MB SDRAM, egcs-1.0.2, gcc -O6 -funroll-loops
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
200.0000 228.5714 266.6667 240.0000

ASUS P2L97(66MHz), PPro 200 (512KB L2) overclocked to 233, Redhat 5.1, Kernel 2.1.0.34
128MB SDRAM, egcs-1.0.2, gcc -O6 -funroll-loops
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
200.0000 228.5714 240.0000 240.0000


ACER 710TE Extensa Notebook(66MHz), PII-233, Redhat 5.1, Kernel 2.0.34
32MB SDRAM, egcs-1.0.2, gcc -O6 -funroll-loops
N is set to 500000 in stream_d.c
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
200.0000 160.0000 171.4286 133.3333


IBM ThinkPad 600(66MHz), PII-233, Redhat 5.1, Kernel 2.0.34, 160MB SDRAM
egcs-1.0.2, gcc -O6 -funroll-loops
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
200.0000 200.0000 218.1818 218.1818


ASUS P2L97(66MHz), PII-266, Redhat 5.1, Kernel 2.0.34, 128MB SDRAM
egcs-1.0.2, gcc -O6 -funroll-loops and  pgcc -tp p6 -O3 -Munroll
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
200.0000 228.5714 240.0000 240.0000


ASUS P2L97(66MHz), PII-300, Redhat 5.1, Kernel 2.0.34, 256MB SDRAM
egcs-1.0.2, gcc -O6 -funroll-loops and pgcc -tp p6 -O3 -Munroll
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
200.0000 228.5714 240.0000 240.0000


GA-586TX3(66MHz), Pentium 233 MMX, Redhat 5.1, Kernel 2.0.34, 64MB SDRAM
egcs-1.0.2, gcc -O6 -funroll-loops
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
106.6667 145.4545 141.1765 141.1765

GA-586TX3(66MHz), AMD K6 233, Redhat 5.1, Kernel 2.0.34, 64MB SDRAM
egcs-1.0.2, gcc -O6 -funroll-loops
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
88.8889 84.2105 100.0000 96.0000


Iwill XA100(66MHz), Pentium 233 MMX, Redhat 5.1, Kernel 2.0.34, 128MB SDRAM
egcs-1.0.2, gcc -O6 -funroll-loops
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
114.2857 177.7778 184.6154 184.6154


Iwill XA100(66MHz), AMD K6-233, Redhat 5.1, Kernel 2.0.34, 128MB SDRAM
egcs-1.0.2, gcc -O6 -funroll-loops
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
100.0000 100.0000 96.0000 96.0000


Iwill XA100(100MHz), AMD K6-2 300, Redhat 5.1, Kernel 2.0.34, 128MB SDRAM
egcs-1.0.2, gcc -O6 -funroll-loops
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
123.0769 145.4545 133.3333 126.3158


HP SPP2000, HP-UX B.10..01 U 9000/889, 16xCPU, 4GB RAM
cc +O3dataprefetch
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
320.0003 399.9998 342.8575 342.8575


Mac G3 Desktop(66MHz), PPC750 266MHz, Mac OS 8.0, 64MB SDRAM
executable provided directly from STREAM site
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
150.8331 146.7856 153.4502 146.5425


Mac G3 Desktop(66MHz), PPC750 266MHz, Redhat PPC Feb_98, Kernel 2.1.24
64MB SDRAM,  gcc -O3 -funroll-loops
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
133.3335 133.3335 141.1766 141.1766


Mac G3 WallStreet Powerbook(83MHz), PPC750 292MHz, Mac OS 8.1, 64MB SDRAM
executable provided directly from STREAM site
 
Copy(MB/sec) Scale(MB/sec) Add(MB/sec) Triad(MB/sec)
155.7253 153.8314 147.2280 142.9784
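
Many of the Linux rates above are suspiciously round (200.0000, 228.5714, 240.0000, 266.6667). One plausible explanation, which the page itself does not give, is a timer that advances in 10 ms steps: STREAM counts 16 bytes per element for Copy and Scale and 24 bytes per element for Add and Triad, so a rate computed from a time that is a whole number of hundredths of a second can only take certain values. The sketch below checks this for one row; the 10 ms tick and the helper name are assumptions.

```python
BYTES_PER_WORD = 8
# Arrays counted per kernel: Copy and Scale touch 2, Add and Triad touch 3.
ARRAYS_COUNTED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
TICK_SECONDS = 0.01  # assumed timer resolution (10 ms)


def implied_ticks(kernel, rate_mb_per_s, n=1_000_000):
    """Elapsed time implied by a reported rate, in units of TICK_SECONDS."""
    bytes_moved = ARRAYS_COUNTED[kernel] * BYTES_PER_WORD * n
    seconds = bytes_moved / (rate_mb_per_s * 1e6)
    return seconds / TICK_SECONDS


# Row copied from the PPro 200 (256KB L2) entry above.
row = {"Copy": 200.0000, "Scale": 228.5714, "Add": 266.6667, "Triad": 240.0000}
for kernel, rate in row.items():
    print(f"{kernel:6s} {rate:9.4f} MB/s -> {implied_ticks(kernel, rate):.3f} ticks")
```

If the assumption holds, each of these rates carries an uncertainty of about one tick in roughly ten, so differences of 10% or so between rows may not be meaningful. The Mac OS rows do not show this pattern.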


Last updated on 08/10/98
For further information, please email pcfarm@sinica.edu.tw
--------------9A2A14AD434E0B8737C42DE0--

From ce107@cfm.brown.edu Mon Oct 19 23:23:46 1998
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2])
    by frakir.engr.sgi.com (980427.SGI.8.8.8/960327.SGI.AUTOCF) via ESMTP id XAA76461 for ; Mon, 19 Oct 1998 23:23:45 -0700 (PDT)
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
    by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id XAA39957 for ; Mon, 19 Oct 1998 23:23:45 -0700 (PDT) mail_from (ce107@cfm.brown.edu)
Received: from ares.cs.Virginia.EDU (mail.cs.Virginia.EDU [128.143.137.19])
    by sgi.sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.)
    via ESMTP id XAA07359 for ; Mon, 19 Oct 1998 23:23:43 -0700 (PDT) mail_from (ce107@cfm.brown.edu)
X-UIDL: 908941894.046
From: ce107@cfm.brown.edu
Received: from cfm.brown.edu (hydra.cfm.brown.edu [128.148.160.206])
    by ares.cs.Virginia.EDU (8.8.5/8.8.5) with SMTP id CAA16764 for ; Tue, 20 Oct 1998 02:23:41 -0400 (EDT)
Received: from urania (urania.cfm.brown.edu) by cfm.brown.edu (4.1/1.00) id AA18032; Tue, 20 Oct 98 02:25:00 EDT
Received: by urania (SMI-8.6/SMI-SVR4) id CAA11019; Tue, 20 Oct 1998 02:23:40 -0400
Message-Id: <199810200623.CAA11019@urania>
Subject: STREAM results for Ultra60/360 (2 cpus)
To: mccalpin@cs.virginia.edu (John McCalpin)
Date: Tue, 20 Oct 1998 02:23:40 -0400 (EDT)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Status: RO

Using etime() and the Fortran source, version 4.2 of the Sun compilers
(the 5.0 ones implement the prefetch instructions for Ultra2 so better
performance is to be expected):

For the serial code I compiled using:

f77 -fast -xO4 -xdepend -fsimple=2 -xunroll=8 -xarch=v8plusa \
    -o stream.shiva stream_d.f

shiva:ce107/benchmark/stream% time stream.shiva
----------------------------------------------
Double precision appears to have 16
digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 2000000
Offset = 0
The total memory requirement is 45 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be 11 microseconds
The tests below will each take a time on the order of 74281 microseconds
   (= 6753 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING: The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:   355.1692      0.0906     0.0901     0.0919
Scaling   :   343.8042      0.0934     0.0931     0.0937
Summing   :   311.1300      0.1540     0.1543     0.1551
SAXPYing  :   358.5744      0.1345     0.1339     0.1355
Sum of a is : 2.3066015625919D+18
Sum of b is : 4.6132031248564D+17
Sum of c is : 6.1509375001413D+17
Note: Nonstandard floating-point mode enabled
See the Numerical Computation Guide, ieee_sun(3M)
5.25u 0.22s 0:05.53 98.9%

For the autoparallelized code I compiled using:

f77 -autopar -stackvar -fast -xO4 -xdepend -fsimple=2 -xunroll=8 \
    -xarch=v8plusa -o stream.shiva.p stream_d.f

I needed to increase the stack limit accordingly.
shiva:ce107/benchmark/stream% setenv PARALLEL 1 shiva:ce107/benchmark/stream% time stream.shiva.p ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 2000000 Offset = 0 The total memory requirement is 45 MB You are running each test 10 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity appears to be less than one microsecond Your clock granularity/precision appears to be 1 microseconds The tests below will each take a time on the order of 91866 microseconds (= 91866 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING: The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ---------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 297.1950 0.1290 0.1077 0.1523 Scaling : 341.5131 0.1209 0.0937 0.1471 Summing : 290.5452 0.2051 0.1652 0.2360 SAXPYing : 271.9637 0.2047 0.1765 0.2272 Sum of a is : 2.3066015625919D+18 Sum of b is : 4.6132031248564D+17 Sum of c is : 6.1509375001413D+17 Note: Nonstandard floating-point mode enabled See the Numerical Computation Guide, ieee_sun(3M) 6.94u 0.22s 0:07.22 99.1% shiva:ce107/benchmark/stream% setenv PARALLEL 2 shiva:ce107/benchmark/stream% time stream.shiva.p ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 2000000 Offset = 0 The total memory requirement is 45 MB You are running each test 10 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity appears to be 
less than one microsecond Your clock granularity/precision appears to be 1 microseconds The tests below will each take a time on the order of 94030 microseconds (= 94030 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING: The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ---------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 340.7895 0.0952 0.0939 0.0962 Scaling : 392.5442 0.0822 0.0815 0.0830 Summing : 371.5010 0.1396 0.1292 0.1443 SAXPYing : 395.7602 0.1244 0.1213 0.1256 Sum of a is : 2.3066015625919D+18 Sum of b is : 4.6132031248564D+17 Sum of c is : 6.1509375001413D+17 Note: Nonstandard floating-point mode enabled See the Numerical Computation Guide, ieee_sun(3M) 9.73u 0.21s 0:05.20 191.1% Compiling without the -stackvar option: f77 -autopar -fast -xO4 -xdepend -fsimple=2 -xunroll=8 \ -xarch=v8plusa -o stream.shiva.pns stream_d.f shiva:ce107/benchmark/stream% setenv PARALLEL 1 shiva:ce107/benchmark/stream% time stream.shiva.pns ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 2000000 Offset = 0 The total memory requirement is 45 MB You are running each test 10 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity appears to be less than one microsecond Your clock granularity/precision appears to be 1 microseconds The tests below will each take a time on the order of 91920 microseconds (= 91920 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. 
---------------------------------------------------- WARNING: The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ---------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 343.8879 0.0944 0.0931 0.0957 Scaling : 287.9004 0.1116 0.1111 0.1123 Summing : 352.7692 0.1380 0.1361 0.1388 SAXPYing : 269.7542 0.1789 0.1779 0.1820 Sum of a is : 2.3066015625919D+18 Sum of b is : 4.6132031248564D+17 Sum of c is : 6.1509375001413D+17 Note: Nonstandard floating-point mode enabled See the Numerical Computation Guide, ieee_sun(3M) 5.76u 0.23s 0:06.06 98.8% shiva:ce107/benchmark/stream% setenv PARALLEL 2 shiva:ce107/benchmark/stream% time stream.shiva.pns ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 2000000 Offset = 0 The total memory requirement is 45 MB You are running each test 10 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity appears to be less than one microsecond Your clock granularity/precision appears to be 1 microseconds The tests below will each take a time on the order of 58727 microseconds (= 58727 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING: The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
---------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Assignment: 391.2734 0.0828 0.0818 0.0859 Scaling : 324.4294 0.0989 0.0986 0.0994 Summing : 428.6842 0.1123 0.1120 0.1130 SAXPYing : 374.8837 0.1410 0.1280 0.1433 Sum of a is : 2.3066015625919D+18 Sum of b is : 4.6132031248564D+17 Sum of c is : 6.1509375001413D+17 Note: Nonstandard floating-point mode enabled See the Numerical Computation Guide, ieee_sun(3M) 9.39u 0.51s 0:05.03 196.8% The combination of -autopar and -stackvar seem to influence some kernels more than others, in the case of summation with -autopar only even beneficially for only one processor when compared to the serial code. For the F90 version of the code using etime again() f90 -fast -xO4 -xarch=v8plusa -o stream.shiva.90 stream90.f90 shiva:ce107/benchmark/stream% time stream.shiva.90 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 2000000 Offset = 0 The total memory requirement is 45 MB You are running each test 10 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity/precision appears to be 11 microseconds The tests below will each take a time on the order of 95057 microseconds (= 8642 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
----------------------------------------------------
Function     Rate (MB/s)  Min time  Max time  Mean time  RMS time  Median
Copy:           333.37     0.0960    0.0997    0.0971     0.0971   0.0964
Scale:          287.27     0.1114    0.1134    0.1121     0.1121   0.1126
Add:            333.87     0.1438    0.1453    0.1445     0.1445   0.1440
Triad:          241.46     0.1988    0.2032    0.2001     0.2001   0.1997
-------------------------------------------------------------------------------
All times are
  0.0997  0.1124  0.1452  0.2021
  0.0968  0.1119  0.1443  0.2032
  0.0967  0.1124  0.1441  0.1988
  0.0965  0.1116  0.1441  0.1995
  0.0960  0.1134  0.1438  0.1992
  0.0969  0.1118  0.1443  0.2002
  0.0964  0.1114  0.1441  0.1989
  0.0980  0.1123  0.1453  0.1988
  0.0963  0.1119  0.1448  0.2015
  0.0975  0.1124  0.1453  0.1988
-------------------------------------------------------------------------------
Sum of a is = 115330078125.
Sum of b is = 23066015625.
Sum of c is = 30754687500.
6.20u 0.32s 0:06.58 99.0%

Compiling with -parallel(/-stackvar) resulted in code that would not print
anything apart from the first line of "---" :-).

Constantinos Evangelinos
Center for Fluid Mechanics
Brown University/Division of Applied Mathematics

From norbert.juffa@amd.com Tue Oct 27 11:40:14 1998
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2])
    by frakir.engr.sgi.com (980427.SGI.8.8.8/960327.SGI.AUTOCF) via ESMTP id LAA04654 for ;
    Tue, 27 Oct 1998 11:40:13 -0800 (PST)
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
    by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id LAA23467 for ;
    Tue, 27 Oct 1998 11:40:12 -0800 (PST) mail_from (norbert.juffa@amd.com)
Received: from ares.cs.Virginia.EDU (ares.cs.Virginia.EDU [128.143.137.19])
    by sgi.sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.)
via ESMTP id LAA08207 for ; Tue, 27 Oct 1998 11:40:10 -0800 (PST) mail_from (norbert.juffa@amd.com) Received: from amdext.amd.com (amdext.amd.com [139.95.251.1]) by ares.cs.Virginia.EDU (8.8.5/8.8.5) with ESMTP id OAA12174 for ; Tue, 27 Oct 1998 14:40:07 -0500 (EST) Received: from amdint.amd.com (amdint.amd.com [139.95.250.1]) by amdext.amd.com (8.8.5/8.8.5/AMD) with ESMTP id LAA20671 for ; Tue, 27 Oct 1998 11:39:36 -0800 (PST) Received: from cmdmail.amd.com (cmdmail.amd.com [172.28.14.226]) by amdint.amd.com (8.8.5/8.8.5/AMD) with ESMTP id LAA17971 for ; Tue, 27 Oct 1998 11:39:35 -0800 (PST) Received: from ultrasrvr28 (ultrasrvr28.amd.com [172.28.4.179]) by cmdmail.amd.com (8.9.1a-LCCHA/8.9.0/lccha 1.4) with SMTP id LAA04304 for ; Tue, 27 Oct 1998 11:39:33 -0800 (PST) Sender: norbert@amd.com Message-ID: <3636216A.B5D@amd.com> Date: Tue, 27 Oct 1998 11:39:22 -0800 From: Norbert Juffa Organization: Advanced Micro Devices, California Microprocessor Divisi X-Mailer: Mozilla 3.02 (X11; U; SunOS 5.5.1 sun4u) MIME-Version: 1.0 To: mccalpin@cs.virginia.edu Subject: STREAM results for Intel PentiumII @ 450 MHz Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Status: RO System configuration: Intel PentiumII processor @ 4.5x100MHz = 450 MHz, 512 KB L2, Asus P2B motherboard with Intel 440BX chipset, 256 MB PC-100 SDRAM (CAS-2), Windows NT 4 with SP3, compiler Intel ifl 2.4 P97241, switch -G6 -- Norbert Juffa ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 5566778 Offset = 0 The total memory requirement is 127 MB You are running each test 3 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity/precision appears to be 10000 microseconds The tests below will each take a time on the order of 240000 microseconds (= 24 clock ticks) 
Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ---------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 306.0771 0.2910 0.2910 0.2910 Scale: 296.8948 0.3000 0.3000 0.3000 Add: 360.1150 0.3710 0.3710 0.3710 Triad: 351.5860 0.3800 0.3800 0.3800 Sum of a is = 37575751500.0000 Sum of b is = 7515150300.00000 Sum of c is = 10020200400.0000 From mccalpin@sgi.com Sun Jul 19 07:29:49 1998 Received: from 192.48.180.197 (ppp-frakir.engr.sgi.com [192.48.180.197]) by frakir.engr.sgi.com (950413.SGI.8.6.12/960327.SGI.AUTOCF) via SMTP id HAA19939; Sun, 19 Jul 1998 07:29:24 -0700 Message-ID: <35B2044F.6815@sgi.com> Date: Sun, 19 Jul 1998 07:35:53 -0700 From: John McCalpin Reply-To: mccalpin@sgi.com Organization: Silicon Graphics, Inc. X-Mailer: Mozilla 3.04Gold (Macintosh; U; 68K) MIME-Version: 1.0 To: mccalpin@sgi.com Subject: stream results Content-Type: multipart/mixed; boundary="------------68EC44617EC7" Status: RO This is a multi-part message in MIME format. --------------68EC44617EC7 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit http://www.pcf.sinica.edu.tw/benchmark/system/stream/index.html --------------68EC44617EC7 Content-Type: text/html; charset=us-ascii; name="index.html" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="index.html" Content-Base: "http://www.pcf.sinica.edu.tw/benchmark /system/stream/index.html" STREAM benchmark result STREAM benchmark result

Under construction!!!



ASUS-P2BX.PII-400.log  :

ASUS P2BX(100MHz), PII 400, Redhat 5.1, Kernel 2.0.34, 128MB SDRAM
egcs-1.0.2, gcc -O6 -funroll-loops

Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         320.0000       0.0511       0.0500       0.0600
Scale:        320.0000       0.0511       0.0500       0.0600
Add:          400.0000       0.0662       0.0600       0.0700
Triad:        400.0000       0.0652       0.0600       0.0700



ASUS-P2L97.PII-233.log  :

ASUS P2L97(66MHz), PII-233, Redhat 5.1, Kernel 2.1.106, 128MB SDRAM
egcs-1.0.2, gcc -O6 -funroll-loops

Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         200.0000       0.0881       0.0800       0.0900
Scale:        200.0000       0.0800       0.0800       0.0800
Add:          240.0000       0.1090       0.1000       0.1100
Triad:        240.0000       0.1081       0.1000       0.1100



ASUS-P2L97.PII-266.log  :

ASUS P2L97(66MHz), PII-266, Redhat 5.1, Kernel 2.0.34, 128MB SDRAM
egcs-1.0.2, gcc -O6 -funroll-loops

Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         200.0000       0.0881       0.0800       0.0900
Scale:        228.5714       0.0781       0.0700       0.0800
Add:          240.0000       0.1071       0.1000       0.1100
Triad:        240.0000       0.1021       0.1000       0.1100



ASUS-P2L97.PII-300.log  :

ASUS P2L97(66MHz), PII-300, Redhat 5.1, Kernel 2.0.34, 256MB SDRAM
egcs-1.0.2, gcc -O6 -funroll-loops

Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         200.0000       0.0881       0.0800       0.0900
Scale:        228.5714       0.0762       0.0700       0.0800
Add:          240.0000       0.1061       0.1000       0.1100
Triad:        240.0000       0.1021       0.1000       0.1100



MacG3.266.OS8_0.log  :

Mac G3(66MHz), PPC750 266MHz, Mac OS 8.0, 64MB SDRAM
executable provided directly from STREAM site

Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         150.8331       0.0431       0.0424       0.0446
Scale:        146.7856       0.0444       0.0436       0.0467
Add:          153.4502       0.0633       0.0626       0.0666
Triad:        146.5425       0.0662       0.0655       0.0697



MacG3.266.linux.log  :

Mac G3(66MHz), PPC750 266MHz, Redhat PPC Feb_98, Kernel 2.1.24, 64MB SDRAM
gcc -O3 -funroll-loops

Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         133.3335       0.1221       0.1200       0.1300
Scale:        133.3335       0.1251       0.1200       0.1300
Add:          141.1766       0.1761       0.1700       0.1800
Triad:        141.1766       0.1791       0.1700       0.1900



MacG3.PB292.OS8_1.log  :

Mac G3 Powerbook(83MHz), PPC750 292MHz, Mac OS 8.1, 64MB SDRAM
executable provided directly from STREAM site

Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         155.7253       0.0415       0.0411       0.0422
Scale:        153.8314       0.0418       0.0416       0.0422
Add:          147.2280       0.0700       0.0652       0.1018
Triad:        142.9784       0.0674       0.0671       0.0676



Last updated on 07/09/98
For further information, please email pcfarm@sinica.edu.tw
  --------------68EC44617EC7-- From peter@localhost Sun Dec 27 16:13:53 PST 1998 Article: 83792 of comp.arch Path: news.corp.sgi.com!enews.sgi.com!news.idt.net!feed1.news.rcn.net!rcn!woodstock.news.demon.net!demon!news.demon.co.uk!demon!izabella.demon.co.uk!not-for-mail From: "P Dickerson" Newsgroups: comp.arch,comp.benchmarks Subject: Re: Stride Stream benchmark Date: Sun, 27 Dec 1998 13:28:43 -0000 Message-ID: <914766313.15762.0.nnrp-09.9e981e55@news.demon.co.uk> References: <763ac2$o36$1@news01bf.so-net.ne.jp> NNTP-Posting-Host: izabella.demon.co.uk X-NNTP-Posting-Host: izabella.demon.co.uk:158.152.30.85 X-Trace: news.demon.co.uk 914766313 nnrp-09:15762 NO-IDENT izabella.demon.co.uk:158.152.30.85 X-Complaints-To: abuse@demon.net X-Newsreader: Microsoft Outlook Express 4.72.3110.5 X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3 Lines: 100 Xref: news.corp.sgi.com comp.arch:83792 comp.benchmarks:28572 Status: RO Results for 225 MHz AMD K6 in TX MoBo and 75 MHz bus clock, 64MB SDRAM, NT4 SP3 #include needed to be commented out, source saved as tlb.c. Note pagesize for x86 is 4K, line size is 32 for Pentium and Pentium II class machines. D:\MYAPPS\MSVC\TLB> cl /Ox -DPAGESIZE=4096 -DCACHELINE=32 tlb.c Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 12.00.8168 for 80x86 Copyright (C) Microsoft Corp 1984-1998. All rights reserved. tlb.c Microsoft (R) Incremental Linker Version 6.00.8168 Copyright (C) Microsoft Corp 1992-1998. All rights reserved. /out:tlb.exe tlb.obj D:\MYAPPS\MSVC\TLB> tlb STRIDE STREAM BENCHMARK. Stride size is 4136 bytes. Copyright 1998 Naohiko Shimizu Please report the result with your system. Transfer size(B) Throughput(MB/S) Access Time(nS) 136 612.000 13 264 586.667 14 520 581.366 14 1032 402.078 20 2056 71.610 112 -- P.M.Dickerson. email: peter (at) izabella (dot) demon (dot) co (dot) uk pshimizu@fa2.so-net.ne.jp wrote in message <763ac2$o36$1@news01bf.so-net.ne.jp>... 
>Hi, >This is a benchmark program to check the system's stride memory access >performance. Even in cache, most of RISC machines has poor performance >for a large stride data transfer, because of shorttage of the TLB >entries. With this program you can check the out of TLB performance. > >Enjoy, >Naohiko Shimizu. > >=========================================================================== == >/* >STRIDE STREAM BENCHMARK PROGRAM Last updated date : 98.12.26 > >Copyright 1998 by Naohiko Shimizu >All rights reserved. > >See GNU Public License 2.0 for the licensing term. > >Many system has poor performance on a large stride problem. This program >will test your system's IN CACHE stride access performance. The inner most >loop is designed to fit in almost all processors 1st cache. >If you have 16KB 1st cache and 32 bytes cache line size, upto 512 bytes >stride transfer will fit in your cache. (16*1024/32) >If you have more knowledge about your system, you can set CACHELINE and >PAGESIZE parameters for more accurate results. > >This program is designed with clock() function of C language, and if the >running time is too short for clock tick the result may not be correct. > >================= example result for AS200 4/100 =========================== >alpha% cc -O2 -o strdstrm -DCACHELINE=32 -DPAGESIZE=8192 strdstrm.c >alpha% ./strdstrm >STRIDE STREAM BENCHMARK. Stride size is 8232 bytes. > Copyright 1998 Naohiko Shimizu >Please report the result with your system. > >Transfer size(B) Throughput(MB/S) Access Time(nS) > 136 816.000 10 > 264 24.122 332 > 520 24.124 332 > 1032 24.125 332 > 2056 22.873 350 >=========================================================================== > > >Contact information: > >Dr. Naohiko Shimizu >School of Engr. Tokai Univ. 
>1117 Kitakaname Hiratsuka >Kanagawa 259-1292 Japan >email: nshimizu@et.u-tokai.ac.jp >URL: http://shimizu-lab.et.u-tokai.ac.jp > [ Source snipped] From rpeterson@msn.globaldialog.com Wed Dec 30 10:19:35 PST 1998 Article: 83870 of comp.arch Path: news.corp.sgi.com!news.sgi.com!howland.erols.net!panix!cam-news-hub1.bbnplanet.com!cam-news-feed2.bbnplanet.com!news.gtei.net!homer.alpha.net!msn.globaldialog.com!rpeterson From: rpeterson@msn.globaldialog.com (Ron Peterson) Subject: Re: Stride Stream benchmark Newsgroups: comp.arch,comp.benchmarks Followup-To: comp.arch,comp.benchmarks References: <763ac2$o36$1@news01bf.so-net.ne.jp> <914766313.15762.0.nnrp-09.9e981e55@news.demon.co.uk> X-Newsreader: TIN [version 1.2 PL2] Lines: 18 Message-ID: Date: Wed, 30 Dec 1998 05:14:56 GMT NNTP-Posting-Host: 156.46.122.12 X-Complaints-To: abuse@alpha.net X-Trace: homer.alpha.net 914994896 156.46.122.12 (Tue, 29 Dec 1998 23:14:56 CDT) NNTP-Posting-Date: Tue, 29 Dec 1998 23:14:56 CDT Xref: news.corp.sgi.com comp.arch:83870 comp.benchmarks:28592 Status: RO I ran the following on a NextStation 25 MHz 68040 computer using NextStep 3.3: STRIDE STREAM BENCHMARK. Stride size is 65800 bytes. Copyright 1998 Naohiko Shimizu Please report the result with your system. 
Transfer size(B) Throughput(MB/S) Access Time(nS) 136 14.836 539 264 14.185 564 520 13.277 603 1032 11.251 711 2056 10.780 742 Ron From John.Henning@digital.com Sun Nov 08 08:16:05 1998 Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by frakir.engr.sgi.com (980427.SGI.8.8.8/960327.SGI.AUTOCF) via ESMTP id NAA06626 for ; Wed, 28 Oct 1998 13:19:37 -0800 (PST) Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id NAA66652 for ; Wed, 28 Oct 1998 13:19:37 -0800 (PST) mail_from (John.Henning@digital.com) Received: from mail11.digital.com (mail11.digital.com [192.208.46.10]) by sgi.sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id NAA02605 for ; Wed, 28 Oct 1998 13:19:36 -0800 (PST) mail_from (John.Henning@digital.com) Received: from sbuamazko2ae.zko.dec.com (sbuamazko2ae.zko.dec.com [16.29.160.92]) by mail11.digital.com (8.8.8/8.8.8/WV1.0h) with ESMTP id QAA02884 for ; Wed, 28 Oct 1998 16:19:34 -0500 (EST) Received: by sbuamazko2ae.zko.dec.com with Internet Mail Service (5.0.1458.49) id ; Wed, 28 Oct 1998 16:19:32 -0500 Message-ID: <7653617F8828D11195580000F82202282BCEF0@ctgsvr.zko.dec.com> From: "John Henning (Nashua)" To: "'mccalpin@frakir.engr.sgi.com'" Date: Wed, 28 Oct 1998 16:19:29 -0500 X-Priority: 3 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.0.1458.49) Content-Type: multipart/mixed; boundary="---- =_NextPart_000_01BE028E.BEDC41F0" Status: RO Hi John, I hear you've mentioned me by name in a recent posting, though I haven't seen it. I trust you were kind. :-) Here's a Compaq AlphaServer GS140 6/525 with 16GB memory (4x4GB) 1x 525 MHz Alpha 21264 CPU Digital Unix V4.0E (Rev. 1083) DIGITAL Fortran 77 T5.2-169-4289P The source code for benchmark changes is attached, as well as the run log. 
I hope you are surviving your morning commute in the Bay Area!

-john

John L. Henning
Consulting Software Engineer
Benchmark Performance Engineering
High Performance Servers
Compaq Computer Corporation
John.Henning@compaq.com

Attachment converted: Macintosh HD:typescript.txt (TEXT/MSIE) (000

Script started on Wed Oct 28 16:15:59 1998
% unlimit
% date
Wed Oct 28 16:16:01 EST 1998
% diff stream_d.f_as_at_ftp_site_22may97 mine.f
53c53,54
<       PARAMETER (n=2000000,offset=0,ndim=n+offset,ntimes=10)
---
>       PARAMETER (n = 1000003 , offset=0)
>       PARAMETER (ndim=n+offset,ntimes=10)
334a336,359
>       END
>
>       DOUBLE PRECISION FUNCTION SECOND
> *
> * Assume we will be called often enough so that rpcc counter
> * doesn't overflow!
> *
>       INTEGER*8 TICK, PREV_TICK
>       DOUBLE PRECISION RPCC_DELTA, PREV_SEC
>       INTEGER*8 FIXED_RPCC
>       EXTERNAL FIXED_RPCC, RPCC_DELTA
>
>       DATA PREV_TICK /0/
>
>       TICK = FIXED_RPCC()
>       IF (PREV_TICK .EQ. 0) THEN
>          SECOND = 0D0
>       ELSE
>          SECOND = PREV_SEC + RPCC_DELTA (%VAL(PREV_TICK), %VAL(TICK))
>       END IF
>
>       PREV_SEC = SECOND
>       PREV_TICK = TICK
>       RETURN
% cc -c -o rpcc.o rpcc.c
% f77 -non_shared -fast -O5 -unroll 8 -arch ev6 -assume nounderscore rpcc.o mine.f
% ./a.out
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 1000003
Offset = 0
The total memory requirement is 22 MB
You are running each test 10 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order of 18435 microseconds
   (= 18435 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system timer. ---------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 524.7155 0.0308 0.0305 0.0328 Scale: 700.0257 0.0231 0.0229 0.0242 Add: 614.9626 0.0391 0.0390 0.0393 Triad: 817.3842 0.0295 0.0294 0.0307 Sum of a is = 1.153304241182386E+018 Sum of b is = 2.306608482341258E+017 Sum of c is = 3.075477976467506E+017 % cat rpcc.c /* rpcc.c j.henning hacked from example by d.grunwald*/ #include #include #include #include #include #include static double scale = -1; static pid_t mypid = -1; /* fix_rpcc adds the two halves of the register (bias & offset), since some tools return it w/o this fixup */ unsigned long fix_rpcc(unsigned long rpcc) { unsigned long fixed; unsigned long offset = (rpcc >> 32) & 0xffffffff; unsigned long cycles = (rpcc & 0xffffffff); fixed = (unsigned long) offset + (unsigned long) cycles; /* printf ("%8lx %8lx %8lx ", offset, cycles, fixed); */ return fixed; } /* Compute time in *seconds* (note that Grunwald's original version did microseconds) given two fixedup RPCCs */ double rpcc_delta(unsigned long ul_start, unsigned long ul_stop) { double retval; if ( scale == -1 ) { int err; double freq; struct cpu_info cpubuf ; err = getsysinfo(GSI_CPU_INFO, &cpubuf, sizeof(cpubuf)); /*printf ("err= %d freq = %d current_cpu = %d\n", err, cpubuf.mhz, cpubuf.current_cpu);*/ if (err < 0) { freq = 300; } else { freq = cpubuf.mhz; } freq *= 1000000; /* * Scale is the 1 over the clock frequency */ scale = 1/(double) freq; } if (ul_stop < ul_start) { retval = ((0xffffffff - ul_start) + ul_stop) * scale; } else { retval = (ul_stop - ul_start) * scale; } /* printf ("%f\n", retval); */ return retval; } unsigned long fixed_rpcc(void) { unsigned long i; i = asm("rpcc %v0"); return fix_rpcc(i); } % date Wed Oct 28 16:16:54 EST 1998 % exit % script done on Wed Oct 28 16:16:55 1998 From bfriesen@simple.dallas.tx.us Sun Nov 29 14:11:32 1998 Received: from 
cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2]) by frakir.engr.sgi.com (980427.SGI.8.8.8/960327.SGI.AUTOCF) via ESMTP id OAA22803 for ; Sun, 29 Nov 1998 14:11:32 -0800 (PST) Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37]) by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id OAA73227 for ; Sun, 29 Nov 1998 14:11:31 -0800 (PST) mail_from (bfriesen@simple.dallas.tx.us) Received: from ares.cs.Virginia.EDU (mail.cs.Virginia.EDU [128.143.137.19]) by sgi.sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id OAA08702 for ; Sun, 29 Nov 1998 14:11:29 -0800 (PST) mail_from (bfriesen@simple.dallas.tx.us) Received: from mailhost.cyberramp.net (root@mailhost.cyberramp.net [207.158.64.11]) by ares.cs.Virginia.EDU (8.8.5/8.8.5) with ESMTP id RAA19802 for ; Sun, 29 Nov 1998 17:11:27 -0500 (EST) Received: from fuzzylog.simple.dallas.tx.us (dal-tsa14-23.cyberramp.net [207.158.83.215]) by mailhost.cyberramp.net (8.9.1a/8.9.1/ler-981120-1344) with ESMTP id QAA05012 for ; Sun, 29 Nov 1998 16:11:26 -0600 (CST) Received: from scooby (scooby [192.168.1.3]) by fuzzylog.simple.dallas.tx.us (8.8.8/8.8.8) with ESMTP id QAA03140 for ; Sun, 29 Nov 1998 16:11:23 -0600 (CST) Date: Sun, 29 Nov 1998 16:11:23 -0600 (CST) From: Bob Friesenhahn To: mccalpin@cs.virginia.edu Subject: STREAM results for Sun ULTRA 30 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Status: RO Results for a Sun Ultra 30 with a single 296MHz processor running Solaris 2.6. 
% uname -a SunOS scooby 5.6 Generic_105181-04 sun4u sparc SUNW,Ultra-30 % gcc -v Reading specs from /usr/local/lib/gcc-lib/sparc-sun-solaris2.6/egcs-2.91.57/specs gcc version egcs-2.91.57 19980901 (egcs-1.1 release) % gcc -O3 -mcpu=ultrasparc -o stream_d stream_d.c second_cpu.c -lm % ./stream_d ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size = 5000000, Offset = 0 Total memory required = 114.4 MB. Each test is run 20 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 9999 microseconds. Each test below will take on the order of 269999 microseconds. (= 27 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:         210.5268       0.3850       0.3800       0.4000
Scale:        242.4243       0.3360       0.3300       0.3400
Add:          279.0708       0.4345       0.4300       0.4400
Triad:        210.5264       0.5785       0.5700       0.5900

======================================
Bob Friesenhahn
bfriesen@simple.dallas.tx.us
http://www.cyberramp.net/~bfriesen

From bfriesen@simple.dallas.tx.us Sun Nov 29 17:27:53 1998
Received: from cthulhu.engr.sgi.com (cthulhu.engr.sgi.com [192.26.80.2])
    by frakir.engr.sgi.com (980427.SGI.8.8.8/960327.SGI.AUTOCF) via ESMTP id RAA22964 for ;
    Sun, 29 Nov 1998 17:27:53 -0800 (PST)
Received: from sgi.sgi.com (sgi.engr.sgi.com [192.26.80.37])
    by cthulhu.engr.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id RAA25691 for ;
    Sun, 29 Nov 1998 17:27:52 -0800 (PST) mail_from (bfriesen@simple.dallas.tx.us)
Received: from mailhost.cyberramp.net (mailhost.cyberramp.net [207.158.64.11])
    by sgi.sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.)
    via ESMTP id RAA01670 for ; Sun, 29 Nov 1998 17:27:51 -0800 (PST) mail_from (bfriesen@simple.dallas.tx.us)
Received: from fuzzylog.simple.dallas.tx.us (dal-tsa3-24.cyberramp.net [207.158.87.24])
    by mailhost.cyberramp.net (8.9.1a/8.9.1/ler-981120-1344) with ESMTP id TAA25128 for ;
    Sun, 29 Nov 1998 19:27:50 -0600 (CST)
Received: from scooby (scooby [192.168.1.3])
    by fuzzylog.simple.dallas.tx.us (8.8.8/8.8.8) with ESMTP id TAA03165 for ;
    Sun, 29 Nov 1998 19:27:48 -0600 (CST)
Date: Sun, 29 Nov 1998 19:27:48 -0600 (CST)
From: Bob Friesenhahn
To: John McCalpin
Subject: Re: STREAM results for Sun ULTRA 30
In-Reply-To: <199811300041.QAA22802@frakir.engr.sgi.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Status: RO

On Sun, 29 Nov 1998, John McCalpin wrote:

> Thanks for the results! I am a bit behind in updating the tables,
> but your results are in my mailbox, and I will get them in the tables
> as soon as I have time.

Your tables are most interesting. They reveal why people still buy
supercomputers.

I don't know if this entry is of any value but it represents results
from a Dell Pentium II box running at 233 MHz. The OS is FreeBSD 2.2.7
and the compiler is gcc 2.7.2 executed as:

gcc -O3 -o stream_d stream_d.c second_cpu.c -lm

# ./stream_d
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 3000000, Offset = 0
Total memory required = 68.7 MB.
Each test is run 20 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 7812 microseconds.
Each test below will take on the order of 312500 microseconds.
   (= 40 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          94.5231       0.5117       0.5078       0.5156
Scale:         94.5231       0.5141       0.5078       0.5156
Add:          103.5506       0.6957       0.6953       0.7031
Triad:        107.1628       0.6758       0.6719       0.6797

The results don't seem very impressive.

Bob

======================================
Bob Friesenhahn
bfriesen@simple.dallas.tx.us
http://www.cyberramp.net/~bfriesen

From peter@localhost Sun Dec 27 16:13:53 PST 1998
Article: 83792 of comp.arch
Path: news.corp.sgi.com!enews.sgi.com!news.idt.net!feed1.news.rcn.net!rcn!woodstock.news.demon.net!demon!news.demon.co.uk!demon!izabella.demon.co.uk!not-for-mail
From: "P Dickerson"
Newsgroups: comp.arch,comp.benchmarks
Subject: Re: Stride Stream benchmark
Date: Sun, 27 Dec 1998 13:28:43 -0000
Message-ID: <914766313.15762.0.nnrp-09.9e981e55@news.demon.co.uk>
References: <763ac2$o36$1@news01bf.so-net.ne.jp>
NNTP-Posting-Host: izabella.demon.co.uk
X-NNTP-Posting-Host: izabella.demon.co.uk:158.152.30.85
X-Trace: news.demon.co.uk 914766313 nnrp-09:15762 NO-IDENT izabella.demon.co.uk:158.152.30.85
X-Complaints-To: abuse@demon.net
X-Newsreader: Microsoft Outlook Express 4.72.3110.5
X-MimeOLE: Produced By Microsoft MimeOLE V4.72.3110.3
Lines: 100
Xref: news.corp.sgi.com comp.arch:83792 comp.benchmarks:28572
Status: RO

Results for 225 MHz AMD K6 in TX MoBo and 75 MHz bus clock, 64MB SDRAM,
NT4 SP3

#include needed to be commented out, source saved as tlb.c. Note pagesize
for x86 is 4K, line size is 32 for Pentium and Pentium II class machines.

D:\MYAPPS\MSVC\TLB> cl /Ox -DPAGESIZE=4096 -DCACHELINE=32 tlb.c
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 12.00.8168 for 80x86
Copyright (C) Microsoft Corp 1984-1998. All rights reserved.

tlb.c
Microsoft (R) Incremental Linker Version 6.00.8168
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.

/out:tlb.exe
tlb.obj

D:\MYAPPS\MSVC\TLB> tlb
STRIDE STREAM BENCHMARK. Stride size is 4136 bytes.
 Copyright 1998 Naohiko Shimizu
Please report the result with your system.

Transfer size(B)  Throughput(MB/S)  Access Time(nS)
             136           612.000               13
             264           586.667               14
             520           581.366               14
            1032           402.078               20
            2056            71.610              112

--
P.M.Dickerson.
email: peter (at) izabella (dot) demon (dot) co (dot) uk

pshimizu@fa2.so-net.ne.jp wrote in message
<763ac2$o36$1@news01bf.so-net.ne.jp>...
>Hi,
>This is a benchmark program to check the system's stride memory access
>performance. Even in cache, most RISC machines have poor performance
>for a large stride data transfer, because of a shortage of TLB
>entries. With this program you can check the out-of-TLB performance.
>
>Enjoy,
>Naohiko Shimizu.
>
>=============================================================================
>/*
>STRIDE STREAM BENCHMARK PROGRAM Last updated date : 98.12.26
>
>Copyright 1998 by Naohiko Shimizu
>All rights reserved.
>
>See GNU Public License 2.0 for the licensing terms.
>
>Many systems have poor performance on a large stride problem. This program
>will test your system's IN CACHE stride access performance. The innermost
>loop is designed to fit in almost all processors' 1st cache.
>If you have 16KB 1st cache and 32 bytes cache line size, up to 512 bytes
>stride transfer will fit in your cache. (16*1024/32)
>If you have more knowledge about your system, you can set CACHELINE and
>PAGESIZE parameters for more accurate results.
>
>This program is designed with the clock() function of the C language, and if
>the running time is too short for the clock tick the result may not be correct.
>
>================= example result for AS200 4/100 ===========================
>alpha% cc -O2 -o strdstrm -DCACHELINE=32 -DPAGESIZE=8192 strdstrm.c
>alpha% ./strdstrm
>STRIDE STREAM BENCHMARK. Stride size is 8232 bytes.
> Copyright 1998 Naohiko Shimizu
>Please report the result with your system.
>
>Transfer size(B)  Throughput(MB/S)  Access Time(nS)
>             136           816.000               10
>             264            24.122              332
>             520            24.124              332
>            1032            24.125              332
>            2056            22.873              350
>===========================================================================
>
>
>Contact information:
>
>Dr. Naohiko Shimizu
>School of Engr. Tokai Univ.
>1117 Kitakaname Hiratsuka
>Kanagawa 259-1292 Japan
>email: nshimizu@et.u-tokai.ac.jp
>URL: http://shimizu-lab.et.u-tokai.ac.jp
>

[ Source snipped]

From rpeterson@msn.globaldialog.com Wed Dec 30 10:19:35 PST 1998
Article: 83870 of comp.arch
Path: news.corp.sgi.com!news.sgi.com!howland.erols.net!panix!cam-news-hub1.bbnplanet.com!cam-news-feed2.bbnplanet.com!news.gtei.net!homer.alpha.net!msn.globaldialog.com!rpeterson
From: rpeterson@msn.globaldialog.com (Ron Peterson)
Subject: Re: Stride Stream benchmark
Newsgroups: comp.arch,comp.benchmarks
Followup-To: comp.arch,comp.benchmarks
References: <763ac2$o36$1@news01bf.so-net.ne.jp> <914766313.15762.0.nnrp-09.9e981e55@news.demon.co.uk>
X-Newsreader: TIN [version 1.2 PL2]
Lines: 18
Message-ID:
Date: Wed, 30 Dec 1998 05:14:56 GMT
NNTP-Posting-Host: 156.46.122.12
X-Complaints-To: abuse@alpha.net
X-Trace: homer.alpha.net 914994896 156.46.122.12 (Tue, 29 Dec 1998 23:14:56 CDT)
NNTP-Posting-Date: Tue, 29 Dec 1998 23:14:56 CDT
Xref: news.corp.sgi.com comp.arch:83870 comp.benchmarks:28592
Status: RO

I ran the following on a NextStation 25 MHz 68040 computer
using NextStep 3.3:

STRIDE STREAM BENCHMARK. Stride size is 65800 bytes.
 Copyright 1998 Naohiko Shimizu
Please report the result with your system.

Transfer size(B)  Throughput(MB/S)  Access Time(nS)
             136            14.836              539
             264            14.185              564
             520            13.277              603
            1032            11.251              711
            2056            10.780              742

Ron