From jonas@cs.cmu.edu Sun Jan 16 08:20:58 2005 X-Mozilla-Status: 0001 X-Mozilla-Status2: 00000000 Return-path: Received: from ms-mta-01 (ms-mta-01-smtp [10.93.38.11]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003)) with ESMTP id <0IAE004UBYKPTX@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Sun, 16 Jan 2005 08:22:01 -0600 (CST) Received: from austtx-mx-04.mgw.rr.com (austtx-mx-04.mgw.rr.com [24.93.40.211]) by ms-mta-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003)) with ESMTP id <0IAE00308YKO5G@ms-mta-01.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Sun, 16 Jan 2005 09:22:01 -0500 (EST) Received: from ares.cs.virginia.edu (128.143.137.19) by austtx-mx-04.mgw.rr.com with ESMTP; Sun, 16 Jan 2005 09:22:01 -0500 Received: from DOC.MRTC.RI.cmu.edu (DOC.MRTC.RI.CMU.EDU [128.2.178.192]) by ares.cs.Virginia.EDU (8.12.10/8.12.10/UVACS-2003031900) with ESMTP id j0GEKxXg002797 for ; Sun, 16 Jan 2005 09:21:59 -0500 (EST) Received: by DOC.MRTC.RI.cmu.edu (Postfix, from userid 9830) id EAE6E4CAEA; Sun, 16 Jan 2005 09:20:58 -0500 (EST) Date: Sun, 16 Jan 2005 09:20:58 -0500 From: Jonas August Subject: Power Mac Dual G5 2.5GHz single proc results To: mccalpin@cs.virginia.edu Message-id: <1105885258.8130.51.camel@DOC.MRTC.RI.cmu.edu> MIME-version: 1.0 X-Mailer: Ximian Evolution 1.4.6 Content-type: text/plain Content-transfer-encoding: 7bit Original-recipient: rfc822;jmccalpin@austin.rr.com X-NAS-Language: English X-NAS-Bayes: #0: 1.47668E-111; #1: 1 X-NAS-Classification: 0 X-NAS-MessageID: 1460 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} Status: R X-Status: NC X-KMail-EncryptionState: X-KMail-SignatureState: X-KMail-MDN-Sent: Hi, I just ran stream on a Dual 2.5GHz Power Mac G5 PPC 970FX with 512MB RAM and 512kB L2 cache; processor speed was set to "Automatic", but I didn't get improvement when I made it "Highest". 
The OS is Mac OS X 10.3.7 with gcc 3.3 (apple build 1640).

For a single processor test I compiled with:

gcc -fast -o stream_d second_wall.c stream_d.c

which is supposed to optimize for G5. The result is:

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 12201 microseconds.
   (= 12201 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        1978.5036       0.0169       0.0162       0.0186
Scale:       1972.8907       0.0168       0.0162       0.0192
Add:         2208.1095       0.0222       0.0217       0.0229
Triad:       2215.4477       0.0222       0.0217       0.0229

I didn't do a dual proc. test because I don't have an openmp compiler.

Thanks.
-- Jonas August jonas*cs.cmu.edu (*=@) www.cs.cmu.edu/~jonas tel(412)268-1314 fax(412)268-6436 Robotics Institute, Carnegie Mellon University A409 Newell-Simon Hall, 5000 Forbes Avenue, Pittsburgh, PA 15213 From richard.horton@hp.com Tue Jan 18 14:17:11 2005 X-Mozilla-Status: 0001 X-Mozilla-Status2: 10000000 Return-path: Received: from ms-mta-01 (ms-mta-01-smtp [10.93.38.11]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003)) with ESMTP id <0IAJ00CEE4L4EX@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Tue, 18 Jan 2005 14:22:16 -0600 (CST) Received: from txmx03.mgw.rr.com (txmx03.mgw.rr.com [24.93.41.202]) by ms-mta-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003)) with ESMTP id <0IAJ0053D4KXDI@ms-mta-01.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Tue, 18 Jan 2005 15:22:16 -0500 (EST) Received: from hrndva-mx-01.mgw.rr.com (hrndva-mx-01.mgw.rr.com [24.28.204.20]) by txmx03.mgw.rr.com (8.12.10/8.12.8) with ESMTP id j0IK8DYG012798 for ; Tue, 18 Jan 2005 15:21:49 -0500 (EST) Received: from ares.cs.virginia.edu (128.143.137.19) by hrndva-mx-01.mgw.rr.com with ESMTP; Tue, 18 Jan 2005 15:17:36 -0500 Received: from ztxmail05.ztx.compaq.com (ztxmail05.ztx.compaq.com [161.114.1.209]) by ares.cs.Virginia.EDU (8.12.10/8.12.10/UVACS-2003031900) with ESMTP id j0IKHJR7013652 for ; Tue, 18 Jan 2005 15:17:35 -0500 (EST) Received: from cceexg12.americas.cpqcorp.net (cceexg12.americas.cpqcorp.net [16.81.1.38]) by ztxmail05.ztx.compaq.com (Postfix) with ESMTP id 5F992EFD8 for ; Tue, 18 Jan 2005 14:17:14 -0600 (CST) Received: from cceexc19.americas.cpqcorp.net ([16.81.1.19]) by cceexg12.americas.cpqcorp.net with Microsoft SMTPSVC(6.0.3790.211); Tue, 18 Jan 2005 14:17:13 -0600 Date: Tue, 18 Jan 2005 14:17:11 -0600 From: "Horton, Richard (Server Performance)" Subject: Emailing: BL30p.3200mhz2mb.2cpu2x1GB.RH3U3.SUBMITTAL.txt To: mccalpin@cs.virginia.edu Message-id: 
<520E883A6B222B41BDDF18CBAAF0847C07189F0F@cceexc19.americas.cpqcorp.net> MIME-version: 1.0 X-MIMEOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-type: multipart/mixed; boundary="----_=_NextPart_001_01C4FD9A.B21CBFD2" Content-class: urn:content-classes:message Thread-topic: Emailing: BL30p.3200mhz2mb.2cpu2x1GB.RH3U3.SUBMITTAL.txt Thread-index: AcT9mrBrOqB1PXi0TR+lr+5Wwv+7oA== X-MS-Has-Attach: yes X-MS-TNEF-Correlator: X-Virus-Scanned: Symantec AntiVirus Scan Engine Original-recipient: rfc822;jmccalpin@austin.rr.com X-OriginalArrivalTime: 18 Jan 2005 20:17:13.0280 (UTC) FILETIME=[B2626400:01C4FD9A] X-NAS-Language: English X-NAS-Bayes: #0: 3.54363E-121; #1: 1 X-NAS-Classification: 0 X-NAS-MessageID: 1469 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} Status: R X-Status: NT X-KMail-EncryptionState: X-KMail-SignatureState: X-KMail-MDN-Sent: This is a multi-part message in MIME format. ------_=_NextPart_001_01C4FD9A.B21CBFD2 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable <> Please find attached for publication our STREAM results for the hp ProLiant BL30p. 
Thank you, Richard Horton Hewlett-Packard Server Performance Engineering ------_=_NextPart_001_01C4FD9A.B21CBFD2 Content-Type: text/plain; name="BL30p.3200mhz2mb.2cpu2x1GB.RH3U3.SUBMITTAL.txt" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="BL30p.3200mhz2mb.2cpu2x1GB.RH3U3.SUBMITTAL.txt" QmVsb3cgYXJlIHNpbmdsZSBjcHUgc3RyZWFtIHJlc3VsdHMgZm9yIGFuIEhQIFByb0xpYW50IEJM MzBwLCBjb25maWd1cmVkIGFzIGZvbGxvd3M6CgpIUCBQcm9MaWFudCBCTDMwcAoyeDMuMkdIei8y TUIgTDMgWGVvbiBwcm9jZXNzb3JzCjJHQiBtZW1vcnkgKDJ4MUdCIERJTU1zKQpSZWQgSGF0IDMg VXBkYXRlIDMKCi0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0KVGhpcyBzeXN0ZW0gdXNlcyA4IGJ5dGVzIHBlciBET1VCTEUgUFJFQ0lT SU9OIHdvcmQuCi0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0KQXJyYXkgc2l6ZSA9IDEwMDAwMDAsIE9mZnNldCA9IDIwNDgKVG90YWwg bWVtb3J5IHJlcXVpcmVkID0gMjIuOSBNQi4KRWFjaCB0ZXN0IGlzIHJ1biAxMDAgdGltZXMsIGJ1 dCBvbmx5CnRoZSAqYmVzdCogdGltZSBmb3IgZWFjaCBpcyB1c2VkLgotLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tCllvdXIgY2xvY2sg Z3JhbnVsYXJpdHkvcHJlY2lzaW9uIGFwcGVhcnMgdG8gYmUgMSBtaWNyb3NlY29uZHMuCkVhY2gg dGVzdCBiZWxvdyB3aWxsIHRha2Ugb24gdGhlIG9yZGVyIG9mIDU0MTkgbWljcm9zZWNvbmRzLgog ICAoPSA1NDE5IGNsb2NrIHRpY2tzKQpJbmNyZWFzZSB0aGUgc2l6ZSBvZiB0aGUgYXJyYXlzIGlm IHRoaXMgc2hvd3MgdGhhdAp5b3UgYXJlIG5vdCBnZXR0aW5nIGF0IGxlYXN0IDIwIGNsb2NrIHRp Y2tzIHBlciB0ZXN0LgotLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tCldBUk5JTkcgLS0gVGhlIGFib3ZlIGlzIG9ubHkgYSByb3VnaCBn dWlkZWxpbmUuCkZvciBiZXN0IHJlc3VsdHMsIHBsZWFzZSBiZSBzdXJlIHlvdSBrbm93IHRoZQpw cmVjaXNpb24gb2YgeW91ciBzeXN0ZW0gdGltZXIuCi0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0KRnVuY3Rpb24gICAgICBSYXRlIChN Qi9zKSAgIFJNUyB0aW1lICAgICBNaW4gdGltZSAgICAgTWF4IHRpbWUKQ29weTogICAgICAgIDI0 NjMuNDMzOCAgICAgICAwLjAwNjYgICAgICAgMC4wMDY1ICAgICAgIDAuMDA3NQpTY2FsZTogICAg 
ICAgMjQ3OC4zNTM4ICAgICAgIDAuMDA2NSAgICAgICAwLjAwNjUgICAgICAgMC4wMDY3CkFkZDog ICAgICAgICAyMTEyLjMzNDQgICAgICAgMC4wMTE1ICAgICAgIDAuMDExNCAgICAgICAwLjAxMTYK VHJpYWQ6ICAgICAgIDIwMTQuMjczMSAgICAgICAwLjAxMjEgICAgICAgMC4wMTE5ICAgICAgIDAu MDEyMwo= ------_=_NextPart_001_01C4FD9A.B21CBFD2-- From kball@pathscale.com Fri Feb 4 12:00:53 2005 X-Mozilla-Status: 0005 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-01 (ms-mta-01-smtp [10.93.38.11]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003)) with ESMTP id <0IBE00IQPFFM48@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Fri, 04 Feb 2005 12:02:10 -0600 (CST) Received: from clmboh-mx-04.mgw.rr.com (clmboh-mx-04.mgw.rr.com [65.24.7.13]) by ms-mta-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003)) with ESMTP id <0IBE003N2FFLOJ@ms-mta-01.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Fri, 04 Feb 2005 13:02:10 -0500 (EST) Received: from ares.cs.virginia.edu (128.143.137.19) by clmboh-mx-04.mgw.rr.com with ESMTP; Fri, 04 Feb 2005 13:02:06 -0500 Received: from mx.pathscale.com (mx.pathscale.com [64.221.211.203]) by ares.cs.Virginia.EDU (8.13.3/8.12.10/UVACS-2003031900) with ESMTP id j14I1wHH003068 for ; Fri, 04 Feb 2005 13:01:59 -0500 (EST) Received: from [192.168.3.193] (ammonite.internal.keyresearch.com [192.168.3.193]) by mx.pathscale.com (Postfix) with ESMTP id 4B6C128AC2A; Fri, 04 Feb 2005 10:00:53 -0800 (PST) Date: Fri, 04 Feb 2005 10:00:53 -0800 From: Kevin Ball Subject: PathScale STREAM Submission To: John McCalpin Cc: Tom Elken Message-id: <1107540053.3257.1424.camel@ammonite> MIME-version: 1.0 X-Mailer: Ximian Evolution 1.4.6 (1.4.6-2) Content-type: text/plain Content-transfer-encoding: 7bit Original-recipient: rfc822;jmccalpin@austin.rr.com X-NAS-Classification: 0 X-NAS-MessageID: 1854 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} Status: R X-Status: NC X-KMail-EncryptionState: X-KMail-SignatureState: 
X-KMail-MDN-Sent:

Hi John,

I'm a benchmark engineer at PathScale, working under Tom Elken. We'd like to do a STREAM submission with the latest released compiler (EKOPath 2.0) on AMD Opteron. There are both serial results with one machine, and OpenMP results up to 4 threads with another. Details are below.

Thanks much!

-Kevin Ball

System info:
Compiler: PathScale EKO Compiler Suite, Release 2.0
Model Name: ASUS SK8N Motherboard, AMD Opteron (TM) Model 248
CPU: AMD Opteron 248
CPU MHz: 2200
FPU: Integrated
CPU(s) enabled: 1 core, 1 chip, 1 core/chip
Memory: 4x512MB, DDR400, PC3200, Corsair, CL2
Operating System: SuSE Linux 9.0 (AMD64) 2.4.21-102-default

> pathcc -Ofast -LNO:prefetch_ahead=10 -lm stream.c mysecond.c -o path2.0
stream.c:
mysecond.c:
> ./path2.0
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 9317 microseconds.
   (= 9317 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        4811.3611       0.0060       0.0067       0.0067
Scale:       4782.3883       0.0060       0.0067       0.0068
Add:         4684.7375       0.0092       0.0102       0.0103
Triad:       4681.5783       0.0092       0.0103       0.0103
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

System info:
Compiler: PathScale EKO Compiler Suite, Release 2.0
Model Name: 4-way, 2.2 GHz AMD Opteron (TM) Model 848
CPU: AMD Opteron 848
CPU MHz: 2200
FPU: Integrated
CPU(s) enabled: 4 core, 4 chip, 1 core/chip
Memory: 16x1024MB, DDR400
Operating System: Fedora Core 2; 2.6.8-1.521smp kernel

> pathcc -Ofast -LNO:prefetch_ahead=10 -DUNDERSCORE -c mysecond.c
> pathf90 -Ofast -LNO:prefetch_ahead=10 -DUNDERSCORE -mp -c stream.f
> pathf90 -Ofast -LNO:prefetch_ahead=10 -DUNDERSCORE -mp mysecond.o stream.o -o path2.0mp
> export OMP_NUM_THREADS=1
> ./path2.0mp
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 4000000
Offset = 0
The total memory requirement is 91 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
Number of Threads = 1
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function  Rate (MB/s)    Avg time    Min time    Max time
Copy:       4455.9519      0.0144      0.0144      0.0144
Scale:      4503.5728      0.0142      0.0142      0.0143
Add:        4326.2548      0.0222      0.0222      0.0222
Triad:      4401.4471      0.0218      0.0218      0.0219
----------------------------------------------------
Solution Validates!
----------------------------------------------------

> export OMP_NUM_THREADS=2
> ./path2.0mp
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 4000000
Offset = 0
The total memory requirement is 91 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
Number of Threads = 2 2
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function  Rate (MB/s)    Avg time    Min time    Max time
Copy:       8736.1427      0.0073      0.0073      0.0073
Scale:      8876.5403      0.0072      0.0072      0.0072
Add:        8521.9409      0.0113      0.0113      0.0113
Triad:      8660.5120      0.0111      0.0111      0.0111
----------------------------------------------------
Solution Validates!
----------------------------------------------------

> export OMP_NUM_THREADS=4
> ./path2.0mp
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 4000000
Offset = 0
The total memory requirement is 91 MB
You are running each test 10 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
Number of Threads = 4 4 4 Number of Threads = 4 44 Number of Threads = 4
----------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function  Rate (MB/s)    Avg time    Min time    Max time
Copy:      15377.8332      0.0042      0.0042      0.0042
Scale:     15845.3135      0.0040      0.0040      0.0040
Add:       15617.6086      0.0062      0.0061      0.0062
Triad:     15920.8091      0.0061      0.0060      0.0061
----------------------------------------------------
Solution Validates!
---------------------------------------------------- From - Sat Feb 12 16:17:32 2005 X-Account-Key: account2 X-UIDL: 58213-1059743608 X-Mozilla-Status: 0005 X-Mozilla-Status2: 04400000 Return-path: Disposition-notification-to: dvianney@us.ibm.com Received: from ms-mta-01 (ms-mta-01-smtp [10.93.38.11]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003)) with ESMTP id <0IBR00F85HV28G@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Fri, 11 Feb 2005 13:23:26 -0600 (CST) Received: from hrndva-mx-04.mgw.rr.com (hrndva-mx-04.mgw.rr.com [24.28.204.23]) by ms-mta-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003)) with ESMTP id <0IBR000E5HUS5V@ms-mta-01.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Fri, 11 Feb 2005 14:23:26 -0500 (EST) Received: from ares.cs.virginia.edu (128.143.137.19) by hrndva-mx-04.mgw.rr.com with ESMTP; Fri, 11 Feb 2005 14:23:26 -0500 Received: from e3.ny.us.ibm.com (e3.ny.us.ibm.com [32.97.182.143]) by ares.cs.Virginia.EDU (8.13.3/8.12.10/UVACS-2003031900) with ESMTP id j1BJNO2D017354 for ; Fri, 11 Feb 2005 14:23:24 -0500 (EST) Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e3.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j1BJNHK7013960 for ; Fri, 11 Feb 2005 14:23:17 -0500 Received: from d01av01.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215]) by d01relay04.pok.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j1BJNHKK194456 for ; Fri, 11 Feb 2005 14:23:17 -0500 Received: from d01av01.pok.ibm.com (loopback [127.0.0.1]) by d01av01.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j1BJNHiU009116 for ; Fri, 11 Feb 2005 14:23:17 -0500 Received: from d01ml072.pok.ibm.com (d01ml072.pok.ibm.com [9.56.228.147]) by d01av01.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j1BJNHnB009089; Fri, 11 Feb 2005 14:23:17 -0500 Date: Fri, 11 Feb 2005 13:23:15 -0600 From: Duc Vianney Subject: Standard 64-bit STREAM results on an IBM eServer OpenPower 710 To: 
john@mccalpin.com, mccalpin@cs.virginia.edu Cc: Andrea M Davis , Bill Clark , Ray Venditti Message-id: MIME-version: 1.0 X-Mailer: Lotus Notes Release 6.0.2CF1 June 9, 2003 Content-type: multipart/alternative; boundary="=_alternative 006A7EFE86256FA5_=" X-MIMETrack: Serialize by Router on D01ML072/01/M/IBM(Release 6.53IBM1 HF6|December 9, 2004) at 02/11/2005 14:23:17, Serialize complete at 02/11/2005 14:23:17 Original-recipient: rfc822;jmccalpin@austin.rr.com X-NAS-Classification: 0 X-NAS-MessageID: 2564 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} Status: R This is a multipart message in MIME format. --=_alternative 006A7EFE86256FA5_= Content-Type: text/plain; charset="US-ASCII" John: Please post the following results on late Monday 14 Feb 2005, but before 9 A.M. Tuesday 15 Feb 2005. These are standard 64-bit STREAM results on an IBM eServer OpenPower 710 with two 1650 MHz CPUs running RedHat Enterprise Linux AS 4. IBM Corporation IBM eServer OpenPower 710 (1650MHz, 2 CPU, Linux) CPU(s) enabled: 2 cores, 1 chip, 2 cores/chip (SMT ON) CPU(s) orderable: 1,2 Primary Cache: 64KBI+32KDB (on chip)/core Secondary Cache: 1920KB unified (on chip)/chip L3 Cache: 36MB unified (off chip)/DCM, 1 DCM/SUT SMT: Acronym for "Simultaneous Multi-Threading". A processor technology that allows the simultaneous execution of multiple thread contexts within a single processor core. 
(Enabled by default) DCM: Acronym for "Dual-Chip Module" (one dual-core processor chip + one L3-cache chip) SUT: Acronym for "System Under Test" Number of Threads = 4 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 66060288 Offset = 96 The total memory requirement is 1512 MB You are running each test 100 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds The tests below will each take a time on the order of 419093 microseconds (= 419093 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ---------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 2956.6088 .3591 .3575 .4547 Scale: 2928.6182 .3621 .3609 .4364 Add: 3918.6310 .4050 .4046 .4149 Triad: 3948.6218 .4021 .4015 .4149 Sum of a is = 0.537150969556339130E+126 Sum of b is = 0.107430193907022657E+126 Sum of c is = 0.143240258538696412E+126 locking to cpu 0 locking to cpu 1 locking to cpu 2 locking to cpu 3 Kindly regards .. Duc. Duc J Vianney, Ph. D., IBM Linux Technology Center Performance Team dvianney@us.ibm.com, Phone: (512) 838-9919 Fax: (512) 838-0070 --=_alternative 006A7EFE86256FA5_= Content-Type: text/html; charset="US-ASCII"
John:

Please post the following results on late Monday 14 Feb 2005, but before 9 A.M. Tuesday 15 Feb 2005.

These are standard 64-bit STREAM results on an IBM eServer OpenPower 710
with two 1650 MHz CPUs running RedHat Enterprise Linux AS 4.


IBM Corporation IBM eServer OpenPower 710 (1650MHz, 2 CPU, Linux)  
CPU(s) enabled: 2 cores, 1 chip, 2 cores/chip (SMT ON)
CPU(s) orderable: 1,2
Primary Cache: 64KBI+32KDB (on chip)/core
Secondary Cache: 1920KB unified (on chip)/chip
L3 Cache: 36MB unified (off chip)/DCM, 1 DCM/SUT


SMT: Acronym for "Simultaneous Multi-Threading". A processor technology that allows
    the simultaneous execution of multiple thread contexts within a single processor
    core. (Enabled by default)
DCM: Acronym for "Dual-Chip Module" (one dual-core processor chip + one L3-cache chip)
SUT: Acronym for "System Under Test"



 Number of Threads =  4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size =   66060288
Offset     =         96
The total memory requirement is 1512 MB
You are running each test 100 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be      1 microseconds
The tests below will each take a time on the order
of  419093  microseconds
   (=  419093  clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)    RMS time    Min time    Max time
Copy:       2956.6088       .3591       .3575       .4547
Scale:      2928.6182       .3621       .3609       .4364
Add:        3918.6310       .4050       .4046       .4149
Triad:      3948.6218       .4021       .4015       .4149
Sum of a is =  0.537150969556339130E+126
Sum of b is =  0.107430193907022657E+126
Sum of c is =  0.143240258538696412E+126
locking to cpu 0
locking to cpu 1
locking to cpu 2
locking to cpu 3
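
The memory requirement and the Rate column above can be cross-checked from the array size and the Min time column. The sketch below does that in Python. It assumes the usual STREAM byte accounting, which the message itself does not state: three arrays of 8-byte words, two words moved per element for Copy and Scale, three for Add and Triad, memory reported in units of 2**20 bytes and bandwidth in units of 10**6 bytes per second. The function and variable names are illustrative only.

```python
# Cross-check of the STREAM figures in the output above.
# Assumption (not stated in the message): the usual STREAM byte accounting.

BYTES_PER_WORD = 8        # "Assuming 8 bytes per DOUBLE PRECISION word"
ARRAY_SIZE = 66060288     # "Array size" from the output

# Words read plus written per array element for each kernel (assumed).
WORDS_PER_ELEMENT = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" (seconds) and "Rate (MB/s)" columns copied from the table.
MIN_TIME = {"Copy": 0.3575, "Scale": 0.3609, "Add": 0.4046, "Triad": 0.4015}
REPORTED_RATE = {
    "Copy": 2956.6088,
    "Scale": 2928.6182,
    "Add": 3918.6310,
    "Triad": 3948.6218,
}


def total_memory_mb(array_size, bytes_per_word=BYTES_PER_WORD):
    """Memory for the three arrays, in units of 2**20 bytes."""
    return 3 * bytes_per_word * array_size / 2**20


def rate_mb_per_s(kernel, array_size, min_time_s, bytes_per_word=BYTES_PER_WORD):
    """Bandwidth in units of 10**6 bytes per second, from the best time."""
    bytes_moved = WORDS_PER_ELEMENT[kernel] * bytes_per_word * array_size
    return 1e-6 * bytes_moved / min_time_s


if __name__ == "__main__":
    print(f"total memory: {total_memory_mb(ARRAY_SIZE):.0f} MB")
    for kernel, seconds in MIN_TIME.items():
        computed = rate_mb_per_s(kernel, ARRAY_SIZE, seconds)
        print(f"{kernel:6s} computed {computed:10.1f}  reported {REPORTED_RATE[kernel]:10.4f}")
```

Because the Min time column is printed to only four decimal places, the recomputed rates should be expected to agree with the reported ones to roughly four significant digits rather than exactly.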



Kindly regards .. Duc.

Duc J Vianney, Ph. D., IBM Linux Technology Center Performance Team
dvianney@us.ibm.com,  Phone: (512) 838-9919   Fax: (512) 838-0070
--=_alternative 006A7EFE86256FA5_=-- From - Sat Feb 12 16:17:32 2005 X-Account-Key: account2 X-UIDL: 58216-1059743608 X-Mozilla-Status: 0005 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-02 (ms-mta-02-smtp [10.93.38.12]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003)) with ESMTP id <0IBR00FPBI048G@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Fri, 11 Feb 2005 13:26:28 -0600 (CST) Received: from austtx-mx-03.mgw.rr.com (austtx-mx-03.mgw.rr.com [24.93.40.214]) by ms-mta-02.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003)) with ESMTP id <0IBR00L3UHZUHA@ms-mta-02.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Fri, 11 Feb 2005 14:26:28 -0500 (EST) Received: from fife-smtp.catalog.com (HELO aux153.plano.net) (209.217.36.155) by austtx-mx-03.mgw.rr.com with SMTP; Fri, 11 Feb 2005 14:26:28 -0500 Received: (qmail 6748 invoked by uid 10000); Fri, 11 Feb 2005 19:27:21 +0000 Received: (qmail 6737 invoked from network); Fri, 11 Feb 2005 19:27:21 +0000 Received: from unknown (HELO barracuda.catalog.com) (209.217.52.101) by smtp3.catalog.com with SMTP; Fri, 11 Feb 2005 19:27:21 +0000 Received: from e6.ny.us.ibm.com (e6.ny.us.ibm.com [32.97.182.146]) by barracuda.catalog.com (Spam Firewall) with ESMTP id 84275D002C95 for ; Fri, 11 Feb 2005 13:26:25 -0600 (CST) Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e6.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id j1BJQO2M029540 for ; Fri, 11 Feb 2005 14:26:24 -0500 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay04.pok.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j1BJQOKK221022 for ; Fri, 11 Feb 2005 14:26:24 -0500 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j1BJQOFf022627 for ; Fri, 11 Feb 2005 14:26:24 -0500 Received: from d01ml072.pok.ibm.com (d01ml072.pok.ibm.com [9.56.228.147]) by 
d01av02.pok.ibm.com (8.12.11/8.12.11) with ESMTP id j1BJQNA6022588; Fri, 11 Feb 2005 14:26:24 -0500 Date: Fri, 11 Feb 2005 13:26:20 -0600 From: Duc Vianney Subject: Tuned 64-bit STREAM results on an IBM eServer OpenPower 710 To: john@mccalpin.com, mccalpin@cs.virginia.edu Cc: Andrea M Davis , Bill Clark , Ray Venditti Message-id: MIME-version: 1.0 X-Mailer: Lotus Notes Release 6.0.2CF1 June 9, 2003 Content-type: multipart/alternative; boundary="=_alternative 006AC77D86256FA5_=" X-MIMETrack: Serialize by Router on D01ML072/01/M/IBM(Release 6.53IBM1 HF6|December 9, 2004) at 02/11/2005 14:26:23, Serialize complete at 02/11/2005 14:26:23 Delivered-to: mccalpin.com-john@mccalpin.com X-Virus-Scanned: by Catalog.com Spam Firewall at catalog.com Original-recipient: rfc822;jmccalpin@austin.rr.com X-NAS-Classification: 0 X-NAS-MessageID: 2566 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} Status: R This is a multipart message in MIME format. --=_alternative 006AC77D86256FA5_= Content-Type: text/plain; charset="US-ASCII" John: Please post the following results on late Monday 14 Feb 2005, but before 9 A.M. Tuesday 15 Feb 2005. These are tuned 64-bit STREAM results on an IBM eServer OpenPower 710 with two 1650 MHz CPUs running RedHat Enterprise Linux AS 4. IBM Corporation IBM eServer OpenPower 710 (1650MHz, 2 CPU, Linux) CPU(s) enabled: 2 cores, 1 chip, 2 cores/chip (SMT ON) CPU(s) orderable: 1,2 Primary Cache: 64KBI+32KDB (on chip)/core Secondary Cache: 1920KB unified (on chip)/chip L3 Cache: 36MB unified (off chip)/DCM, 1 DCM/SUT SMT: Acronym for "Simultaneous Multi-Threading". A processor technology that allows the simultaneous execution of multiple thread contexts within a single processor core. 
(Enabled by default) DCM: Acronym for "Dual-Chip Module" (one dual-core processor chip + one L3-cache chip) SUT: Acronym for "System Under Test" Number of Threads = 4 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 66060288 Offset = 96 The total memory requirement is 1512 MB You are running each test 100 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds The tests below will each take a time on the order of 373160 microseconds (= 373160 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ---------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 3039.9033 .3507 .3477 .5598 Scale: 3010.0351 .3518 .3511 .3522 Add: 4378.1162 .3625 .3621 .3631 Triad: 4427.0507 .3585 .3581 .3645 Sum of a is = 0.537150969556339130E+126 Sum of b is = 0.107430193907022657E+126 Sum of c is = 0.143240258538696412E+126 locking to cpu 0 locking to cpu 1 locking to cpu 2 locking to cpu 3 Duc J Vianney, Ph. D., IBM Linux Technology Center Performance Team dvianney@us.ibm.com, Phone: (512) 838-9919 Fax: (512) 838-0070 --=_alternative 006AC77D86256FA5_= Content-Type: text/html; charset="US-ASCII"
John:

Please post the following results on late Monday 14 Feb 2005, but before 9 A.M. Tuesday 15 Feb 2005.

These are tuned 64-bit STREAM results on an IBM eServer OpenPower 710
with two 1650 MHz CPUs running RedHat Enterprise Linux AS 4.


IBM Corporation IBM eServer OpenPower 710 (1650MHz, 2 CPU, Linux)  
CPU(s) enabled: 2 cores, 1 chip, 2 cores/chip (SMT ON)
CPU(s) orderable: 1,2
Primary Cache: 64KBI+32KDB (on chip)/core
Secondary Cache: 1920KB unified (on chip)/chip
L3 Cache: 36MB unified (off chip)/DCM, 1 DCM/SUT


SMT: Acronym for "Simultaneous Multi-Threading". A processor technology that allows
    the simultaneous execution of multiple thread contexts within a single processor
    core. (Enabled by default)
DCM: Acronym for "Dual-Chip Module" (one dual-core processor chip + one L3-cache chip)
SUT: Acronym for "System Under Test"



 Number of Threads =  4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size =   66060288
Offset     =         96
The total memory requirement is 1512 MB
You are running each test 100 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity/precision appears to be      1 microseconds
The tests below will each take a time on the order
of  373160  microseconds
   (=  373160  clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function  Rate (MB/s)    RMS time    Min time    Max time
Copy:       3039.9033       .3507       .3477       .5598
Scale:      3010.0351       .3518       .3511       .3522
Add:        4378.1162       .3625       .3621       .3631
Triad:      4427.0507       .3585       .3581       .3645
Sum of a is =  0.537150969556339130E+126
Sum of b is =  0.107430193907022657E+126
Sum of c is =  0.143240258538696412E+126
locking to cpu 0
locking to cpu 1
locking to cpu 2
locking to cpu 3
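
To put the tuned figures above next to the standard run from the same sender's companion message ("Standard 64-bit STREAM results on an IBM eServer OpenPower 710"), the sketch below computes the relative gain per kernel. The rates are copied from the two tables; the helper name `percent_gain` is illustrative, and the comparison assumes the two runs differ only in tuning, which the messages imply but do not spell out.

```python
# Relative gain of the tuned run over the standard run, per kernel.
# Rates (MB/s) are copied from the two result tables in this archive.

STANDARD = {
    "Copy": 2956.6088,
    "Scale": 2928.6182,
    "Add": 3918.6310,
    "Triad": 3948.6218,
}
TUNED = {
    "Copy": 3039.9033,
    "Scale": 3010.0351,
    "Add": 4378.1162,
    "Triad": 4427.0507,
}


def percent_gain(baseline, improved):
    """Percentage by which `improved` exceeds `baseline`."""
    return 100.0 * (improved - baseline) / baseline


if __name__ == "__main__":
    for kernel in STANDARD:
        gain = percent_gain(STANDARD[kernel], TUNED[kernel])
        print(f"{kernel:6s} {STANDARD[kernel]:10.4f} -> {TUNED[kernel]:10.4f}  ({gain:+.1f}%)")
```

On these numbers the gain is a few percent for Copy and Scale and noticeably larger for Add and Triad.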



Duc J Vianney, Ph. D., IBM Linux Technology Center Performance Team
dvianney@us.ibm.com,  Phone: (512) 838-9919   Fax: (512) 838-0070
--=_alternative 006AC77D86256FA5_=-- From - Mon Feb 14 19:03:49 2005 X-Account-Key: account2 X-UIDL: 58811-1059743608 X-Mozilla-Status: 0005 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-03 (ms-mta-03-smtp.texas.rr.com [10.93.38.33]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003)) with ESMTP id <0IBX00FED6RACM@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Mon, 14 Feb 2005 15:09:17 -0600 (CST) Received: from clmboh-mx-01.mgw.rr.com (clmboh-mx-01.mgw.rr.com [65.24.7.10]) by ms-mta-03.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003)) with ESMTP id <0IBX001P36Q849@ms-mta-03.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Mon, 14 Feb 2005 15:09:13 -0600 (CST) Received: from fife-smtp.catalog.com (HELO aux153.plano.net) (209.217.36.155) by clmboh-mx-01.mgw.rr.com with SMTP; Mon, 14 Feb 2005 16:08:46 -0500 Received: (qmail 26690 invoked by uid 10000); Mon, 14 Feb 2005 21:09:28 +0000 Received: (qmail 26668 invoked from network); Mon, 14 Feb 2005 21:09:28 +0000 Received: from unknown (HELO barracuda.catalog.com) (209.217.52.101) by smtp3.catalog.com with SMTP; Mon, 14 Feb 2005 21:09:28 +0000 Received: from ztxmail03.ztx.compaq.com (ztxmail03.ztx.compaq.com [161.114.1.207]) by barracuda.catalog.com (Spam Firewall) with ESMTP id 660B4D00038E for ; Mon, 14 Feb 2005 15:08:32 -0600 (CST) Received: from cceexg12.americas.cpqcorp.net (cceexg12.americas.cpqcorp.net [16.81.1.38]) by ztxmail03.ztx.compaq.com (Postfix) with ESMTP id DAAA7A54 for ; Mon, 14 Feb 2005 15:08:27 -0600 (CST) Received: from cceexc19.americas.cpqcorp.net ([16.81.1.19]) by cceexg12.americas.cpqcorp.net with Microsoft SMTPSVC(6.0.3790.211); Mon, 14 Feb 2005 15:08:28 -0600 Date: Mon, 14 Feb 2005 15:08:26 -0600 From: "Schmidt, David (Performance Eng.)" Subject: STREAM Results for the HP ProLiants DL385, DL585, BL25p, BL35p, ML370 G4 To: john@mccalpin.com Message-id: MIME-version: 1.0 X-MIMEOLE: 
Produced By Microsoft Exchange V6.5.7226.0 Content-type: multipart/alternative; boundary="----_=_NextPart_001_01C512D9.53DA33A2" Content-class: urn:content-classes:message Delivered-to: mccalpin.com-john@mccalpin.com Thread-topic: STREAM Results for the HP ProLiants DL385, DL585, BL25p, BL35p, ML370 G4 Thread-index: AcUS2VNj74QMUGH5RAWz5x7Dsl0k4g== X-MS-Has-Attach: X-MS-TNEF-Correlator: X-Virus-Scanned: by Catalog.com Spam Firewall at catalog.com Original-recipient: rfc822;jmccalpin@austin.rr.com X-OriginalArrivalTime: 14 Feb 2005 21:08:28.0098 (UTC) FILETIME=[5445A220:01C512D9] X-NAS-Classification: 0 X-NAS-MessageID: 2668 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} Status: R This is a multi-part message in MIME format. ------_=_NextPart_001_01C512D9.53DA33A2 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable John, Below are STREAM results for the HP ProLiant DL385 (2 CPU), DL585 (4 CPU), BL25p (2 CPU), BL35p (2 CPU), and ML370 G4 (1 CPU). The configurations are described below with the results: HP ProLiant DL385 2x2.6GHz/1MB L2 252 Opteron processors 16GB PC3200 DDR memory (8x2GB DIMMs) SuSE Linux Enterprise Server 9 (x86_64) I used Revision 5.3 of the stream code and compiled with PGI C/C++ for Linux v.5.2-4: pgcc -O2 -Mvect=3Dsse -Mnontemporal -Munsafe_par_align -mp -o = ompstream stream_omp.c=20 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 1000000, Offset =3D 49152 Total memory required =3D 22.9 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Number of Threads requested =3D 2 Number of Threads requested =3D 2 ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. 
Each test below will take on the order of 4107 microseconds. (=3D 4107 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) Avg time Min time Max time Copy: 7928.7410 0.0018 0.0020 0.0020 Scale: 7827.9323 0.0018 0.0020 0.0020 Add: 8244.3322 0.0026 0.0029 0.0030 Triad: 8247.0339 0.0026 0.0029 0.0029 ------------------------------------------------------------- Solution Validates ------------------------------------------------------------- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D HP ProLiant DL585 4x2.6GHz/1MB L2 852 Opteron processors 32GB PC2700 memory (16x2GB DIMMs) SuSE Linux Enterprise Server 9 (x86_64) I used Revision 5.3 of the stream code and compiled with PGI C/C++ for Linux v.5.2-4: pgcc -O2 -Mvect=3Dsse -Mnontemporal -Munsafe_par_align -mp -o = ompstream stream_omp.c=20 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 2500000, Offset =3D 49152 Total memory required =3D 57.2 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Number of Threads requested =3D 4 Number of Threads requested =3D 4 Number of Threads requested =3D 4 Number of Threads requested =3D 4 ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. 
Each test below will take on the order of 14740 microseconds. (=3D 14740 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) Avg time Min time Max time Copy: 13893.0242 0.0026 0.0029 0.0029 Scale: 13894.1747 0.0026 0.0029 0.0029 Add: 14599.0393 0.0037 0.0041 0.0042 Triad: 14562.7128 0.0037 0.0041 0.0041 ------------------------------------------------------------- Solution Validates ------------------------------------------------------------- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D HP ProLiant BL25p 2x2.6GHz/1MB L2 Opteron 252 processors 16GB memory (8x2GB DIMMs) SuSE Linux Enterprise Server 9 (x86_64) - kernel 2.6.5-7.97-smp I used Revision 5.3 of the stream code and compiled with PGI C/C++ for Linux v.5.2-4: /usr/pgi/linux86-64/5.2/bin/pgcc -O2 -Mvect=3Dsse -Mnontemporal -Munsafe_par_align -mp -o ompstream stream_omp.c=20 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 2500000, Offset =3D 512 Total memory required =3D 57.2 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Number of Threads requested =3D 2 Number of Threads requested =3D 2 ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. 
Each test below will take on the order of 8744 microseconds. (=3D 8744 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) Avg time Min time Max time Copy: 8435.0005 0.0043 0.0047 0.0048 Scale: 8407.1036 0.0043 0.0048 0.0048 Add: 8689.2563 0.0062 0.0069 0.0070 Triad: 8670.6946 0.0062 0.0069 0.0070 ------------------------------------------------------------- Solution Validates ------------------------------------------------------------- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D HP ProLiant BL35p 2x2.4GHz/1MB L2 250 Opteron processors 8GB PC3200 memory (4x2GB DIMMs) SuSE Linux Enterprise Server 9 (x86_64) I used Revision 5.3 of the stream code and compiled with PGI C/C++ for Linux v.5.2-4: pgcc -O2 -Mvect=3Dsse -Mnontemporal -mp -o ompstream stream_omp.c=20 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 6500000, Offset =3D 16384 Total memory required =3D 148.8 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Number of Threads requested =3D 2 Number of Threads requested =3D 2 ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 83088 microseconds. 
(=3D 83088 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) Avg time Min time Max time Copy: 8742.5116 0.0107 0.0119 0.0120 Scale: 8737.4332 0.0107 0.0119 0.0119 Add: 9092.4575 0.0155 0.0172 0.0172 Triad: 9062.3596 0.0155 0.0172 0.0172 ------------------------------------------------------------- Solution Validates ------------------------------------------------------------- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D HP ProLiant ML370 G4 1x3.6z/2MB L2 Xeon processors 4GB memory (8x512MB DIMMs) Windows Server 2003 Intel(R) C++ Compiler for 32-bit applications, Version 8.1 Build 20040802Z icl -Qopenmp -QxW -Qparallel -O3 -w stream_d_omp.c win_second_wall.c -o omp2 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 10000000, Offset =3D 2048 Total memory required =3D 228.9 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 43215 microseconds. (=3D 43215 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. 
For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 3965.6825 0.0419 0.0403 0.0535 Scale: 3925.5339 0.0409 0.0408 0.0413 Add: 3515.2573 0.0684 0.0683 0.0685 Triad: 3577.7362 0.0674 0.0671 0.0694 Thanks, David Schmidt Hewlett-Packard Company (281) 514-5039 D.Schmidt@hp.com ------_=_NextPart_001_01C512D9.53DA33A2 Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable STREAM Results for the HP ProLiants DL385, DL585, BL25p, BL35p, = ML370 G4

John,

Below are STREAM results for the HP ProLiant DL385 (2 CPU), DL585 (4 CPU), BL25p (2 CPU), BL35p (2 CPU), and ML370 G4 (1 CPU). The configurations are described below with the results:

HP ProLiant DL385
2x2.6GHz/1MB L2 252 Opteron processors
16GB PC3200 DDR memory (8x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64)

I used Revision 5.3 of the stream code and compiled with PGI C/C++ for Linux v.5.2-4:

   pgcc -O2 -Mvect=sse -Mnontemporal -Munsafe_par_align -mp -o ompstream stream_omp.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 49152
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads requested = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 4107 microseconds.
   (= 4107 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        7928.7410       0.0018       0.0020       0.0020
Scale:       7827.9323       0.0018       0.0020       0.0020
Add:         8244.3322       0.0026       0.0029       0.0030
Triad:       8247.0339       0.0026       0.0029       0.0029
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
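
The "Total memory required" line can be reproduced from the array size. The following is a minimal sketch in Python, assuming STREAM's three work arrays of 8-byte words and assuming that "MB" here means 2^20 bytes (an inference from the printed figures, not something the output states). The function name is hypothetical.

```python
# Sketch: reproduce the "Total memory required" figure from the array size.
# Assumptions: three arrays of 8-byte words; 1 MB = 2**20 bytes.
BYTES_PER_WORD = 8
NUM_ARRAYS = 3


def total_memory_mb(array_size: int) -> float:
    """Footprint in MB of three arrays holding `array_size` words each."""
    return NUM_ARRAYS * BYTES_PER_WORD * array_size / 2**20


# Array sizes reported for the HP ProLiant runs in this message.
for n in (1_000_000, 2_500_000, 6_500_000, 10_000_000):
    print(f"{n:>10d} elements -> {total_memory_mb(n):.1f} MB")
```

Under these assumptions the four array sizes give 22.9, 57.2, 148.8 and 228.9 MB, which agrees with the figures printed in the corresponding runs.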

=================================================================================

HP ProLiant DL585
4x2.6GHz/1MB L2 852 Opteron processors
32GB PC2700 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64)

I used Revision 5.3 of the stream code and compiled with PGI C/C++ for Linux v.5.2-4:

   pgcc -O2 -Mvect=sse -Mnontemporal -Munsafe_par_align -mp -o ompstream stream_omp.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2500000, Offset = 49152
Total memory required = 57.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 14740 microseconds.
   (= 14740 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       13893.0242       0.0026       0.0029       0.0029
Scale:      13894.1747       0.0026       0.0029       0.0029
Add:        14599.0393       0.0037       0.0041       0.0042
Triad:      14562.7128       0.0037       0.0041       0.0041
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
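
The Rate and time columns are tied together by the number of bytes each kernel moves. The sketch below assumes STREAM's usual accounting of 16 bytes per array element for Copy and Scale and 24 bytes for Add and Triad, with MB/s meaning 10^6 bytes per second. The helper name is hypothetical; the rates and array size are copied from the DL585 run.

```python
# Sketch: recover the best-case time implied by a reported STREAM rate.
# Assumptions: 16 bytes/element for Copy and Scale, 24 for Add and Triad;
# 1 MB/s = 1e6 bytes per second.
BYTES_PER_ELEMENT = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}


def implied_best_time(kernel: str, array_size: int, rate_mb_s: float) -> float:
    """Seconds needed to move the kernel's bytes at the reported rate."""
    return BYTES_PER_ELEMENT[kernel] * array_size / (rate_mb_s * 1e6)


# Rates from the DL585 run, array size 2500000.
dl585_rates = {
    "Copy": 13893.0242,
    "Scale": 13894.1747,
    "Add": 14599.0393,
    "Triad": 14562.7128,
}

for kernel, rate in dl585_rates.items():
    print(f"{kernel:<6s} {implied_best_time(kernel, 2_500_000, rate):.4f} s")
```

Under these assumptions the implied times round to 0.0029 s for Copy and Scale and 0.0041 s for Add and Triad, which agrees with the Min time column of the DL585 table.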

=================================================================================

HP ProLiant BL25p
2x2.6GHz/1MB L2 Opteron 252 processors
16GB memory (8x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) - kernel 2.6.5-7.97-smp

I used Revision 5.3 of the stream code and compiled with PGI C/C++ for Linux v.5.2-4:
/usr/pgi/linux86-64/5.2/bin/pgcc -O2 -Mvect=sse -Mnontemporal -Munsafe_par_align -mp -o ompstream stream_omp.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2500000, Offset = 512
Total memory required = 57.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads requested = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 8744 microseconds.
   (= 8744 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        8435.0005       0.0043       0.0047       0.0048
Scale:       8407.1036       0.0043       0.0048       0.0048
Add:         8689.2563       0.0062       0.0069       0.0070
Triad:       8670.6946       0.0062       0.0069       0.0070
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
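
The timing guideline printed before each results table amounts to dividing the estimated test time by the clock granularity and comparing the result with 20 ticks. A minimal sketch of that check follows; the function names are hypothetical, and the threshold of 20 is taken from the program's own message.

```python
# Sketch: the "at least 20 clock ticks per test" guideline as arithmetic.
# Function names are illustrative; the threshold comes from the STREAM output.
MIN_TICKS = 20


def clock_ticks(test_time_us: float, granularity_us: float) -> float:
    """Number of timer ticks that elapse during one test."""
    return test_time_us / granularity_us


def long_enough(test_time_us: float, granularity_us: float) -> bool:
    """True when a test spans at least MIN_TICKS timer ticks."""
    return clock_ticks(test_time_us, granularity_us) >= MIN_TICKS


# BL25p run: about 8744 microseconds per test, 1 microsecond granularity.
print(clock_ticks(8744, 1), long_enough(8744, 1))
```

With a 1 microsecond timer, every run in this message spans thousands of ticks, so none of them is near the threshold.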

=================================================================================

HP ProLiant BL35p
2x2.4GHz/1MB L2 250 Opteron processors
8GB PC3200 memory (4x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64)

I used Revision 5.3 of the stream code and compiled with PGI C/C++ for Linux v.5.2-4:
   pgcc -O2 -Mvect=sse -Mnontemporal -mp -o ompstream stream_omp.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 6500000, Offset = 16384
Total memory required = 148.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads requested = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 83088 microseconds.
   (= 83088 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        8742.5116       0.0107       0.0119       0.0120
Scale:       8737.4332       0.0107       0.0119       0.0119
Add:         9092.4575       0.0155       0.0172       0.0172
Triad:       9062.3596       0.0155       0.0172       0.0172
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
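
For a quick side-by-side reading of the four Opteron systems, the Triad rate can be divided by the processor count. This is a derived figure, not one reported in the message, and it ignores the differences in clock speed, memory type and array size between the runs. The names in the sketch are hypothetical; the rates and CPU counts are copied from the results above.

```python
# Sketch: Triad bandwidth per processor for the four Opteron runs.
# This is a derived figure; the message itself reports only aggregate rates.
OPTERON_RUNS = {
    # system: (number of CPUs, Triad rate in MB/s)
    "DL385": (2, 8247.0339),
    "DL585": (4, 14562.7128),
    "BL25p": (2, 8670.6946),
    "BL35p": (2, 9062.3596),
}


def per_processor_rate(num_cpus: int, rate_mb_s: float) -> float:
    """Aggregate rate divided evenly across the processors."""
    return rate_mb_s / num_cpus


for system, (cpus, triad) in OPTERON_RUNS.items():
    print(f"{system}: {per_processor_rate(cpus, triad):.1f} MB/s per processor")
```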

=================================================================================

HP ProLiant ML370 G4
1x3.6GHz/2MB L2 Xeon processor
4GB memory (8x512MB DIMMs)
Windows Server 2003

Intel(R) C++ Compiler for 32-bit applications, Version 8.1    Build 20040802Z
icl -Qopenmp -QxW -Qparallel -O3 -w stream_d_omp.c win_second_wall.c -o omp2

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 10000000, Offset = 2048
Total memory required = 228.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 43215 microseconds.
   (= 43215 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        3965.6825       0.0419       0.0403       0.0535
Scale:       3925.5339       0.0409       0.0408       0.0413
Add:         3515.2573       0.0684       0.0683       0.0685
Triad:       3577.7362       0.0674       0.0671       0.0694

Thanks,

David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com

------_=_NextPart_001_01C512D9.53DA33A2-- From - Thu Feb 24 10:13:50 2005 Return-path: Date: Thu, 24 Feb 2005 10:13:25 -0600 From: John D Mccalpin Subject: Fw: standard STREAM on IBM eServer p5 575 (1900 MHz, 8cpu) To: mccalpin@cs.virginia.edu Message-id: X-Virus-Scanned: Symantec AntiVirus Scan Engine Original-recipient: rfc822;jmccalpin@austin.rr.com To: mccalpin@us.ibm.com cc: From: Ly Vu/Austin/IBM@IBMUS Subject: standard STREAM on IBM eServer p5 575 (1900 MHz, 8cpu) These are standard STREAM results on an IBM eServer p5 575 with eight 1900 MHz cpus. This is a POWER5 SMP machine. Large pages were used in all cases. Function Rate (MB/s) Avg time Min time Max time Copy: 34966.6146 .1232 .1228 .1235 Scale: 35035.0607 .1227 .1225 .1229 Add: 41076.3886 .1569 .1568 .1570 Triad: 41585.3956 .1553 .1548 .1556 Here is the full output file: -------------------------------------------------- Requesting Large Pages Setting up for 2 CPUs per module Number of segments per array =3D 8 CPU binding list : 0 2 4 6 8 10 12 14 Shared Segment Pointer =3D 504403158265495552 Shared Segment Pointer =3D 504403160412979200 Shared Segment Pointer =3D 504403162560462848 Segment Size (B) =3D 268435456 (MB =3D 256 ) Array Size (B) =3D 2147483648 (MB =3D 2048 ) Array Size (DW) =3D 268435456 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 rebind: num_parthds is 16 Starting Initialization Done With Initialization a(1) 1.00000000000000000 b(M) 1.00000000000000000 c(M) 1.00000000000000000 Incremental Offset =3D 512 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size =3D 268303360 The total memory 
requirement is 6140 MB You are running each test 5 times -- The *best* time for each test is used *EXCLUDING* the first and last iterations ---------------------------------------------------- Your clock granularity appears to be less than one microsecond Your clock granularity/precision appears to be 1 microseconds= ---------------------------------------------------- Function Rate (MB/s) Avg time Min time Max time Copy: 34966.6146 .1232 .1228 .1235 Scale: 35035.0607 .1227 .1225 .1229 Add: 41076.3886 .1569 .1568 .1570 Triad: 41585.3956 .1553 .1548 .1556 ---------------------------------------------------- Solution Validates! ---------------------------------------------------- ______________________________________________ Ly Vu IBM Corp. - Austin, Texas. RS/6000 Performance Analysis. From - Thu Feb 24 10:14:49 2005 Return-path: Date: Thu, 24 Feb 2005 10:14:37 -0600 From: John D Mccalpin Subject: Fw: tuned STREAM on IBM eServer p5 575 (1900 MHz, 8cpu) To: mccalpin@cs.virginia.edu Message-id: X-Virus-Scanned: Symantec AntiVirus Scan Engine To: mccalpin@us.ibm.com cc: From: Ly Vu/Austin/IBM@IBMUS Subject: tuned STREAM on IBM eServer p5 575 (1900 MHz, 8cpu) These are tuned STREAM results on an IBM eServer p5 575 with eight 1900 MHz cpus. This is a POWER5 SMP machine. Large pages were used in all cases. 
Function Rate (MB/s) RMS time Min time Max time Copy: 52098.39 .08 .08 .09 Scale: 52173.72 .08 .08 .08 Add: 54863.67 .12 .12 .12 Triad: 55732.80 .12 .12 .12 Here is the full output file: -------------------------------------------------- Requesting Large Pages Setting up for 2 CPUs per module Number of segments per array =3D 8 CPU binding list : 0 2 4 6 8 10 12 14 Shared Segment Pointer =3D 504403158265495552 Shared Segment Pointer =3D 504403160412979200 Shared Segment Pointer =3D 504403162560462848 Segment Size (B) =3D 268435456 (MB =3D 256 ) Array Size (B) =3D 2147483648 (MB =3D 2048 ) Array Size (DW) =3D 268435456 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 Num_threads =3D 16 rebind: num_parthds is 16 Starting Initialization Done With Initialization a(1) 1.00000000000000000 b(M) 1.00000000000000000 c(M) 1.00000000000000000 Incremental Offset =3D 512 Number of Threads =3D 16 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size =3D 268305408 Offset =3D 0 The total memory requirement is 6141 MB You are running each test 5 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity appears to be less than one microsecond Your clock granularity/precision appears to be 1 microseconds The tests below will each take a time on the order of 83800 microseconds (=3D 83800 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. 
For best results, please be sure you know the precision of your system timer. ---------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 52098.39 .08 .08 .09 Scale: 52173.72 .08 .08 .08 Add: 54863.67 .12 .12 .12 Triad: 55732.80 .12 .12 .12 Sum of a is =3D 407484806118750.000 Sum of b is =3D 81496961223750.0000 Sum of c is =3D 108662614965000.000 ______________________________________________ Ly Vu IBM Corp. - Austin, Texas. RS/6000 Performance Analysis. From - Mon Mar 21 10:43:41 2005 X-Account-Key: account2 X-UIDL: 64756-1059743608 X-Mozilla-Status: 0005 X-Mozilla-Status2: 14000000 Return-path: Received: from ms-mta-01 (ms-mta-01-smtp [10.93.38.11]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003)) with ESMTP id <0IDP007TGFLZ65@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Mon, 21 Mar 2005 07:46:52 -0600 (CST) Received: from flmx04.mgw.rr.com (flmx04.mgw.rr.com [65.32.1.49]) by ms-mta-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 1.21 (built Sep 8 2003)) with ESMTP id <0IDP00AJ3FM118@ms-mta-01.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Mon, 21 Mar 2005 08:46:49 -0500 (EST) Received: from austtx-mx-04.mgw.rr.com (austtx-mx-04.mgw.rr.com [24.93.40.211]) by flmx04.mgw.rr.com (8.12.10/8.12.8) with ESMTP id j2LDiqdP014787 for ; Mon, 21 Mar 2005 08:46:46 -0500 (EST) Received: from fife-smtp.catalog.com (HELO aux153.plano.net) (209.217.36.155) by austtx-mx-04.mgw.rr.com with SMTP; Mon, 21 Mar 2005 08:46:09 -0500 Received: (qmail 2806 invoked by uid 10000); Mon, 21 Mar 2005 13:47:17 +0000 Received: (qmail 2777 invoked from network); Mon, 21 Mar 2005 13:47:17 +0000 Received: from unknown (HELO barracuda.catalog.com) (209.217.36.24) by smtp3.catalog.com with SMTP; Mon, 21 Mar 2005 13:47:17 +0000 Received: from e34.co.us.ibm.com (e34.co.us.ibm.com [32.97.110.132]) by barracuda.catalog.com (Spam Firewall) with ESMTP id 21525D00E347 for ; Mon, 21 
Mar 2005 07:46:07 -0600 (CST) Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e34.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j2LDk5KN503994 for ; Mon, 21 Mar 2005 08:46:05 -0500 Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by d03relay04.boulder.ibm.com (8.12.10/NCO/VER6.6) with ESMTP id j2LDk5KH192014 for ; Mon, 21 Mar 2005 06:46:05 -0700 Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j2LDk4qO020136 for ; Mon, 21 Mar 2005 06:46:05 -0700 Received: from d03nm690.boulder.ibm.com (d03nm690.boulder.ibm.com [9.17.195.59]) by d03av04.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j2LDk4PH020130 for ; Mon, 21 Mar 2005 06:46:04 -0700 Date: Mon, 21 Mar 2005 07:45:56 -0600 From: John D Mccalpin Subject: Fw: standard STREAM on IBM eServer p5 550 (1500 MHz, 4cpu) To: john@mccalpin.com Message-id: MIME-version: 1.0 X-Mailer: Lotus Notes Release 6.0.2CF1 June 9, 2003 Content-type: multipart/related; Boundary="0__=09BBE558DFD81D2A8f9e8a93df938690918c09BBE558DFD81D2A" X-MIMETrack: Serialize by Router on D03NM690/03/M/IBM(Release 6.53HF294 | January 28, 2005) at 03/21/2005 06:47:37 Delivered-to: mccalpin.com-john@mccalpin.com X-Virus-Scanned: Symantec AntiVirus Scan Engine X-Virus-Scanned: by Catalog.com Spam Firewall at catalog.com Original-recipient: rfc822;jmccalpin@austin.rr.com X-NAS-Classification: 0 X-NAS-MessageID: 4289 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} Status: R --0__=09BBE558DFD81D2A8f9e8a93df938690918c09BBE558DFD81D2A Content-type: multipart/alternative; Boundary="1__=09BBE558DFD81D2A8f9e8a93df938690918c09BBE558DFD81D2A" --1__=09BBE558DFD81D2A8f9e8a93df938690918c09BBE558DFD81D2A Content-type: text/plain; charset=US-ASCII Content-transfer-encoding: quoted-printable --- John D. McCalpin, Ph.D. 
IBM - 11400 Burnet Road, MS 045-3N098 Austin, TX 78758 (512) 838-6167 or IBM tie line 678/6167 ----- Forwarded by John D Mccalpin/Austin/IBM on 03/21/2005 07:45 AM --= --- = Ly Vu/Austin/IBM = = 03/11/2005 01:37 = To PM mccalpin@us.ibm.com = = cc jacobt@us.ibm.com, = robichau@us.ibm.com = Subj= ect standard STREAM on IBM eServer p= 5 550 (1500 MHz, 4cpu) = = = = = = = These are standard STREAM results on an IBM eServer p5 550 express with four 1500 MHz cpus. This is a POWER5 SMP machine. Large pages were used in all cases. Function Rate (MB/s) Avg time Min time Max time Copy: 6176.3992 .1740 .1738 .1742 Scale: 6086.5589 .1764 .1763 .1765 Add: 7671.7113 .2099 .2098 .2099 Triad: 7814.9729 .2061 .2060 .2061 Here is the full output file: -------------------------------------------------- Requesting Large Pages Setting up for 2 CPUs per module Number of segments per array =3D 2 CPU binding list : 0 2 Shared Segment Pointer =3D 504403158265495552 Shared Segment Pointer =3D 504403158802366464 Shared Segment Pointer =3D 504403159339237376 Segment Size (B) =3D 268435456 (MB =3D 256 ) Array Size (B) =3D 536870912 (MB =3D 512 ) Array Size (DW) =3D 67108864 Num_threads =3D 4 Num_threads =3D 4 Num_threads =3D 4 Num_threads =3D 4 rebind: num_parthds is 4 Starting Initialization Done With Initialization a(1) 1.00000000000000000 b(M) 1.00000000000000000 c(M) 1.00000000000000000 Incremental Offset =3D 512 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size =3D 67079168 The total memory requirement is 1535 MB You are running each test 5 times -- The *best* time for each test is used *EXCLUDING* the first and last iterations ---------------------------------------------------- Your clock granularity appears to be less than one microsecond Your clock granularity/precision appears to be 1 microseconds= 
---------------------------------------------------- Function Rate (MB/s) Avg time Min time Max time Copy: 6176.3992 .1740 .1738 .1742 Scale: 6086.5589 .1764 .1763 .1765 Add: 7671.7113 .2099 .2098 .2099 Triad: 7814.9729 .2061 .2060 .2061 ---------------------------------------------------- Solution Validates! ---------------------------------------------------- ______________________________________________ Ly Vu IBM Corp. - Austin, Texas. RS/6000 Performance Analysis. Phone : (512) 838-8228 Email : lyvu@us.ibm.com= --1__=09BBE558DFD81D2A8f9e8a93df938690918c09BBE558DFD81D2A Content-type: text/html; charset=US-ASCII Content-Disposition: inline Content-transfer-encoding: quoted-printable


---
John D. McCalpin, Ph.D.
IBM - 11400 Burnet Road, MS 045-3N098
Austin, TX 78758
(512) 838-6167 or IBM tie line 678/6167

----- Forwarded by John D Mccalpin/Austin/IBM on 03/21/2005 07:45 AM -----

Ly Vu/Austin/IBM
03/11/2005 01:37 PM

To: mccalpin@us.ibm.com
cc: jacobt@us.ibm.com, robichau@us.ibm.com
Subject: standard STREAM on IBM eServer p5 550 (1500 MHz, 4cpu)


These are standard STREAM results on an IBM eServer p5 550 express
with four 1500 MHz cpus. This is a POWER5 SMP machine.
Large pages were used in all cases.

Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        6176.3992        .1740        .1738        .1742
Scale:       6086.5589        .1764        .1763        .1765
Add:         7671.7113        .2099        .2098        .2099
Triad:       7814.9729        .2061        .2060        .2061


Here is the full output file:
--------------------------------------------------

Requesting Large Pages
Setting up for 2 CPUs per module
Number of segments per array = 2
CPU binding list : 0 2
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403158802366464
Shared Segment Pointer = 504403159339237376
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 536870912 (MB = 512 )
Array Size (DW) = 67108864
Num_threads = 4
Num_threads = 4
Num_threads = 4
Num_threads = 4
rebind: num_parthds is 4
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000
Incremental Offset = 512
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67079168
The total memory requirement is 1535 MB
You are running each test 5 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        6176.3992        .1740        .1738        .1742
Scale:       6086.5589        .1764        .1763        .1765
Add:         7671.7113        .2099        .2098        .2099
Triad:       7814.9729        .2061        .2060        .2061
----------------------------------------------------
Solution Validates!
----------------------------------------------------
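For readers who want to check the table, the rates can be approximately reproduced from the array size and the minimum times in the output above. This is a minimal sketch that assumes the usual STREAM accounting: Copy and Scale touch two arrays per iteration, Add and Triad touch three, each word is 8 bytes, and 1 MB is 10^6 bytes. The name `stream_rate_mb_s` is illustrative and is not part of the STREAM source.

```python
# Arrays read or written per iteration by each kernel (assumed STREAM
# accounting: Copy/Scale touch two arrays, Add/Triad touch three).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
BYTES_PER_WORD = 8  # "Assuming 8 bytes per DOUBLE PRECISION word"


def stream_rate_mb_s(kernel, array_size, min_time_s):
    """Return the rate in MB/s (1 MB = 10**6 bytes) for one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * array_size
    return bytes_moved / min_time_s / 1.0e6


# Values copied from the output above.
array_size = 67079168
min_times = {"Copy": 0.1738, "Scale": 0.1763, "Add": 0.2098, "Triad": 0.2060}

rates = {k: stream_rate_mb_s(k, array_size, t) for k, t in min_times.items()}
for kernel, rate in rates.items():
    print(f"{kernel:6s} {rate:10.1f} MB/s")
```

The recomputed values land within a few MB/s of the printed rates; the remaining gap is presumably because the minimum times are printed to only four decimal places.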


______________________________________________
Ly Vu
IBM Corp. - Austin, Texas.
RS/6000 Performance Analysis.

Phone : (512) 838-8228
Email : lyvu@us.ibm.com
From - Mon Mar 21 10:43:41 2005
Date: Mon, 21 Mar 2005 07:45:46 -0600
From: John D Mccalpin
Subject: Fw: Tuned STREAM on IBM eServer p5 550 (1500 MHz, 4cpu)
To: john@mccalpin.com


---
John D. McCalpin, Ph.D.
IBM - 11400 Burnet Road, MS 045-3N098
Austin, TX 78758
(512) 838-6167 or IBM tie line 678/6167

----- Forwarded by John D Mccalpin/Austin/IBM on 03/21/2005 07:45 AM -----

Ly Vu/Austin/IBM
03/11/2005 01:44 PM

To: mccalpin@us.ibm.com
cc: jacobt@us.ibm.com, robichau@us.ibm.com
Subject: Tuned STREAM on IBM eServer p5 550 (1500 MHz, 4cpu)


These are tuned STREAM results on an IBM eServer p5 550 express
with four 1500 MHz cpus. This is a POWER5 SMP machine.
Large pages were used in all cases.

Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          6264.39          .17          .17          .19
Scale:         6137.41          .17          .17          .17
Add:           8775.14          .18          .18          .18
Triad:         8997.68          .18          .18          .18

Here is the full output file:
-----------------------------
Requesting Large Pages
Setting up for 2 CPUs per module
Number of segments per array = 2
CPU binding list : 0 2
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403158802366464
Shared Segment Pointer = 504403159339237376
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 536870912 (MB = 512 )
Array Size (DW) = 67108864
Num_threads = 4
Num_threads = 4
Num_threads = 4
Num_threads = 4
rebind: num_parthds is 4
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000
Incremental Offset = 2560
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67077120
Offset = 0
The total memory requirement is 1535 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 174749 microseconds
(= 174749 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          6264.39          .17          .17          .19
Scale:         6137.41          .17          .17          .17
Add:           8775.14          .18          .18          .18
Triad:         8997.68          .18          .18          .18
Sum of a is = 101873152743750.000
Sum of b is = 20374630548750.0000
Sum of c is = 27166174065000.0000
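To put the tuned numbers in context, they can be compared with the standard run reported for the same p5 550 in the previous message. The sketch below only does the arithmetic on the two published tables; the helper name `percent_gain` is illustrative and is not part of either submission.

```python
# Rates in MB/s copied from the standard and tuned p5 550 tables.
standard = {"Copy": 6176.3992, "Scale": 6086.5589, "Add": 7671.7113, "Triad": 7814.9729}
tuned = {"Copy": 6264.39, "Scale": 6137.41, "Add": 8775.14, "Triad": 8997.68}


def percent_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


gains = {k: percent_gain(standard[k], tuned[k]) for k in standard}
for kernel, gain in gains.items():
    print(f"{kernel:6s} {gain:+5.1f}%")
```

On these figures the tuning mostly helps the three-array kernels (Add and Triad gain roughly 14 to 15 percent), while Copy and Scale change by under 2 percent.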



______________________________________________
Ly Vu
IBM Corp. - Austin, Texas.
RS/6000 Performance Analysis.

Phone : (512) 838-8228
Email : lyvu@us.ibm.com
From - Tue Apr 12 06:15:35 2005
Date: Mon, 11 Apr 2005 13:40:34 -0700
From: Juan Uys
Subject: Result of Stream by John D. McCalpin
To: mccalpin@cs.virginia.edu

AMD64 2.4GHz (2.6.9-1.667 #1 Tue Nov 2 14:50:10 EST 2004 x86_64 x86_64 x86_64 GNU/Linux)
Motherboard: K8V Deluxe
RAM: 2x1GB Kingston HyperX 3200 K2/2G 2.5CL with ECC
Compiler:
Thread model: posix
gcc version 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)
Flags: -O

I'm booted into Fedora Core 3 running Gnome and a couple of apps.
Is there anything I can do to get better results?

Thanks! :)

Result:
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 44392 microseconds.
   (= 44392 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        1908.3994       0.0169       0.0168       0.0173
Scale:       1902.7180       0.0169       0.0168       0.0172
Add:         2123.2503       0.0227       0.0226       0.0231
Triad:       2126.2327       0.0227       0.0226       0.0234
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

From - Sat May 14 19:09:40 2005
Date: Mon, 09 May 2005 22:48:27 -0600
From: Paul Saxe
Subject: STREAM results on a dual Opteron 246
To: john@mccalpin.com
Message-id: <42803D1B.2070606@MaterialsDesign.com>

Dear Dr. McCalpin,

I ran the 01-stream_d_c_omp_x86_64 executable from your 2004-11-17 "New Opteron Binary" note on a dual processor Opteron 246 (2 GHz) machine with 2 GB PC3200 memory, a Tyan 2882 motherboard and running Fedora 2 for the AMD64. Perhaps these results are of use to you -- they seem to be considerably different than similar ones on the web site.

If you need any other information or would like me to do anything else, please let me know.

Paul Saxe.


[psaxe@opteron1 psaxe]$ export OMP_NUM_THREADS=1
[psaxe@opteron1 psaxe]$ ./01*
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 20000000, Offset = 0
Total memory required = 457.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 507503 microseconds.
   (= 507503 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        4163.9075       0.0769       0.0769       0.0772
Scale:       4326.7150       0.0740       0.0740       0.0742
Add:         4271.1491       0.1125       0.1124       0.1126
Triad:       4339.7302       0.1107       0.1106       0.1108


[psaxe@opteron1 psaxe]$ export OMP_NUM_THREADS=2
[psaxe@opteron1 psaxe]$ ./01*
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 20000000, Offset = 0
Total memory required = 457.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 256726 microseconds.
   (= 256726 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        8052.1296       0.0398       0.0397       0.0402
Scale:       8491.1922       0.0378       0.0377       0.0381
Add:         8369.8108       0.0574       0.0573       0.0578
Triad:       8431.9977       0.0570       0.0569       0.0570
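The two runs above differ only in `OMP_NUM_THREADS`, so the scaling from one to two threads can be read directly off the tables. The following sketch computes the ratio for each kernel from the reported rates; the function name `speedup` is illustrative and nothing here comes from the STREAM source itself.

```python
# Best-case rates (MB/s) copied from the two runs above.
one_thread = {"Copy": 4163.9075, "Scale": 4326.7150, "Add": 4271.1491, "Triad": 4339.7302}
two_threads = {"Copy": 8052.1296, "Scale": 8491.1922, "Add": 8369.8108, "Triad": 8431.9977}


def speedup(rates_base, rates_new):
    """Ratio of new to baseline rate for each kernel."""
    return {k: rates_new[k] / rates_base[k] for k in rates_base}


ratios = speedup(one_thread, two_threads)
for kernel, ratio in ratios.items():
    print(f"{kernel:6s} speedup {ratio:.2f}x")
```

All four kernels come out between 1.9x and 2.0x, that is, close to linear scaling for two threads on this machine.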

From - Thu Jun 9 05:24:56 2005
Date: Mon, 06 Jun 2005 14:26:37 -0500
From: "Schmidt, David (Performance Eng.)"
Subject: STREAM Results for the HP ProLiant DL585 and HP ProLiant BL45p with 852 Opteron CPUs
To: john@mccalpin.com

John,

Below are standard STREAM results for the HP ProLiant DL585 (4 CPU) and the HP ProLiant BL45p (4 CPU) using 2.6GHz 852 Opteron processors. The configurations are described below with the results:

HP ProLiant DL585
4x2.6GHz/1MB L2 852 Opteron processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1

I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1:

   pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 41472
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 8920 microseconds.
   (= 8920 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       17091.8122       0.0067       0.0075       0.0075
Scale:      16353.5567       0.0071       0.0078       0.0080
Add:        17225.8047       0.0100       0.0111       0.0112
Triad:      17233.9147       0.0101       0.0111       0.0113
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

=============================================================

HP ProLiant BL45p
4x2.6GHz/1MB L2 852 Opteron processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1

I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1:

   pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 49152
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 8783 microseconds.
   (= 8783 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       17172.1760       0.0067       0.0075       0.0076
Scale:      16433.1470       0.0070       0.0078       0.0078
Add:        17261.6202       0.0100       0.0111       0.0113
Triad:      17322.1417       0.0100       0.0111       0.0111
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com

John,
Below are standard STREAM results for = the HP ProLiant DL585 (4 CPU), and the HP ProLiant BL45p (4 CPU) using = 2.6Ghz 852 Opteron processors. The
configurations are described below with the results:

HP ProLiant DL585
4x2.6GHz/1MB L2 852 Opteron processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1

I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1:

   pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c


-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 41472
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 8920 microseconds.
   (= 8920 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       17091.8122       0.0067       0.0075       0.0075
Scale:      16353.5567       0.0071       0.0078       0.0080
Add:        17225.8047       0.0100       0.0111       0.0112
Triad:      17233.9147       0.0101       0.0111       0.0113
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
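
The Rate column can be cross-checked against the Min time column. The sketch below assumes the usual STREAM accounting (Copy and Scale move two 8-byte words per element, Add and Triad move three, and 1 MB = 10^6 bytes); the names `BYTES_PER_ELEMENT`, `reported`, and `implied_min_time` are illustrative and do not come from the message.

```python
# Cross-check of the DL585 table above: rate = bytes moved / best time.
# Assumption: standard STREAM byte counts per array element (8-byte doubles).
BYTES_PER_ELEMENT = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}
N = 8_000_000  # "Array size = 8000000"

# kernel: (reported rate in MB/s, reported min time in seconds)
reported = {
    "Copy": (17091.8122, 0.0075),
    "Scale": (16353.5567, 0.0078),
    "Add": (17225.8047, 0.0111),
    "Triad": (17233.9147, 0.0111),
}


def implied_min_time(kernel, rate_mb_s, n=N):
    """Best time in seconds implied by a reported rate (MB = 1e6 bytes)."""
    return BYTES_PER_ELEMENT[kernel] * n / (rate_mb_s * 1e6)


for kernel, (rate, min_time) in reported.items():
    t = implied_min_time(kernel, rate)
    print(f"{kernel:6s} implied {t:.6f} s, reported {min_time:.4f} s")
```

Each implied time should agree with the printed Min time to within the four decimal places shown in the table.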

=============================================================

HP ProLiant BL45p
4x2.6GHz/1MB L2 852 Opteron processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1

I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1:

   pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c


-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 49152
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 8783 microseconds.
   (= 8783 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       17172.1760       0.0067       0.0075       0.0076
Scale:      16433.1470       0.0070       0.0078       0.0078
Add:        17261.6202       0.0100       0.0111       0.0113
Triad:      17322.1417       0.0100       0.0111       0.0111
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
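
The "Total memory required = 183.1 MB" line in the output above follows from the array size. A minimal sketch, assuming three arrays of 8-byte words and MB meaning 2^20 bytes for this figure (the helper name `stream_memory_mb` is hypothetical):

```python
def stream_memory_mb(n, arrays=3, bytes_per_word=8):
    """Footprint of the STREAM arrays in MB, taking 1 MB = 2**20 bytes."""
    return arrays * n * bytes_per_word / 2**20


# Array size used in the HP runs above.
print(f"N = 8000000 -> {stream_memory_mb(8_000_000):.1f} MB")
# Array size used in the smaller single-processor runs in this archive.
print(f"N = 2000000 -> {stream_memory_mb(2_000_000):.1f} MB")
```

With these assumptions the two array sizes give 183.1 MB and 45.8 MB, matching the figures the benchmark prints.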

David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com

------_=_NextPart_001_01C56ACD.A9499163-- From - Thu Jun 9 05:24:56 2005 X-Account-Key: account2 X-UIDL: 76642-1059743608 X-Mozilla-Status: 0005 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-03 (ms-mta-03-smtp.texas.rr.com [10.93.38.33]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0IHO008RYGW10N@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Mon, 06 Jun 2005 14:31:16 -0500 (CDT) Received: from hrndva-mx-02.mgw.rr.com (hrndva-mx-02.mgw.rr.com [24.28.204.21]) by ms-mta-03.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0IHO00EKLGW2LF@ms-mta-03.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Mon, 06 Jun 2005 14:31:15 -0500 (CDT) Received: from fife-smtp.catalog.com (HELO aux153.plano.net) (209.217.36.155) by hrndva-mx-02.mgw.rr.com with SMTP; Mon, 06 Jun 2005 15:31:14 -0400 Received: (qmail 16489 invoked by uid 10000); Mon, 06 Jun 2005 19:32:53 +0000 Received: (qmail 16322 invoked from network); Mon, 06 Jun 2005 19:32:52 +0000 Received: from unknown (HELO barracuda.catalog.com) (209.217.52.101) by smtp3.catalog.com with SMTP; Mon, 06 Jun 2005 19:32:52 +0000 Received: from ccerelbas02.cce.hp.com (ccerelbas02.cce.hp.com [161.114.21.105]) by barracuda.catalog.com (Spam Firewall) with ESMTP id 36878D00094B for ; Mon, 06 Jun 2005 14:31:04 -0500 (CDT) Received: from cceexg11.americas.cpqcorp.net (cceexg11.americas.cpqcorp.net [16.81.1.59]) by ccerelbas02.cce.hp.com (Postfix) with ESMTP id DB9372000186 for ; Mon, 06 Jun 2005 14:31:03 -0500 (CDT) Received: from cceexc19.americas.cpqcorp.net ([16.81.1.60]) by cceexg11.americas.cpqcorp.net with Microsoft SMTPSVC(6.0.3790.211); Mon, 06 Jun 2005 14:30:03 -0500 Date: Mon, 06 Jun 2005 14:30:02 -0500 From: "Schmidt, David (Performance Eng.)" Subject: STREAM Results for the HP ProLiant DL585 and HP ProLiant BL45p with 875 Opteron CPUs To: john@mccalpin.com Message-id: MIME-version: 1.0 
X-MIMEOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-type: multipart/alternative; boundary="----_=_NextPart_001_01C56ACE.22ECD025" Content-class: urn:content-classes:message Delivered-to: mccalpin.com-john@mccalpin.com Thread-topic: STREAM Results for the HP ProLiant DL585 and HP ProLiant BL45p with 875 Opteron CPUs Thread-index: AcVqziK0Q4P5/vckSfmDcQvgA2c0Pw== X-MS-Has-Attach: X-MS-TNEF-Correlator: X-Virus-Scanned: by Catalog.com Spam Firewall at catalog.com Original-recipient: rfc822;jmccalpin@austin.rr.com X-OriginalArrivalTime: 06 Jun 2005 19:30:03.0586 (UTC) FILETIME=[232C9A20:01C56ACE] X-NAS-Classification: 0 X-NAS-MessageID: 8547 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} Status: R This is a multi-part message in MIME format. ------_=_NextPart_001_01C56ACE.22ECD025 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: quoted-printable John,=20 Below are standard STREAM results for the HP ProLiant DL585 (4 CPU), and the HP ProLiant BL45p (4 CPU) using 2.2Ghz 875 Opteron processors. The=20 configurations are described below with the results:=20 HP ProLiant DL585 4x2.2GHz/1MB L2 875 Opteron processors 32GB PC3200 memory (16x2GB DIMMs) SuSE Linux Enterprise Server 9 (x86_64) SP1 I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1: pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=3D4 -mp -o = ompstream stream_omp.c=20 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 8000000, Offset =3D 41472 Total memory required =3D 183.1 MB. Each test is run 10 times, but only the *best* time for each is used. 
------------------------------------------------------------- Number of Threads requested =3D 8 Number of Threads requested =3D 8 Number of Threads requested =3D 8 Number of Threads requested =3D 8 Number of Threads requested =3D 8 Number of Threads requested =3D 8 Number of Threads requested =3D 8 Number of Threads requested =3D 8 ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 8168 microseconds. (=3D 8168 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) Avg time Min time Max time Copy: 15950.2930 0.0072 0.0080 0.0080 Scale: 16031.7401 0.0072 0.0080 0.0081 Add: 16206.6083 0.0107 0.0118 0.0120 Triad: 16256.0078 0.0107 0.0118 0.0120 ------------------------------------------------------------- Solution Validates ------------------------------------------------------------- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D HP ProLiant BL45p 4x2.2GHz/1MB L2 875 Opteron processors 32GB PC3200 memory (16x2GB DIMMs) SuSE Linux Enterprise Server 9 (x86_64) SP1 I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1: pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=3D4 -mp -o = ompstream stream_omp.c=20 Running Test: N =3D 8000000 Offset =3D 49152 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. 
------------------------------------------------------------- Array size =3D 8000000, Offset =3D 49152 Total memory required =3D 183.1 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Number of Threads requested =3D 8 Number of Threads requested =3D 8 Number of Threads requested =3D 8 Number of Threads requested =3D 8 Number of Threads requested =3D 8 Number of Threads requested =3D 8 Number of Threads requested =3D 8 Number of Threads requested =3D 8 ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 7992 microseconds. (=3D 7992 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) Avg time Min time Max time Copy: 16171.7848 0.0071 0.0079 0.0081 Scale: 16322.2337 0.0071 0.0078 0.0080 Add: 16022.4900 0.0108 0.0120 0.0122 Triad: 16089.7158 0.0108 0.0119 0.0120 ------------------------------------------------------------- Solution Validates ------------------------------------------------------------- David Schmidt Hewlett-Packard Company (281) 514-5039 D.Schmidt@hp.com ------_=_NextPart_001_01C56ACE.22ECD025 Content-Type: text/html; charset="US-ASCII" Content-Transfer-Encoding: quoted-printable STREAM Results for the HP ProLiant DL585 and HP ProLiant BL45p = with 875 Opteron CPUs

John,
Below are standard STREAM results for the HP ProLiant DL585 (4 CPU) and the HP ProLiant BL45p (4 CPU) using 2.2GHz 875 Opteron processors. The
configurations are described below with the results:

HP ProLiant DL585
4x2.2GHz/1MB L2 875 Opteron processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1

I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1:

   pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c


-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 41472
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 8168 microseconds.
   (= 8168 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       15950.2930       0.0072       0.0080       0.0080
Scale:      16031.7401       0.0072       0.0080       0.0081
Add:        16206.6083       0.0107       0.0118       0.0120
Triad:      16256.0078       0.0107       0.0118       0.0120
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

=============================================================

HP ProLiant BL45p
4x2.2GHz/1MB L2 875 Opteron processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1

I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1:

   pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c


Running Test: N = 8000000 Offset = 49152
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 49152
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 7992 microseconds.
   (= 7992 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       16171.7848       0.0071       0.0079       0.0081
Scale:      16322.2337       0.0071       0.0078       0.0080
Add:        16022.4900       0.0108       0.0120       0.0122
Triad:      16089.7158       0.0108       0.0119       0.0120
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
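
In the two 875 tables above, every Avg time is smaller than the corresponding Min time, which cannot be true of a real mean. The sketch below only flags such rows and reports the ratio; the row data is copied from the tables and the helper name `avg_below_min` is hypothetical. A ratio near 0.9 would be consistent with nine timed iterations being summed and then divided by ten, but that explanation is a guess and is not stated anywhere in the message.

```python
# (system, kernel, avg time, min time) copied from the two 875 Opteron tables.
rows = [
    ("DL585", "Copy", 0.0072, 0.0080),
    ("DL585", "Scale", 0.0072, 0.0080),
    ("DL585", "Add", 0.0107, 0.0118),
    ("DL585", "Triad", 0.0107, 0.0118),
    ("BL45p", "Copy", 0.0071, 0.0079),
    ("BL45p", "Scale", 0.0071, 0.0078),
    ("BL45p", "Add", 0.0108, 0.0120),
    ("BL45p", "Triad", 0.0108, 0.0119),
]


def avg_below_min(table):
    """Return (system, kernel, avg/min) for rows whose average is below the minimum."""
    return [(sysname, kernel, avg / best)
            for sysname, kernel, avg, best in table
            if avg < best]


for sysname, kernel, ratio in avg_below_min(rows):
    print(f"{sysname} {kernel:6s} avg/min = {ratio:.3f}")
```

The Rate column is derived from the Min time, so it is not affected by whatever causes the low Avg values.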
David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com

------_=_NextPart_001_01C56ACE.22ECD025-- From - Mon Jun 27 21:55:59 2005 X-Account-Key: account2 X-UIDL: 81936-1059743608 X-Mozilla-Status: 0007 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-02 (ms-mta-02-smtp [10.93.38.12]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0IJC00INF70GKY@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Fri, 08 Jul 2005 20:33:52 -0500 (CDT) Received: from clmboh-mx-11.mgw.rr.com (clmboh-mx-11.mgw.rr.com [65.24.7.65]) by ms-mta-02.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0IJC0017O70G3R@ms-mta-02.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Fri, 08 Jul 2005 21:33:52 -0400 (EDT) Received: from ares.cs.virginia.edu (128.143.137.19) by clmboh-mx-11.mgw.rr.com with ESMTP; Fri, 08 Jul 2005 21:33:52 -0400 Received: from vader.inow.com (IDENT:0@vader.inow.com [198.144.96.11]) by ares.cs.Virginia.EDU (8.13.4/8.13.4/UVACS-2005041801) with ESMTP id j691XoYs007083 for ; Fri, 08 Jul 2005 21:33:51 -0400 (EDT) Received: from vader.inow.com (IDENT:807@localhost [127.0.0.1]) by vader.inow.com (8.13.4/8.13.4) with ESMTP id j691XiJU020007 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Fri, 08 Jul 2005 18:33:50 -0700 Received: from localhost (drbob@localhost) by vader.inow.com (8.13.4/8.13.4/Submit) with ESMTP id j691XiJs020004 for ; Fri, 08 Jul 2005 18:33:44 -0700 Date: Fri, 08 Jul 2005 18:33:44 -0700 (PDT) From: "Dr. 
Bob" Subject: stream results To: mccalpin@cs.virginia.edu Message-id: MIME-version: 1.0 Content-type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Authentication-warning: vader.inow.com: drbob owned process doing -bs Original-recipient: rfc822;jmccalpin@austin.rr.com X-NAS-Classification: 0 X-NAS-MessageID: 10559 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} Status: R Hi, I don't know if you are still updating the stream results table, but here's the score for my new server a Rackable C2004 with an ASUS PSCH-SR/IDE motherboard. 1GB RAM dual channel, Pentium4 3GHz 1M 800MHz FSB. The code is unmodified from your site and compiled with: gcc -O3 -march=prescott stream.c -o stream ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size = 2000000, Offset = 0 Total memory required = 45.8 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 5 microseconds. Each test below will take on the order of 8653 microseconds. (= 1730 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) Avg time Min time Max time Copy: 2750.7889 0.0124 0.0116 0.0134 Scale: 2746.5718 0.0127 0.0117 0.0136 Add: 3238.8888 0.0160 0.0148 0.0171 Triad: 3253.7918 0.0160 0.0148 0.0171 ------------------------------------------------------------- Solution Validates ------------------------------------------------------------- Dr. Bob "A facility for quotation covers the absence of original thought." 
--Lord Peter Wimsey From - Fri Aug 12 13:20:35 2005 X-Account-Key: account2 X-UIDL: 87281-1059743608 X-Mozilla-Status: 0005 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-03 (ms-mta-03-smtp.texas.rr.com [10.93.38.33]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0IKY0040PN055Z@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Tue, 09 Aug 2005 10:00:08 -0500 (CDT) Received: from hrndva-mx-07.mgw.rr.com (hrndva-mx-07.mgw.rr.com [24.28.204.26]) by ms-mta-03.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0IKY00EJZMZZRL@ms-mta-03.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Tue, 09 Aug 2005 10:00:07 -0500 (CDT) Received: from ares.cs.virginia.edu (128.143.137.19) by hrndva-mx-07.mgw.rr.com with ESMTP; Tue, 09 Aug 2005 11:00:06 -0400 Received: from e33.co.us.ibm.com (e33.co.us.ibm.com [32.97.110.131]) by ares.cs.Virginia.EDU (8.13.4/8.13.4/UVACS-2005041801) with ESMTP id j79F04k3019294 for ; Tue, 09 Aug 2005 11:00:04 -0400 (EDT) Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e33.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j79Exwxk733084 for ; Tue, 09 Aug 2005 10:59:58 -0400 Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by d03relay04.boulder.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j79F09FI120692 for ; Tue, 09 Aug 2005 09:00:10 -0600 Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j79Exv7q017459 for ; Tue, 09 Aug 2005 08:59:57 -0600 Received: from d03nm116.boulder.ibm.com (d03nm116.boulder.ibm.com [9.17.195.142]) by d03av04.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j79Exvg2017448; Tue, 09 Aug 2005 08:59:57 -0600 Date: Tue, 09 Aug 2005 09:00:07 -0600 From: Ly Vu Subject: standard STREAM on IBM eServer p5 575 (1500 MHz, 16cpu) To: mccalpin@cs.virginia.edu Cc: 
john@mccalpin.com, jacobt@us.ibm.com Message-id: MIME-version: 1.0 Content-type: multipart/alternative; Boundary="0__=08BBFACBDFC22CAD8f9e8a93df938690918c08BBFACBDFC22CAD" Content-disposition: inline Importance: Normal X-MIMETrack: Serialize by Router on D03NM116/03/M/IBM(Release 6.53HF546 | May 23, 2005) at 08/09/2005 09:00:07 Sensitivity: Original-recipient: rfc822;jmccalpin@austin.rr.com X-NAS-Classification: 0 X-NAS-MessageID: 11515 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} --0__=08BBFACBDFC22CAD8f9e8a93df938690918c08BBFACBDFC22CAD Content-type: text/plain; charset=US-ASCII These are standard STREAM results on an IBM eServer p5 575 with sixteen 1.5 GHz cpus. This is a POWER5 SMP machine. Large pages were used in all cases. Function Rate (MB/s) Avg time Min time Max time Copy: 38218.6049 .1123 .1123 .1124 Scale: 37825.6681 .1135 .1135 .1136 Add: 42179.1238 .1527 .1527 .1528 Triad: 42631.8617 .1512 .1510 .1514 Here is the full output file: -------------------------------------------------- Requesting Large Pages Setting up for 2 CPUs per module Number of segments per array = 8 CPU binding list : 0 2 4 6 8 10 12 14 Shared Segment Pointer = 504403158265495552 Shared Segment Pointer = 504403160412979200 Shared Segment Pointer = 504403162560462848 Segment Size (B) = 268435456 (MB = 256 ) Array Size (B) = 2147483648 (MB = 2048 ) Array Size (DW) = 268435456 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 rebind: num_parthds is 16 Starting Initialization Done With Initialization a(1) 1.00000000000000000 b(M) 1.00000000000000000 c(M) 1.00000000000000000 Incremental Offset = 2560 ---------------------------------------------- Double precision appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word 
---------------------------------------------- Array size = 268301312 The total memory requirement is 6140 MB You are running each test 5 times -- The *best* time for each test is used *EXCLUDING* the first and last iterations ---------------------------------------------------- Your clock granularity appears to be less than one microsecond Your clock granularity/precision appears to be 1 microseconds ---------------------------------------------------- Function Rate (MB/s) Avg time Min time Max time Copy: 38218.6049 .1123 .1123 .1124 Scale: 37825.6681 .1135 .1135 .1136 Add: 42179.1238 .1527 .1527 .1528 Triad: 42631.8617 .1512 .1510 .1514 ---------------------------------------------------- Solution Validates! ---------------------------------------------------- ______________________________________________ Ly Vu IBM Corp. - Austin, Texas. RS/6000 Performance Analysis. Phone : (512) 838-8228 Email : lyvu@us.ibm.com --0__=08BBFACBDFC22CAD8f9e8a93df938690918c08BBFACBDFC22CAD Content-type: text/html; charset=US-ASCII Content-Disposition: inline


These are standard STREAM results on an IBM eServer p5 575
with sixteen 1.5 GHz cpus. This is a POWER5 SMP machine.
Large pages were used in all cases.


Function     Rate (MB/s)  Avg time   Min time  Max time
Copy:        38218.6049      .1123      .1123     .1124
Scale:       37825.6681      .1135      .1135     .1136
Add:         42179.1238      .1527      .1527     .1528
Triad:       42631.8617      .1512      .1510     .1514


Here is the full output file:
--------------------------------------------------

Requesting Large Pages
Setting up for 2 CPUs per module
Number of segments per array = 8
CPU binding list : 0 2 4 6 8 10 12 14
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403160412979200
Shared Segment Pointer = 504403162560462848
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 2147483648 (MB = 2048 )
Array Size (DW) = 268435456
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
rebind: num_parthds is 16
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000

Incremental Offset = 2560
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 268301312
The total memory requirement is 6140 MB
You are running each test 5 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function     Rate (MB/s)  Avg time   Min time  Max time
Copy:        38218.6049      .1123      .1123     .1124
Scale:       37825.6681      .1135      .1135     .1136
Add:         42179.1238      .1527      .1527     .1528
Triad:       42631.8617      .1512      .1510     .1514
----------------------------------------------------
Solution Validates!
----------------------------------------------------
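
The figures in the p5 575 output above can be checked for internal consistency. This is a sketch under the usual STREAM assumptions (three arrays of 8-byte words, 16 bytes per element for Copy and Scale, 24 for Add and Triad, rates in units of 10^6 bytes per second); the variable and function names are illustrative only.

```python
ARRAY_SIZE = 268_301_312  # "Array size = 268301312"
BYTES_PER_ELEMENT = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}

# kernel: (reported rate in MB/s, reported min time in seconds)
p5_575 = {
    "Copy": (38218.6049, 0.1123),
    "Scale": (37825.6681, 0.1135),
    "Add": (42179.1238, 0.1527),
    "Triad": (42631.8617, 0.1510),
}


def total_memory_mb(n, arrays=3, bytes_per_word=8):
    """Memory for the three arrays, taking 1 MB = 2**20 bytes."""
    return arrays * n * bytes_per_word / 2**20


def implied_time(kernel, rate_mb_s, n=ARRAY_SIZE):
    """Best time in seconds implied by a reported rate."""
    return BYTES_PER_ELEMENT[kernel] * n / (rate_mb_s * 1e6)


print(f"arrays occupy about {total_memory_mb(ARRAY_SIZE):.2f} MB")
for kernel, (rate, min_time) in p5_575.items():
    print(f"{kernel:6s} implied {implied_time(kernel, rate):.5f} s, "
          f"reported {min_time:.4f} s")
```

Under these assumptions the footprint comes to just under 6141 MB, in line with the reported "6140 MB" if that figure is truncated, and each implied time agrees with the Min time column to four decimal places.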


______________________________________________
Ly Vu
IBM Corp. - Austin, Texas.
RS/6000 Performance Analysis.

Phone : (512) 838-8228
Email : lyvu@us.ibm.com --0__=08BBFACBDFC22CAD8f9e8a93df938690918c08BBFACBDFC22CAD-- From - Fri Aug 12 13:20:36 2005 X-Account-Key: account2 X-UIDL: 87283-1059743608 X-Mozilla-Status: 0005 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-01 (ms-mta-01-smtp [10.93.38.11]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0IKY004N5NB65Z@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Tue, 09 Aug 2005 10:06:51 -0500 (CDT) Received: from hrndva-mx-07.mgw.rr.com (hrndva-mx-07.mgw.rr.com [24.28.204.26]) by ms-mta-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0IKY005IENBATZ@ms-mta-01.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Tue, 09 Aug 2005 11:06:47 -0400 (EDT) Received: from ares.cs.virginia.edu (128.143.137.19) by hrndva-mx-07.mgw.rr.com with ESMTP; Tue, 09 Aug 2005 11:06:46 -0400 Received: from e34.co.us.ibm.com (e34.co.us.ibm.com [32.97.110.132]) by ares.cs.Virginia.EDU (8.13.4/8.13.4/UVACS-2005041801) with ESMTP id j79F6hOp020196 for ; Tue, 09 Aug 2005 11:06:44 -0400 (EDT) Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e34.co.us.ibm.com (8.12.10/8.12.9) with ESMTP id j79F6cRX193534 for ; Tue, 09 Aug 2005 11:06:38 -0400 Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j79F6QYl505618 for ; Tue, 09 Aug 2005 09:06:26 -0600 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j79F6bbh026902 for ; Tue, 09 Aug 2005 09:06:37 -0600 Received: from d03nm116.boulder.ibm.com (d03nm116.boulder.ibm.com [9.17.195.142]) by d03av03.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j79F6bxS026888; Tue, 09 Aug 2005 09:06:37 -0600 Date: Tue, 09 Aug 2005 10:05:51 -0500 From: Ly Vu Subject: tuned STREAM on IBM eServer p5 575 (1500 MHz, 
16cpu) To: mccalpin@cs.virginia.edu Cc: john@mccalpin.com, jacobt@us.ibm.com Message-id: MIME-version: 1.0 X-Mailer: Lotus Notes Release 6.0.2CF1 June 9, 2003 Content-type: multipart/alternative; boundary="=_alternative 0052E44086257058_=" Importance: High X-MIMETrack: Serialize by Router on D03NM116/03/M/IBM(Release 6.53HF546 | May 23, 2005) at 08/09/2005 09:06:48, Serialize complete at 08/09/2005 09:06:48 Sensitivity: Original-recipient: rfc822;jmccalpin@austin.rr.com X-NAS-Classification: 0 X-NAS-MessageID: 11517 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} This is a multipart message in MIME format. --=_alternative 0052E44086257058_= Content-Type: text/plain; charset="US-ASCII" These are tuned STREAM results on an IBM eServer p5 575 with sixteen 1.5 GHz cpus. This is a POWER5 SMP machine. Large pages were used in all cases. Function Rate (MB/s) RMS time Min time Max time Copy: 50173.70 .09 .09 .09 Scale: 49010.25 .09 .09 .09 Add: 55184.19 .12 .12 .12 Triad: 55870.61 .12 .12 .12 Here is the full output file: -------------------------------------------------- Requesting Large Pages Setting up for 2 CPUs per module Number of segments per array = 8 CPU binding list : 0 2 4 6 8 10 12 14 Shared Segment Pointer = 504403158265495552 Shared Segment Pointer = 504403160412979200 Shared Segment Pointer = 504403162560462848 Segment Size (B) = 268435456 (MB = 256 ) Array Size (B) = 2147483648 (MB = 2048 ) Array Size (DW) = 268435456 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 Num_threads = 16 rebind: num_parthds is 16 Starting Initialization Done With Initialization a(1) 1.00000000000000000 b(M) 1.00000000000000000 c(M) 1.00000000000000000 Incremental Offset = 512 Number of Threads = 16 ---------------------------------------------- Double precision 
appears to have 16 digits of accuracy Assuming 8 bytes per DOUBLE PRECISION word ---------------------------------------------- Array size = 268303360 Offset = 0 The total memory requirement is 6140 MB You are running each test 5 times The *best* time for each test is used ---------------------------------------------------- Your clock granularity appears to be less than one microsecond Your clock granularity/precision appears to be 1 microseconds The tests below will each take a time on the order of 83383 microseconds (= 83383 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ---------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ---------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 50173.70 .09 .09 .09 Scale: 49010.25 .09 .09 .09 Add: 55184.19 .12 .12 .12 Triad: 55870.61 .12 .12 .12 Sum of a is = 407485728000000.000 Sum of b is = 81497145600000.0000 Sum of c is = 108662860800000.000 ______________________________________________ Ly Vu IBM Corp. - Austin, Texas. RS/6000 Performance Analysis. Phone : (512) 838-8228 Email : lyvu@us.ibm.com --=_alternative 0052E44086257058_= Content-Type: text/html; charset="US-ASCII"
These are tuned STREAM results on an IBM eServer p5 575
with sixteen 1.5 GHz cpus. This is a POWER5 SMP machine.
Large pages were used in all cases.


Function     Rate (MB/s)  RMS time   Min time  Max time
Copy:        50173.70         .09         .09         .09
Scale:       49010.25         .09         .09         .09
Add:         55184.19         .12         .12         .12
Triad:       55870.61         .12         .12         .12


Here is the full output file:
--------------------------------------------------

Requesting Large Pages
 Setting up for  2 CPUs per module
 Number of segments per array =  8
 CPU binding list :  0 2 4 6 8 10 12 14
 Shared Segment Pointer =  504403158265495552
 Shared Segment Pointer =  504403160412979200
 Shared Segment Pointer =  504403162560462848
 Segment Size (B) =  268435456  (MB =  256 )
 Array   Size (B) =  2147483648  (MB =  2048 )
 Array   Size (DW) =  268435456
 Num_threads =  16
 Num_threads =  16
 Num_threads =  16
 Num_threads =  16
 Num_threads =  16
 Num_threads =  16
 Num_threads =  16
 Num_threads =  16
 Num_threads =  16
 Num_threads =  16
 Num_threads =  16
 Num_threads =  16
 Num_threads =  16
 Num_threads =  16
 Num_threads =  16
 Num_threads =  16
 rebind: num_parthds is  16
 Starting Initialization
 Done With Initialization
 a(1) 1.00000000000000000
 b(M) 1.00000000000000000
 c(M) 1.00000000000000000
 Incremental Offset =  512
 Number of Threads =  16

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size =  268303360
 Offset     =          0
 The total memory requirement is 6140 MB
 You are running each test   5 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity appears to be less than one microsecond
 Your clock granularity/precision appears to be      1 microseconds
 The tests below will each take a time on the order
 of  83383  microseconds
    (=  83383  clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function     Rate (MB/s)  RMS time   Min time  Max time
Copy:        50173.70         .09         .09         .09
Scale:       49010.25         .09         .09         .09
Add:         55184.19         .12         .12         .12
Triad:       55870.61         .12         .12         .12
 Sum of a is =  407485728000000.000
 Sum of b is =  81497145600000.0000
 Sum of c is =  108662860800000.000
______________________________________________
Ly Vu
IBM Corp. - Austin, Texas.
RS/6000 Performance Analysis.

Phone : (512) 838-8228
Email : lyvu@us.ibm.com
--=_alternative 0052E44086257058_=-- From - Fri Sep 30 21:46:57 2005 X-Account-Key: account2 X-UIDL: 94791-1059743608 X-Mozilla-Status: 0007 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-02 (ms-mta-02-smtp [10.93.38.12]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INJ0078EVNHKN@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Wed, 28 Sep 2005 18:23:41 -0500 (CDT) Received: from orngca-mx-03.mgw.rr.com (orngca-mx-03.mgw.rr.com [66.75.160.130]) by ms-mta-02.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INJ005OAVMRCH@ms-mta-02.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Wed, 28 Sep 2005 19:23:41 -0400 (EDT) Received: from fife-smtp.catalog.com (HELO fife.catalog.com) (209.217.36.155) by orngca-mx-03.mgw.rr.com with SMTP; Wed, 28 Sep 2005 19:23:40 -0400 Received: (qmail 27025 invoked by uid 10000); Wed, 28 Sep 2005 23:26:24 +0000 Received: (qmail 26991 invoked from network); Wed, 28 Sep 2005 23:26:23 +0000 Received: from unknown (HELO spamfilter.catalog.com) (209.217.52.101) by smtp3.catalog.com with SMTP; Wed, 28 Sep 2005 23:26:23 +0000 Received: from ccerelbas01.cce.hp.com (ccerelbas01.cce.hp.com [161.114.21.104]) by spamfilter.catalog.com (Spam Firewall) with ESMTP id E42EAD0003EB for ; Wed, 28 Sep 2005 18:23:34 -0500 (CDT) Received: from cceexg11.americas.cpqcorp.net (cceexg11.americas.cpqcorp.net [16.81.1.59]) by ccerelbas01.cce.hp.com (Postfix) with ESMTP id 1854C200009F for ; Wed, 28 Sep 2005 18:23:34 -0500 (CDT) Received: from cceexc19.americas.cpqcorp.net ([16.81.1.60]) by cceexg11.americas.cpqcorp.net with Microsoft SMTPSVC(6.0.3790.211); Wed, 28 Sep 2005 18:23:33 -0500 Date: Wed, 28 Sep 2005 18:23:33 -0500 From: "Schmidt, David (Performance Eng.)" Subject: STREAM results for the HP ProLiant BL20p G3, DL140 G2, and ML350 G4p with Intel 3.6Ghz Xeon CPUs To: john@mccalpin.com Message-id: MIME-version: 1.0 
X-MIMEOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-type: multipart/alternative; boundary="----_=_NextPart_001_01C5C483.A4EA006E" Content-class: urn:content-classes:message Delivered-to: mccalpin.com-john@mccalpin.com Thread-topic: STREAM results for the HP ProLiant BL20p G3, DL140 G2, and ML350 G4p with Intel 3.6Ghz Xeon CPUs Thread-index: AcXEg6RvPPplVjtwTFC93NV15LUxQw== X-MS-Has-Attach: X-MS-TNEF-Correlator: X-Virus-Scanned: by Catalog.com Spam Firewall at catalog.com Original-recipient: rfc822;jmccalpin@austin.rr.com X-OriginalArrivalTime: 28 Sep 2005 23:23:33.0982 (UTC) FILETIME=[A51CDFE0:01C5C483] X-NAS-Classification: 0 X-NAS-MessageID: 13053 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} This is a multi-part message in MIME format. ------_=_NextPart_001_01C5C483.A4EA006E Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable John,=20 Below are standard STREAM results for the HP ProLiant BL20p G3 (1 CPU), the HP ProLiant DL140 G2 (1 CPU), and the HP ProLiant BL45p (4 CPU) (Editor's correction: this last system should read "ML350 G4p") using 3.6GHz Intel Xeon processors. The configurations are described below with the results. Please post them to the STREAM website. Thanks, ----------------------------------------------------------------------- HP ProLiant BL20p G3 1x3.6GHz/2MB L2 Xeon processors 8GB PC3200 memory (4x2GB DIMMs) SuSE Linux Enterprise Server 9 (x86_64) SP1 Intel(R) C++ Compiler for Intel(R) EM64T-based Applications, Version 8.1, Build 20040812 icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 3500000, Offset =3D 5120 Total memory required =3D 80.1 MB. Each test is run 10 times, but only the *best* time for each is used. 
------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 15221 microseconds. (=3D 15221 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 3764.4809 0.0150 0.0149 0.0155 Scale: 3723.8961 0.0151 0.0150 0.0151 Add: 4028.6037 0.0209 0.0209 0.0209 Triad: 3893.5719 0.0216 0.0216 0.0217 ------------------------------------------------------------- Solution Validates HP ProLiant DL140 G2 1x3.6GHz/2MB L2 Xeon processor 8GB memory (8x1024MB) SuSE Enterprise Linux 9 (x86_64) SP1 Intel(R) C++ Compiler for Intel(R)EM64T-based applications, Version 8.1 Build 20040812 icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c=20 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 9000000, Offset =3D 512 Total memory required =3D 206.0 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 41546 microseconds. (=3D 41546 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 3800.3837 0.0380 0.0379 0.0386 Scale: 3657.1367 0.0395 0.0394 0.0398 Add: 3994.7514 0.0541 0.0541 0.0542 Triad: 3914.8118 0.0552 0.0552 0.0553 ------------------------------------------------------------- Solution Validates HP ProLiant ML350 G4p 1x3.6GHz/2MB L2 Xeon processors 8GB PC3200 memory (4x2GB DIMMs) SuSE Linux Enterprise Server 9 (x86_64) SP1 Intel(R) C++ Compiler for Intel(R) EM64T-based Applications, Version 8.1, Build 20040812 icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 4500000, Offset =3D 16896 Total memory required =3D 103.0 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 19776 microseconds. (=3D 19776 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 3834.5001 0.0189 0.0188 0.0195 Scale: 3756.0464 0.0192 0.0192 0.0192 Add: 4100.0781 0.0264 0.0263 0.0264 Triad: 3968.6773 0.0273 0.0272 0.0273 ------------------------------------------------------------- Solution Validates ------------------------------------------------------------- David Schmidt Hewlett-Packard Company (281) 514-5039 D.Schmidt@hp.com=20 ------_=_NextPart_001_01C5C483.A4EA006E Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable STREAM results for the HP ProLiant BL20p G3, DL140 G2, and ML350 = G4p with Intel 3.6Ghz Xeon CPUs

John,

Below are standard STREAM results for the HP ProLiant BL20p G3 (1 CPU), the HP ProLiant DL140 G2 (1 CPU), and the HP ProLiant BL45p (4 CPU) (Editor's correction: this last system should read "ML350 G4p") using 3.6GHz Intel Xeon processors. The configurations are described below with the results. Please post them to the STREAM website. Thanks,

-----------------------------------------------------------------------

HP ProLiant BL20p G3
1x3.6GHz/2MB L2 Xeon processors
8GB PC3200 memory (4x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based Applications, Version 8.1, Build 20040812
icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 3500000, Offset = 5120
Total memory required = 80.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 15221 microseconds.
   (= 15221 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        3764.4809       0.0150       0.0149       0.0155
Scale:       3723.8961       0.0151       0.0150       0.0151
Add:         4028.6037       0.0209       0.0209       0.0209
Triad:       3893.5719       0.0216       0.0216       0.0217
-------------------------------------------------------------
Solution Validates


HP ProLiant DL140 G2
1x3.6GHz/2MB L2 Xeon processor
8GB memory (8x1024MB)
SuSE Enterprise Linux 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1 Build 20040812
icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 9000000, Offset = 512
Total memory required = 206.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 41546 microseconds.
   (= 41546 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        3800.3837       0.0380       0.0379       0.0386
Scale:       3657.1367       0.0395       0.0394       0.0398
Add:         3994.7514       0.0541       0.0541       0.0542
Triad:       3914.8118       0.0552       0.0552       0.0553
-------------------------------------------------------------
Solution Validates


HP ProLiant ML350 G4p
1x3.6GHz/2MB L2 Xeon processors
8GB PC3200 memory (4x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based Applications, Version 8.1, Build 20040812
icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 4500000, Offset = 16896
Total memory required = 103.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 19776 microseconds.
   (= 19776 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        3834.5001       0.0189       0.0188       0.0195
Scale:       3756.0464       0.0192       0.0192       0.0192
Add:         4100.0781       0.0264       0.0263       0.0264
Triad:       3968.6773       0.0273       0.0272       0.0273
-------------------------------------------------------------

Solution Validates
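
(Editor's note: the sketch below is not part of the submission. It is a minimal Python illustration of how the "Total memory required" and "Min time" figures in the ML350 G4p output above relate to the array size and the reported rates. It assumes the standard STREAM accounting: three arrays of 8-byte words, Copy and Scale counted as touching two arrays, Add and Triad as touching three, and rates in units of 10^6 bytes per second. The names BYTES_PER_WORD, ARRAYS_TOUCHED, total_memory_mib and min_time_seconds are hypothetical and chosen only for this illustration.)

```python
# Editor's sketch: cross-check STREAM output figures against the array size.
# Assumption: standard STREAM byte accounting, as described in the note above.

BYTES_PER_WORD = 8  # "This system uses 8 bytes per DOUBLE PRECISION word."

# Number of arrays each kernel is counted as touching (reads plus writes).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def total_memory_mib(array_size):
    """Footprint of the three arrays a, b, c in units of 2**20 bytes."""
    return 3 * BYTES_PER_WORD * array_size / 2**20


def min_time_seconds(kernel, array_size, rate_mb_per_s):
    """Invert rate = 1e-6 * bytes_moved / min_time to recover the best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * array_size
    return bytes_moved / (rate_mb_per_s * 1e6)


if __name__ == "__main__":
    n = 4500000  # "Array size = 4500000" from the ML350 G4p run
    print(round(total_memory_mib(n), 1))                     # 103.0
    print(round(min_time_seconds("Copy", n, 3834.5001), 4))  # 0.0188
    print(round(min_time_seconds("Add", n, 4100.0781), 4))   # 0.0263
```

The recovered times agree with the "Min time" column to the four decimal places printed, which is consistent with the rate being derived from the best of the ten timed runs.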

-------------------------------------------------------------
David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com

------_=_NextPart_001_01C5C483.A4EA006E-- From - Fri Sep 30 21:46:58 2005 X-Account-Key: account2 X-UIDL: 94795-1059743608 X-Mozilla-Status: 0007 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-03 (ms-mta-03-smtp.texas.rr.com [10.93.38.33]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INJ0078ZW40KN@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Wed, 28 Sep 2005 18:33:36 -0500 (CDT) Received: from clmboh-mx-11.mgw.rr.com (clmboh-mx-11.mgw.rr.com [65.24.7.65]) by ms-mta-03.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INJ002I5W3TO0@ms-mta-03.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Wed, 28 Sep 2005 18:33:36 -0500 (CDT) Received: from fife-smtp.catalog.com (HELO fife.catalog.com) (209.217.36.155) by clmboh-mx-11.mgw.rr.com with SMTP; Wed, 28 Sep 2005 19:33:36 -0400 Received: (qmail 3103 invoked by uid 10000); Wed, 28 Sep 2005 23:36:19 +0000 Received: (qmail 3041 invoked from network); Wed, 28 Sep 2005 23:36:19 +0000 Received: from unknown (HELO spamfilter.catalog.com) (209.217.36.24) by smtp3.catalog.com with SMTP; Wed, 28 Sep 2005 23:36:19 +0000 Received: from ccerelbas04.cce.hp.com (ccerelbas04.cce.hp.com [161.114.21.107]) by spamfilter.catalog.com (Spam Firewall) with ESMTP id A665ED01726A for ; Wed, 28 Sep 2005 18:33:15 -0500 (CDT) Received: from cceexg12.americas.cpqcorp.net (cceexg12.americas.cpqcorp.net [16.81.1.38]) by ccerelbas04.cce.hp.com (Postfix) with ESMTP id 11BA520001C7 for ; Wed, 28 Sep 2005 18:33:15 -0500 (CDT) Received: from cceexc19.americas.cpqcorp.net ([16.81.1.60]) by cceexg12.americas.cpqcorp.net with Microsoft SMTPSVC(6.0.3790.211); Wed, 28 Sep 2005 18:32:21 -0500 Date: Wed, 28 Sep 2005 18:32:21 -0500 From: "Schmidt, David (Performance Eng.)" Subject: STREAM results for the HP ProLiant BL20p G3, DL140 G2, DL360 G4p, DL380 G4, and ML370 G4 with Intel 3.8Ghz Xeon CPUs To: john@mccalpin.com 
Message-id: MIME-version: 1.0 X-MIMEOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-type: multipart/alternative; boundary="----_=_NextPart_001_01C5C484.DFC78438" Content-class: urn:content-classes:message Delivered-to: mccalpin.com-john@mccalpin.com Thread-topic: STREAM results for the HP ProLiant BL20p G3, DL140 G2, DL360 G4p, DL380 G4, and ML370 G4 with Intel 3.8Ghz Xeon CPUs Thread-index: AcXEhN9Ya9G3QTKiS+KSXRmXpxpsgQ== X-MS-Has-Attach: X-MS-TNEF-Correlator: X-Virus-Scanned: by Catalog.com Spam Firewall at catalog.com Original-recipient: rfc822;jmccalpin@austin.rr.com X-OriginalArrivalTime: 28 Sep 2005 23:32:21.0547 (UTC) FILETIME=[DF90E7B0:01C5C484] X-NAS-Classification: 0 X-NAS-MessageID: 13054 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} This is a multi-part message in MIME format. ------_=_NextPart_001_01C5C484.DFC78438 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable John,=20 Below are standard STREAM results for the HP ProLiant BL20p G3 (1 CPU), the HP ProLiant DL140 G2 (1 CPU), the HP ProLiant DL360 G4p (1 CPU), the HP ProLiant DL380 G4 (1 CPU), and the HP ProLiant ML370 G4 (4 CPU) using 3.8GHz Intel Xeon processors. The configurations are described below with the results. Please post them to the STREAM website. Thanks, HP ProLiant BL20p G3 2x3.8GHz/2MB L2 Xeon processors 8GB PC3200 memory (4x2GB DIMMs) SuSE Linux Enterprise Server 9 (x86_64) SP1 Intel(R) C++ Compiler for Intel(R) EM64T-based Applications, Version 8.1, Build 20040812 icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 3500000, Offset =3D 14336 Total memory required =3D 80.1 MB. Each test is run 10 times, but only the *best* time for each is used. 
------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 15160 microseconds. (=3D 15160 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 3767.2584 0.0150 0.0149 0.0156 Scale: 3730.8759 0.0151 0.0150 0.0152 Add: 4020.1456 0.0209 0.0209 0.0209 Triad: 3899.9074 0.0216 0.0215 0.0216 ------------------------------------------------------------- Solution Validates ------------------------------------------------------------- ------------------------------------------------------------------------ --------------- HP ProLiant DL140 G2 2x3.8GHz/2MB L2 Xeon processor=20 8GB memory (8x1024MB) SuSE Enterprise Linux 9 (x86_64) SP1=20 Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1 Build 20040812 icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c=20 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 9000000, Offset =3D 20480 Total memory required =3D 206.0 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 44713 microseconds. (=3D 44713 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. 
------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 3917.2916 0.0369 0.0368 0.0375 Scale: 3760.3727 0.0383 0.0383 0.0384 Add: 4086.4114 0.0529 0.0529 0.0533 Triad: 4004.8345 0.0540 0.0539 0.0540 ------------------------------------------------------------- Solution Validates ------------------------------------------------------------- ------------------------------------------------------------------------ --------------- HP ProLiant DL360 G4p 2x3.8GHz/2MB L2 Xeon processor=20 8GB memory (4x2048MB Dual Rank DIMMs) SuSE Enterprise Linux 9 (x86_64) SP1=20 Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1 Build 20040812 icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c=20 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 9000000, Offset =3D 16896 Total memory required =3D 206.0 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 40524 microseconds. (=3D 40524 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 3912.0902 0.0369 0.0368 0.0375 Scale: 3768.9375 0.0382 0.0382 0.0383 Add: 4149.2957 0.0521 0.0521 0.0521 Triad: 4045.5549 0.0534 0.0534 0.0535 ------------------------------------------------------------- Solution Validates ------------------------------------------------------------- ------------------------------------------------------------------------ --------------- HP ProLiant DL380 G4 2x3.8GHz/2MB L2 Xeon processor 8GB memory (4x2048MB Dual Rank DIMMs) SuSE Enterprise Linux 9 (x86_64) SP1 Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1 Build 20040812 icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c=20 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 9000000, Offset =3D 24576 Total memory required =3D 206.0 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 40484 microseconds. (=3D 40484 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 3919.6559 0.0369 0.0367 0.0375 Scale: 3775.2747 0.0382 0.0381 0.0382 Add: 4151.7676 0.0521 0.0520 0.0521 Triad: 4049.7142 0.0534 0.0533 0.0535 ------------------------------------------------------------- Solution Validates ------------------------------------------------------------- ------------------------------------------------------------------------ --------------- HP ProLiant ML370 G4 1x3.8GHz/2MB L2 Xeon processor=20 8GB memory (8x1024MB) SuSE Enterprise Linux 9 (x86_64) SP1=20 Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1 Build 20040812 icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c=20 ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 9000000, Offset =3D 2048 Total memory required =3D 206.0 MB. Each test is run 10 times, but only the *best* time for each is used. ------------------------------------------------------------- Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 40999 microseconds. (=3D 40999 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------------------------------------------------------------- WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. 
------------------------------------------------------------- Function Rate (MB/s) RMS time Min time Max time Copy: 3893.0521 0.0371 0.0370 0.0377 Scale: 3750.0064 0.0384 0.0384 0.0384 Add: 4119.2423 0.0525 0.0524 0.0525 Triad: 4013.6703 0.0539 0.0538 0.0539 ------------------------------------------------------------- Solution Validates ------------------------------------------------------------- David Schmidt Hewlett-Packard Company (281) 514-5039 D.Schmidt@hp.com=20 ------_=_NextPart_001_01C5C484.DFC78438 Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable STREAM results for the HP ProLiant BL20p G3, DL140 G2, DL360 G4p, = DL380 G4, and ML370 G4 with Intel 3.8Ghz Xeon CPUs

John,

Below are standard STREAM results for the HP ProLiant BL20p G3 (1 CPU), the HP ProLiant DL140 G2 (1 CPU), the HP ProLiant DL360 G4p (1 CPU), the HP ProLiant DL380 G4 (1 CPU), and the HP ProLiant ML370 G4 (4 CPU) using 3.8GHz Intel Xeon processors. The configurations are described below with the results. Please post them to the STREAM website. Thanks,

HP ProLiant BL20p G3
2x3.8GHz/2MB L2 Xeon processors
8GB PC3200 memory (4x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based Applications, Version 8.1, Build 20040812
icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 3500000, Offset = 14336
Total memory required = 80.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 15160 microseconds.
   (= 15160 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        3767.2584       0.0150       0.0149       0.0156
Scale:       3730.8759       0.0151       0.0150       0.0152
Add:         4020.1456       0.0209       0.0209       0.0209
Triad:       3899.9074       0.0216       0.0215       0.0216
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

---------------------------------------------------------------------------------------

HP ProLiant DL140 G2
2x3.8GHz/2MB L2 Xeon processor
8GB memory (8x1024MB)
SuSE Enterprise Linux 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1 Build 20040812
icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 9000000, Offset = 20480
Total memory required = 206.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 44713 microseconds.
   (= 44713 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        3917.2916       0.0369       0.0368       0.0375
Scale:       3760.3727       0.0383       0.0383       0.0384
Add:         4086.4114       0.0529       0.0529       0.0533
Triad:       4004.8345       0.0540       0.0539       0.0540
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

---------------------------------------------------------------------------------------

HP ProLiant DL360 G4p

2x3.8GHz/2MB L2 Xeon processor

8GB memory (4x2048MB Dual Rank DIMMs)

SuSE Enterprise Linux 9 (x86_64) SP1

Intel(R) C++ Compiler for Intel(R) EM64T-based applications, = Version 8.1 Build 20040812

icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c =

-------------------------------------------------------------=

This system uses 8 bytes per DOUBLE PRECISION = word.

-------------------------------------------------------------=

Array size =3D 9000000, Offset =3D 16896

Total memory required =3D 206.0 MB.

Each test is run 10 times, but only

the *best* time for each is used.

-------------------------------------------------------------=

Your clock granularity/precision appears to be 1 = microseconds.

Each test below will take on the order of 40524 = microseconds.

   (=3D 40524 clock ticks)

Increase the size of the arrays if this shows = that

you are not getting at least 20 clock ticks per = test.

-------------------------------------------------------------=

WARNING -- The above is only a rough guideline.

For best results, please be sure you know the

precision of your system timer.

-------------------------------------------------------------=

Function      Rate (MB/s)   RMS = time     Min time     Max = time

Copy:        = 3912.0902       = 0.0369       = 0.0368       0.0375

Scale:       = 3768.9375       = 0.0382       = 0.0382       0.0383

Add:         = 4149.2957       = 0.0521       = 0.0521       0.0521

Triad:       = 4045.5549       = 0.0534       = 0.0534       0.0535

-------------------------------------------------------------=

Solution Validates

-------------------------------------------------------------=

---------------------------------------------------------------------------------------

HP ProLiant DL380 G4
2x3.8GHz/2MB L2 Xeon processor
8GB memory (4x2048MB Dual Rank DIMMs)
SuSE Enterprise Linux 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1 Build 20040812
icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 9000000, Offset = 24576
Total memory required = 206.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 40484 microseconds.
   (= 40484 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        3919.6559       0.0369       0.0367       0.0375
Scale:       3775.2747       0.0382       0.0381       0.0382
Add:         4151.7676       0.0521       0.0520       0.0521
Triad:       4049.7142       0.0534       0.0533       0.0535
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

---------------------------------------------------------------------------------------

HP ProLiant ML370 G4
1x3.8GHz/2MB L2 Xeon processor
8GB memory (8x1024MB)
SuSE Enterprise Linux 9 (x86_64) SP1
Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1 Build 20040812
icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 9000000, Offset = 2048
Total memory required = 206.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 40999 microseconds.
   (= 40999 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:        3893.0521       0.0371       0.0370       0.0377
Scale:       3750.0064       0.0384       0.0384       0.0384
Add:         4119.2423       0.0525       0.0524       0.0525
Triad:       4013.6703       0.0539       0.0538       0.0539
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com

------_=_NextPart_001_01C5C484.DFC78438-- From - Fri Sep 30 21:46:58 2005 X-Account-Key: account2 X-UIDL: 94796-1059743608 X-Mozilla-Status: 0005 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-03 (ms-mta-03-smtp.texas.rr.com [10.93.38.33]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INJ007ARWFIKN@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Wed, 28 Sep 2005 18:40:30 -0500 (CDT) Received: from hrndva-mx-02.mgw.rr.com (hrndva-mx-02.mgw.rr.com [24.28.204.21]) by ms-mta-03.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INJ0011WWEE4H@ms-mta-03.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Wed, 28 Sep 2005 18:40:30 -0500 (CDT) Received: from fife-smtp.catalog.com (HELO fife.catalog.com) (209.217.36.155) by hrndva-mx-02.mgw.rr.com with SMTP; Wed, 28 Sep 2005 19:40:29 -0400 Received: (qmail 3251 invoked by uid 10000); Wed, 28 Sep 2005 23:43:13 +0000 Received: (qmail 3171 invoked from network); Wed, 28 Sep 2005 23:43:12 +0000 Received: from unknown (HELO spamfilter.catalog.com) (209.217.36.24) by smtp3.catalog.com with SMTP; Wed, 28 Sep 2005 23:43:12 +0000 Received: from ccerelbas04.cce.hp.com (ccerelbas04.cce.hp.com [161.114.21.107]) by spamfilter.catalog.com (Spam Firewall) with ESMTP id AD3B4D039F7F for ; Wed, 28 Sep 2005 18:40:20 -0500 (CDT) Received: from cceexg12.americas.cpqcorp.net (cceexg12.americas.cpqcorp.net [16.81.1.38]) by ccerelbas04.cce.hp.com (Postfix) with ESMTP id 095B02000264 for ; Wed, 28 Sep 2005 18:40:20 -0500 (CDT) Received: from cceexc19.americas.cpqcorp.net ([16.81.1.60]) by cceexg12.americas.cpqcorp.net with Microsoft SMTPSVC(6.0.3790.211); Wed, 28 Sep 2005 18:39:51 -0500 Date: Wed, 28 Sep 2005 18:39:51 -0500 From: "Schmidt, David (Performance Eng.)" Subject: STREAM results for the HP ProLiant BL45p and DL585 with AMD 880 Opteron CPUs To: john@mccalpin.com Message-id: MIME-version: 1.0 
X-MIMEOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-type: multipart/alternative; boundary="----_=_NextPart_001_01C5C485.EBE7B8C8" Content-class: urn:content-classes:message Delivered-to: mccalpin.com-john@mccalpin.com Thread-topic: STREAM results for the HP ProLiant BL45p and DL585 with AMD 880 Opteron CPUs Thread-index: AcXEhet5Yvk4Ccf/RDSgDhsKlmIVTw== X-MS-Has-Attach: X-MS-TNEF-Correlator: X-Virus-Scanned: by Catalog.com Spam Firewall at catalog.com Original-recipient: rfc822;jmccalpin@austin.rr.com X-OriginalArrivalTime: 28 Sep 2005 23:39:51.0221 (UTC) FILETIME=[EB97B650:01C5C485] X-NAS-Classification: 0 X-NAS-MessageID: 13055 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} This is a multi-part message in MIME format. ------_=_NextPart_001_01C5C485.EBE7B8C8 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable John,=20 Below are standard STREAM results for the HP ProLiant BL45p (4 CPUs, 8 cores) and the HP ProLiant DL585 (4 CPUs, 8 cores) using 2.4GHz AMD 880 Opteron dual-core processors. The configurations are described below with the results. Please post them to the STREAM website. Thanks, HP ProLiant BL45p 4x2.4GHz /1MB L2 (per core) dual core 880 Opteron processors 32GB PC3200 memory (16x2GB DIMMs) SuSE Linux Enterprise Server 9 (x86_64) SP1 I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1: pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=3D4 -mp -o = ompstream stream_omp.c=20 For 4x2.4GHz Opteron 880 processors (8 cores): ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 8000000, Offset =3D 49152 Total memory required =3D 183.1 MB. Each test is run 10 times, but only the *best* time for each is used. 
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 7460 microseconds.
   (= 7460 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       17613.8751       0.0066       0.0073       0.0074
Scale:      17682.3303       0.0065       0.0072       0.0074
Add:        17782.6783       0.0098       0.0108       0.0109
Triad:      17786.2130       0.0098       0.0108       0.0109
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

------------------------------------------------------------------------------------------

HP ProLiant DL585
4x2.4GHz/1MB L2 (per core) 880 Opteron processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1:
   pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 41472
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 7652 microseconds.
   (= 7652 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       16892.8263       0.0069       0.0076       0.0078
Scale:      16913.0489       0.0068       0.0076       0.0076
Add:        16493.3922       0.0105       0.0116       0.0119
Triad:      16507.5920       0.0105       0.0116       0.0117
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com

------_=_NextPart_001_01C5C485.EBE7B8C8
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

STREAM results for the HP ProLiant BL45p and DL585 with AMD 880 Opteron CPUs

John,

Below are standard STREAM results for the HP ProLiant BL45p (4 CPUs, 8 cores) and the HP ProLiant DL585 (4 CPUs, 8 cores) using 2.4GHz AMD 880 Opteron dual-core processors. The configurations are described below with the results. Please post them to the STREAM website. Thanks,

HP ProLiant BL45p
4x2.4GHz/1MB L2 (per core) dual core 880 Opteron processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1:
   pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c

For 4x2.4GHz Opteron 880 processors (8 cores):
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 49152
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 7460 microseconds.
   (= 7460 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       17613.8751       0.0066       0.0073       0.0074
Scale:      17682.3303       0.0065       0.0072       0.0074
Add:        17782.6783       0.0098       0.0108       0.0109
Triad:      17786.2130       0.0098       0.0108       0.0109
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
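
In the BL45p table above, every "Avg time" is smaller than the matching "Min time", which cannot hold for a true mean of the timed passes. The sketch below only computes that ratio from the printed values. Reading a ratio near 0.9 as "nine timed passes summed, then divided by ten" is an assumption that the message neither states nor confirms. The report itself says only the best time for each test is used, so the Rate column should not depend on the Avg column.

```python
# Ratio of printed "Avg time" to "Min time" for the BL45p (Opteron 880) rows.
# The values are copied from the table; the helper name is hypothetical.
ROWS = {
    "Copy": (0.0066, 0.0073),
    "Scale": (0.0065, 0.0072),
    "Add": (0.0098, 0.0108),
    "Triad": (0.0098, 0.0108),
}


def avg_to_min_ratios(rows):
    """Return {kernel: avg_time / min_time} for the printed table rows."""
    return {kernel: avg / minimum for kernel, (avg, minimum) in rows.items()}


if __name__ == "__main__":
    for kernel, ratio in avg_to_min_ratios(ROWS).items():
        # A true mean is never below the minimum, so a ratio under 1 points
        # at how the Avg column was computed, not at the timings themselves.
        print(f"{kernel}: avg/min = {ratio:.3f}")
```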

------------------------------------------------------------------------------------------

HP ProLiant DL585
4x2.4GHz/1MB L2 (per core) 880 Opteron processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1:
   pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 41472
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
Number of Threads requested = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 7652 microseconds.
   (= 7652 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       16892.8263       0.0069       0.0076       0.0078
Scale:      16913.0489       0.0068       0.0076       0.0076
Add:        16493.3922       0.0105       0.0116       0.0119
Triad:      16507.5920       0.0105       0.0116       0.0117
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com

------_=_NextPart_001_01C5C485.EBE7B8C8-- From - Fri Sep 30 21:46:58 2005 X-Account-Key: account2 X-UIDL: 94797-1059743608 X-Mozilla-Status: 0005 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-03 (ms-mta-03-smtp.texas.rr.com [10.93.38.33]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INJ007VYWQEKN@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Wed, 28 Sep 2005 18:47:09 -0500 (CDT) Received: from orngca-mx-08.mgw.rr.com (orngca-mx-08.mgw.rr.com [66.75.160.142]) by ms-mta-03.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INJ0016WWQA4H@ms-mta-03.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Wed, 28 Sep 2005 18:47:06 -0500 (CDT) Received: from fife-smtp.catalog.com (HELO fife.catalog.com) (209.217.36.155) by orngca-mx-08.mgw.rr.com with SMTP; Wed, 28 Sep 2005 19:47:05 -0400 Received: (qmail 28758 invoked by uid 10000); Wed, 28 Sep 2005 23:49:49 +0000 Received: (qmail 28733 invoked from network); Wed, 28 Sep 2005 23:49:49 +0000 Received: from unknown (HELO spamfilter.catalog.com) (209.217.36.24) by smtp3.catalog.com with SMTP; Wed, 28 Sep 2005 23:49:49 +0000 Received: from ccerelbas01.cce.hp.com (ccerelbas01.cce.hp.com [161.114.21.104]) by spamfilter.catalog.com (Spam Firewall) with ESMTP id 1D704D03A7BA for ; Wed, 28 Sep 2005 18:46:48 -0500 (CDT) Received: from cceexg11.americas.cpqcorp.net (cceexg11.americas.cpqcorp.net [16.81.1.59]) by ccerelbas01.cce.hp.com (Postfix) with ESMTP id AB851200009F for ; Wed, 28 Sep 2005 18:46:47 -0500 (CDT) Received: from cceexc19.americas.cpqcorp.net ([16.81.1.60]) by cceexg11.americas.cpqcorp.net with Microsoft SMTPSVC(6.0.3790.211); Wed, 28 Sep 2005 18:46:47 -0500 Date: Wed, 28 Sep 2005 18:46:47 -0500 From: "Schmidt, David (Performance Eng.)" Subject: STREAM results for the HP ProLiant BL20p G3, DL140 G2, DL360 G4p, DL380 G4, and ML370 G4 with AMD 2.8GHz Opteron CPUs To: 
john@mccalpin.com Message-id: MIME-version: 1.0 X-MIMEOLE: Produced By Microsoft Exchange V6.5.7226.0 Content-type: multipart/alternative; boundary="----_=_NextPart_001_01C5C486.E375AA4A" Content-class: urn:content-classes:message Delivered-to: mccalpin.com-john@mccalpin.com Thread-topic: STREAM results for the HP ProLiant BL20p G3, DL140 G2, DL360 G4p, DL380 G4, and ML370 G4 with AMD 2.8GHz Opteron CPUs Thread-index: AcXEhuMEYgOF/6bcScKuA6/l7FSZZQ== X-MS-Has-Attach: X-MS-TNEF-Correlator: X-Virus-Scanned: by Catalog.com Spam Firewall at catalog.com Original-recipient: rfc822;jmccalpin@austin.rr.com X-OriginalArrivalTime: 28 Sep 2005 23:46:47.0123 (UTC) FILETIME=[E37D5230:01C5C486] X-NAS-Classification: 0 X-NAS-MessageID: 13056 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} This is a multi-part message in MIME format. ------_=_NextPart_001_01C5C486.E375AA4A Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable John, Below are standard STREAM results for the HP ProLiant BL25p (2 CPUs), the HP ProLiant BL45p (4 CPUs), the HP ProLiant DL385 (2 CPUs), and the HP ProLiant DL585 (4 CPU) using 2.8GHz AMD Opteron processors. The configurations are described below with the results. Please post them to the STREAM website. Thanks, HP ProLiant BL25p 2x2.8GHz/1MB L2 Opteron 254 processors 16GB memory (8x2GB DIMMs) SuSE Linux Enterprise Server 9 (x86_64) SP1 - kernel 2.6.5-7.139-smp I used Revision 5.3 of the stream code and compiled with PathScale EKO Compiler Suite 2.1 and used the following: /opt/pathscale/bin/pathcc -O3 -CG:use_prefetchnta = -LNO:prefetch_ahead=3D4 -mp -o ompstream stream_omp.c For 2x2.8GHz Opteron 254 processors: ------------------------------------------------------------- This system uses 8 bytes per DOUBLE PRECISION word. ------------------------------------------------------------- Array size =3D 2500000, Offset =3D 26112 Total memory required =3D 57.2 MB. 
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads requested = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 5116 microseconds.
   (= 5116 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        9368.0362       0.0039       0.0043       0.0043
Scale:       9206.1106       0.0040       0.0043       0.0047
Add:         9457.6361       0.0057       0.0063       0.0064
Triad:       9499.4051       0.0057       0.0063       0.0064
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

-------------------------------------------------------------------------------

HP ProLiant BL45p
4x2.8GHz/1MB L2 854 Opteron processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1:
   pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c

For 4x2.8GHz Opteron 854 processors:
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 49152
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 8164 microseconds.
   (= 8164 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       18515.9825       0.0062       0.0069       0.0070
Scale:      17664.8760       0.0066       0.0072       0.0074
Add:        18595.7227       0.0093       0.0103       0.0104
Triad:      18689.8062       0.0093       0.0103       0.0103
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

-------------------------------------------------------------------------------

HP ProLiant DL385
2x2.8GHz/1MB L2 254 Opteron processors
16GB PC3200 DDR memory (8x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64)
I used Revision 5.3 of the stream code and compiled with PathScale C/C++ for Linux v.2.1:
   pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream -o ompstream stream_omp.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 59392
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads requested = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 2033 microseconds.
   (= 2033 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        9400.3171       0.0015       0.0017       0.0017
Scale:       9205.6055       0.0016       0.0017       0.0017
Add:         9638.3853       0.0022       0.0025       0.0025
Triad:       9721.2261       0.0022       0.0025       0.0025
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

-------------------------------------------------------------------------------

HP ProLiant DL585
4x2.8GHz/1MB L2 854 Opteron processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1
I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1:
   pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 7000000, Offset = 58880
Total memory required = 160.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 7424 microseconds.
   (= 7424 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       18285.7940       0.0055       0.0061       0.0062
Scale:      18223.3706       0.0055       0.0061       0.0062
Add:        18346.7355       0.0082       0.0092       0.0092
Triad:      18386.9497       0.0082       0.0091       0.0093
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com

------_=_NextPart_001_01C5C486.E375AA4A
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

STREAM results for the HP ProLiant BL20p G3, DL140 G2, DL360 G4p, DL380 G4, and ML370 G4 with AMD 2.8GHz Opteron CPUs

John,

Below are standard STREAM results for the HP ProLiant BL25p (2 CPUs), the HP ProLiant BL45p (4 CPUs), the HP ProLiant DL385 (2 CPUs), and the HP ProLiant DL585 (4 CPUs) using 2.8GHz AMD Opteron processors. The configurations are described below with the results. Please post them to the STREAM website. Thanks,

HP ProLiant BL25p
2x2.8GHz/1MB L2 Opteron 254 processors
16GB memory (8x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1 - kernel 2.6.5-7.139-smp
I used Revision 5.3 of the stream code and compiled with PathScale EKO Compiler Suite 2.1 and used the following:
/opt/pathscale/bin/pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c

For 2x2.8GHz Opteron 254 processors:
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2500000, Offset = 26112
Total memory required = 57.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads requested = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 5116 microseconds.
   (= 5116 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        9368.0362       0.0039       0.0043       0.0043
Scale:       9206.1106       0.0040       0.0043       0.0047
Add:         9457.6361       0.0057       0.0063       0.0064
Triad:       9499.4051       0.0057       0.0063       0.0064
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------

--------------------------------------------------------------------= -----------

HP ProLiant BL45p

4x2.8GHz /1MB L2  854 Opteron = processors

32GB PC3200 memory (16x2GB DIMMs)

SuSE Linux Enterprise Server 9 (x86_64) = SP1

I used Revision 5.3 of the stream code and compiled = with PathScale EKO C++ compiler v.2.1:

   pathcc -O3 -CG:use_prefetchnta = -LNO:prefetch_ahead=3D4 -mp -o ompstream stream_omp.c


For 4x2.8GHz Opteron 854 processors:
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 8000000, Offset = 49152
Total memory required = 183.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 8164 microseconds.
   (= 8164 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       18515.9825       0.0062       0.0069       0.0070
Scale:      17664.8760       0.0066       0.0072       0.0074
Add:        18595.7227       0.0093       0.0103       0.0104
Triad:      18689.8062       0.0093       0.0103       0.0103
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
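
As a rough cross-check of the table above, the reported rates can be recomputed from the array size and the minimum time. The sketch below assumes the usual STREAM accounting (Copy and Scale touch two arrays per element, Add and Triad three, 8 bytes per word, 1 MB = 10^6 bytes). The name `stream_rate_mbs` is made up for illustration and is not part of the STREAM source.

```python
# Bytes moved per array element for each STREAM kernel, assuming
# 8-byte words: Copy/Scale read one array and write one, Add/Triad
# read two and write one.
BYTES_PER_ELEMENT = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}


def stream_rate_mbs(array_size, seconds, kernel):
    """Return the bandwidth in MB/s (1 MB = 1e6 bytes) for one kernel."""
    total_bytes = BYTES_PER_ELEMENT[kernel] * array_size
    return total_bytes / seconds / 1.0e6


if __name__ == "__main__":
    n = 8_000_000  # "Array size" from the 4-processor run above
    for kernel, min_time in [("Copy", 0.0069), ("Triad", 0.0103)]:
        print(f"{kernel}: {stream_rate_mbs(n, min_time, kernel):.1f} MB/s")
```

Both values should land within about 1% of the reported 18515.98 and 18689.81 MB/s; the remaining gap is consistent with the printed times being rounded to four decimals.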

-------------------------------------------------------------------------------

HP ProLiant DL385
2x2.8GHz/1MB L2  254 Opteron processors
16GB PC3200 DDR memory (8x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64)

I used Revision 5.3 of the stream code and compiled with PathScale C/C++ for Linux v.2.1:

   pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp  -o ompstream -o ompstream stream_omp.c

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 59392
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 2
Number of Threads requested = 2
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 2033 microseconds.
   (= 2033 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        9400.3171       0.0015       0.0017       0.0017
Scale:       9205.6055       0.0016       0.0017       0.0017
Add:         9638.3853       0.0022       0.0025       0.0025
Triad:       9721.2261       0.0022       0.0025       0.0025
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
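
The "Total memory required" figures in these runs follow from the array size alone. A minimal sketch, assuming three arrays of 8-byte words and 1 MB = 2^20 bytes, with the offset not counted (that assumption reproduces the printed values); `stream_memory_mb` is a hypothetical helper name.

```python
def stream_memory_mb(array_size, bytes_per_word=8, num_arrays=3):
    """Total footprint of the a, b and c arrays in MB (1 MB = 2**20 bytes)."""
    return num_arrays * bytes_per_word * array_size / 2**20


if __name__ == "__main__":
    # Array sizes used in the Opteron runs in this message.
    for n in (1_000_000, 7_000_000, 8_000_000):
        print(f"N = {n}: {stream_memory_mb(n):.1f} MB")
```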

-------------------------------------------------------------------------------

HP ProLiant DL585
4x2.8GHz/1MB L2  854 Opteron processors
32GB PC3200 memory (16x2GB DIMMs)
SuSE Linux Enterprise Server 9 (x86_64) SP1

I used Revision 5.3 of the stream code and compiled with PathScale EKO C++ compiler v.2.1:

   pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c


-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 7000000, Offset = 58880
Total memory required = 160.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
Number of Threads requested = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 7424 microseconds.
   (= 7424 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       18285.7940       0.0055       0.0061       0.0062
Scale:      18223.3706       0.0055       0.0061       0.0062
Add:        18346.7355       0.0082       0.0092       0.0092
Triad:      18386.9497       0.0082       0.0091       0.0093
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
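
One way to read the DL385 and DL585 tables side by side is to compare bandwidth per processor. The sketch below is only an illustration: it uses the Triad rates reported above, and it ignores that the two runs used different array sizes and OS service levels. `scaling_efficiency` is a hypothetical helper name.

```python
def scaling_efficiency(rate_small, procs_small, rate_large, procs_large):
    """Ratio of per-processor bandwidth on the larger system to the smaller."""
    return (rate_large / procs_large) / (rate_small / procs_small)


if __name__ == "__main__":
    # Triad rates (MB/s) copied from the DL385 (2 processors) and
    # DL585 (4 processors) tables in this message.
    dl385_triad, dl585_triad = 9721.2261, 18386.9497
    eff = scaling_efficiency(dl385_triad, 2, dl585_triad, 4)
    print(f"DL385 per processor: {dl385_triad / 2:.1f} MB/s")
    print(f"DL585 per processor: {dl585_triad / 4:.1f} MB/s")
    print(f"Scaling efficiency:  {eff:.3f}")
```

On these numbers the four-processor system delivers roughly 95% of the two-processor system's per-processor Triad bandwidth.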

David Schmidt
Hewlett-Packard Company
(281) 514-5039
D.Schmidt@hp.com

------_=_NextPart_001_01C5C486.E375AA4A--

From - Mon Oct 3 16:48:48 2005
Date: Mon, 03 Oct 2005 12:52:06 -0600
From: Ly Vu
Subject: standard STREAM on IBM System p5 505 (2CPUs 1.65 GHz)
To: mccalpin@cs.virginia.edu
Cc: john@mccalpin.com, jacobt@us.ibm.com
MIME-version: 1.0
Content-type: multipart/alternative; Boundary="0__=08BBFA1CDFF40EDC8f9e8a93df938690918c08BBFA1CDFF40EDC"

--0__=08BBFA1CDFF40EDC8f9e8a93df938690918c08BBFA1CDFF40EDC
Content-type: text/html; charset=US-ASCII
Content-Disposition: inline



These are standard STREAM results on an IBM System p5 505
with two 1.65 GHz cpus. This is a POWER5 SMP machine.
Large pages were used in all cases.

Function    Rate (MB/s)   Avg time   Min time   Max time
Copy:         6019.1183      .0892      .0891      .0893
Scale:        5932.5789      .0906      .0904      .0907
Add:          7534.6825      .1069      .1068      .1070
Triad:        7653.1454      .1053      .1052      .1055

Here is the full output file:
-----------------------------

Requesting Large Pages
Setting up for 2 CPUs per module
Number of segments per array = 1
CPU binding list : 0
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403158533931008
Shared Segment Pointer = 504403158802366464
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 268435456 (MB = 256 )
Array Size (DW) = 33554432
Num_threads = 2
Num_threads = 2
rebind: num_parthds is 2
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000

Incremental Offset = 1536
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33534976
The total memory requirement is 767 MB
You are running each test 5 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function    Rate (MB/s)   Avg time   Min time   Max time
Copy:         6019.1183      .0892      .0891      .0893
Scale:        5932.5789      .0906      .0904      .0907
Add:          7534.6825      .1069      .1068      .1070
Triad:        7653.1454      .1053      .1052      .1055
----------------------------------------------------
Solution Validates!
----------------------------------------------------
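
For anyone collecting these submissions into a table, the four result rows are regular enough to parse mechanically. The following is a minimal sketch, assuming rows of the form `Name: rate t1 t2 t3` as printed above; `parse_stream_rows` is a made-up helper, not part of STREAM.

```python
def parse_stream_rows(text):
    """Extract {kernel: (rate, time1, time2, time3)} from STREAM result lines."""
    results = {}
    for line in text.splitlines():
        fields = line.split()
        # Result rows look like "Copy: 6019.1183 .0892 .0891 .0893";
        # the header row splits into more than five fields and is skipped.
        if len(fields) == 5 and fields[0].rstrip(":") in (
            "Copy", "Scale", "Add", "Triad"
        ):
            kernel = fields[0].rstrip(":")
            results[kernel] = tuple(float(x) for x in fields[1:])
    return results


if __name__ == "__main__":
    sample = """\
Function Rate (MB/s) Avg time Min time Max time
Copy: 6019.1183 .0892 .0891 .0893
Scale: 5932.5789 .0906 .0904 .0907
Add: 7534.6825 .1069 .1068 .1070
Triad: 7653.1454 .1053 .1052 .1055
"""
    for kernel, (rate, _t1, _t2, _t3) in parse_stream_rows(sample).items():
        print(f"{kernel:6s} {rate:10.4f} MB/s")
```

Splitting on whitespace means the same function accepts both the single-spaced and the column-aligned form of the table.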



______________________________________________
Ly Vu
IBM Corp. - Austin, Texas.
RS/6000 Performance Analysis.

Phone : (512) 838-8228
Email : lyvu@us.ibm.com

--0__=08BBFA1CDFF40EDC8f9e8a93df938690918c08BBFA1CDFF40EDC--

From - Mon Oct 3 16:48:45 2005
Date: Mon, 03 Oct 2005 12:50:44 -0600
From: Ly Vu
Subject: tuned STREAM on IBM System p5 505 (2CPUs 1.65 GHz)
To: mccalpin@cs.virginia.edu
Cc: john@mccalpin.com, jacobt@us.ibm.com
MIME-version: 1.0
Content-type: multipart/alternative; Boundary="0__=08BBFA1CDFF4CC4A8f9e8a93df938690918c08BBFA1CDFF4CC4A"

--0__=08BBFA1CDFF4CC4A8f9e8a93df938690918c08BBFA1CDFF4CC4A
Content-type: text/html; charset=US-ASCII
Content-Disposition: inline




These are tuned STREAM results on an IBM System p5 505
with two 1.65 GHz cpus. This is a POWER5 SMP machine.
Large pages were used in all cases.


Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:           6248.96        .09        .09        .09
Scale:          6136.57        .09        .09        .09
Add:            8819.20        .09        .09        .09
Triad:          9012.07        .09        .09        .09
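
To put the tuned numbers next to the standard p5 505 results from the previous message, the relative gain per kernel can be computed directly. This is only a sketch using the rates as reported in the two messages; `percent_gain` is a hypothetical helper name.

```python
# Rates in MB/s copied from the two p5 505 messages (standard, then tuned).
STANDARD = {"Copy": 6019.1183, "Scale": 5932.5789,
            "Add": 7534.6825, "Triad": 7653.1454}
TUNED = {"Copy": 6248.96, "Scale": 6136.57,
         "Add": 8819.20, "Triad": 9012.07}


def percent_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


if __name__ == "__main__":
    for kernel in STANDARD:
        gain = percent_gain(STANDARD[kernel], TUNED[kernel])
        print(f"{kernel:6s} {gain:5.1f}%")
```

On these figures Add and Triad gain roughly 17 to 18%, while Copy and Scale gain under 4%.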


Here is the full output file:
-----------------------------
Requesting Large Pages
Setting up for 2 CPUs per module
Number of segments per array = 1
CPU binding list : 0
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403158533931008
Shared Segment Pointer = 504403158802366464
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 268435456 (MB = 256 )
Array Size (DW) = 33554432
Num_threads = 2
Num_threads = 2
rebind: num_parthds is 2
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000

Incremental Offset = 1536
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33537024
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 87531 microseconds
(= 87531 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:           6248.96        .09        .09        .09
Scale:          6136.57        .09        .09        .09
Add:            8819.20        .09        .09        .09
Triad:          9012.07        .09        .09        .09
Sum of a is = 50934355200000.0000
Sum of b is = 10186871040000.0000
Sum of c is = 13582494720000.0000
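
The three sums printed above can be reproduced by following one element through the kernel sequence. The sketch below assumes the reference STREAM conventions (initial values a = 1, b = 2, c = 0, with a doubled before timing, and a scalar of 3); the tuned code itself is not shown in this message, so treat these as assumptions that happen to match the printed values. `expected_sums` is a hypothetical name.

```python
def expected_sums(array_size, ntimes):
    """Per-array sums after `ntimes` passes of Copy, Scale, Add, Triad."""
    a, b, c = 1.0, 2.0, 0.0   # assumed initial element values
    a = 2.0 * a               # assumed doubling of a before the timed loop
    scalar = 3.0              # assumed STREAM scalar
    for _ in range(ntimes):
        c = a                 # Copy:  c = a
        b = scalar * c        # Scale: b = scalar * c
        c = a + b             # Add:   c = a + b
        a = b + scalar * c    # Triad: a = b + scalar * c
    # Every element holds the same value, so each sum is value * N.
    return a * array_size, b * array_size, c * array_size


if __name__ == "__main__":
    # "Array size" and repetition count taken from the output above.
    for name, total in zip("abc", expected_sums(33537024, 5)):
        print(f"Sum of {name} is = {total:.4f}")
```

With an array size of 33537024 and 5 repetitions this yields the same three sums as the run above.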

______________________________________________
Ly Vu
IBM Corp. - Austin, Texas.
RS/6000 Performance Analysis.

Phone : (512) 838-8228
Email : lyvu@us.ibm.com

--0__=08BBFA1CDFF4CC4A8f9e8a93df938690918c08BBFA1CDFF4CC4A--

From - Mon Oct 3 17:18:45 2005
Date: Mon, 03 Oct 2005 12:57:19 -0600
From: Ly Vu
Subject: standard STREAM on IBM System p5 520 (2CPUs 1.9 GHz)
To: mccalpin@cs.virginia.edu
Cc: john@mccalpin.com, jacobt@us.ibm.com
MIME-version: 1.0
Content-type: multipart/alternative; Boundary="0__=08BBFA1CDFFB82B48f9e8a93df938690918c08BBFA1CDFFB82B4"

--0__=08BBFA1CDFFB82B48f9e8a93df938690918c08BBFA1CDFFB82B4
Content-type: text/html; charset=US-ASCII
Content-Disposition: inline



These are standard STREAM results on an IBM System p5 520
with two 1.9 GHz cpus. This is a POWER5+ SMP machine.
Large pages were used in all cases.

Function    Rate (MB/s)   Avg time   Min time   Max time
Copy:         8190.9887      .0656      .0655      .0656
Scale:        8060.8585      .0666      .0666      .0667
Add:          9432.9679      .0854      .0853      .0856
Triad:        9672.1572      .0834      .0832      .0837

Here is the full output file:
-----------------------------

Requesting Large Pages
Setting up for 2 CPUs per module
Number of segments per array = 1
CPU binding list : 0
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403158533931008
Shared Segment Pointer = 504403158802366464
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 268435456 (MB = 256 )
Array Size (DW) = 33554432
Num_threads = 2
Num_threads = 2
rebind: num_parthds is 2
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000

Incremental Offset = 1536
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33539072
The total memory requirement is 767 MB
You are running each test 5 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function    Rate (MB/s)   Avg time   Min time   Max time
Copy:         8190.9887      .0656      .0655      .0656
Scale:        8060.8585      .0666      .0666      .0667
Add:          9432.9679      .0854      .0853      .0856
Triad:        9672.1572      .0834      .0832      .0837
----------------------------------------------------
Solution Validates!
----------------------------------------------------
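
The segment figures near the top of the output are plain unit conversions, which the sketch below reproduces. It assumes 8-byte doublewords and 1 MB = 2^20 bytes; `segment_summary` is a made-up helper name, and the "headroom" figure is simply the difference between the segment capacity and the array size reported above.

```python
def segment_summary(segment_bytes, bytes_per_word=8):
    """Return (size in MB, capacity in doublewords) for one shared segment."""
    return segment_bytes // 2**20, segment_bytes // bytes_per_word


if __name__ == "__main__":
    size_mb, capacity_dw = segment_summary(268435456)  # "Segment Size (B)"
    array_size = 33539072                              # "Array size" in this run
    print(f"Segment: {size_mb} MB, {capacity_dw} doublewords")
    print(f"Headroom: {capacity_dw - array_size} doublewords")
```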


______________________________________________
Ly Vu
IBM Corp. - Austin, Texas.
RS/6000 Performance Analysis.

Phone : (512) 838-8228
Email : lyvu@us.ibm.com

--0__=08BBFA1CDFFB82B48f9e8a93df938690918c08BBFA1CDFFB82B4--

From - Mon Oct 3 17:18:44 2005
Date: Mon, 03 Oct 2005 12:54:19 -0600
From: Ly Vu
Subject: tuned STREAM on IBM System p5 520 (2CPUs 1.9 GHz)
To: mccalpin@cs.virginia.edu
Cc: john@mccalpin.com, jacobt@us.ibm.com
MIME-version: 1.0
Content-type: multipart/alternative; Boundary="0__=08BBFA1CDFF446D08f9e8a93df938690918c08BBFA1CDFF446D0"

--0__=08BBFA1CDFF446D08f9e8a93df938690918c08BBFA1CDFF446D0
Content-type: text/plain; charset=US-ASCII

These are tuned STREAM results on an IBM System p5 520
with two 1.9 GHz cpus. This is a POWER5+ SMP machine.
Large pages were used in all cases.

Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:           8486.63        .06        .06        .07
Scale:          8362.39        .06        .06        .06
Add:           10136.66        .08        .08        .08
Triad:         10318.95        .08        .08        .08

Here is the full output file:
-----------------------------
Requesting Large Pages
Setting up for 2 CPUs per module
Number of segments per array = 1
CPU binding list : 0
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403158533931008
Shared Segment Pointer = 504403158802366464
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 268435456 (MB = 256 )
Array Size (DW) = 33554432
Num_threads = 2
Num_threads = 2
rebind: num_parthds is 2
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000

Incremental Offset = 512
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33534976
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 65454 microseconds
(= 65454 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Copy:           8486.63        .06        .06        .07
Scale:          8362.39        .06        .06        .06
Add:           10136.66        .08        .08        .08
Triad:         10318.95        .08        .08        .08
Sum of a is = 50931170381250.0000
Sum of b is = 10186234076250.0000
Sum of c is = 13581645435000.0000

______________________________________________
Ly Vu
IBM Corp. - Austin, Texas.
RS/6000 Performance Analysis.

Phone : (512) 838-8228
Email : lyvu@us.ibm.com

--0__=08BBFA1CDFF446D08f9e8a93df938690918c08BBFA1CDFF446D0
Content-type: text/html; charset=US-ASCII
Content-Disposition: inline




These are tuned STREAM results on an IBM System p5 520
with two 1.9 GHz CPUs. This is a POWER5+ SMP machine.
Large pages were used in all cases.


Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:          8486.63        .06        .06        .07
Scale:         8362.39        .06        .06        .06
Add:          10136.66        .08        .08        .08
Triad:        10318.95        .08        .08        .08


Here is the full output file:
-----------------------------

Requesting Large Pages
Setting up for 2 CPUs per module
Number of segments per array = 1
CPU binding list : 0
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403158533931008
Shared Segment Pointer = 504403158802366464
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 268435456 (MB = 256 )
Array Size (DW) = 33554432
Num_threads = 2
Num_threads = 2
rebind: num_parthds is 2
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000

Incremental Offset = 512
Number of Threads = 2
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 33534976
Offset = 0
The total memory requirement is 767 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 65454 microseconds
(= 65454 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:          8486.63        .06        .06        .07
Scale:         8362.39        .06        .06        .06
Add:          10136.66        .08        .08        .08
Triad:        10318.95        .08        .08        .08
Sum of a is = 50931170381250.0000
Sum of b is = 10186234076250.0000
Sum of c is = 13581645435000.0000
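
As a rough cross-check, the 767 MB figure in the output above is consistent with three arrays of 8-byte words at the reported array size. The short Python sketch below is an illustration only and is not taken from the STREAM source. It assumes that 1 MB means 2**20 bytes here and that the value is truncated to a whole number, neither of which the output states. The name total_memory_mb is made up for this example.

```python
# Sketch: estimate STREAM's reported memory requirement from the array size.
# Assumptions (not stated in the output): three arrays (a, b, c), 8-byte
# words, 1 MB = 2**20 bytes, and truncation to a whole number of MB.

BYTES_PER_WORD = 8
NUM_ARRAYS = 3


def total_memory_mb(array_size: int) -> int:
    """Return the combined size of the three arrays in whole MB."""
    total_bytes = NUM_ARRAYS * BYTES_PER_WORD * array_size
    return total_bytes // 2**20


# Array size taken from the run above.
print(total_memory_mb(33534976))  # prints 767
```

The same arithmetic gives 1535 for an array size of 67077120, which matches the four-CPU tuned run on the p5 550.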

______________________________________________
Ly Vu
IBM Corp. - Austin, Texas.
RS/6000 Performance Analysis.

Phone : (512) 838-8228
Email : lyvu@us.ibm.com --0__=08BBFA1CDFF446D08f9e8a93df938690918c08BBFA1CDFF446D0-- From - Mon Oct 3 17:18:46 2005 X-Account-Key: account2 X-UIDL: 95698-1059743608 X-Mozilla-Status: 0005 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-02 (ms-mta-02-smtp [10.93.38.12]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INS00CXPSTP9R@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Mon, 03 Oct 2005 14:01:08 -0500 (CDT) Received: from orngca-mx-10.mgw.rr.com (orngca-mx-10.mgw.rr.com [66.75.160.144]) by ms-mta-02.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INS00HOJSTSP2@ms-mta-02.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Mon, 03 Oct 2005 15:01:04 -0400 (EDT) Received: from fife-smtp.catalog.com (HELO fife.catalog.com) (209.217.36.155) by orngca-mx-10.mgw.rr.com with SMTP; Mon, 03 Oct 2005 15:01:04 -0400 Received: (qmail 6609 invoked by uid 10000); Mon, 03 Oct 2005 19:03:49 +0000 Received: (qmail 6483 invoked from network); Mon, 03 Oct 2005 19:03:49 +0000 Received: from unknown (HELO spamfilter.catalog.com) (209.217.52.101) by smtp3.catalog.com with SMTP; Mon, 03 Oct 2005 19:03:49 +0000 Received: from e36.co.us.ibm.com (e36.co.us.ibm.com [32.97.110.154]) by spamfilter.catalog.com (Spam Firewall) with ESMTP id 52657D001127 for ; Mon, 03 Oct 2005 14:00:59 -0500 (CDT) Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e36.co.us.ibm.com (8.12.11/8.12.11) with ESMTP id j93IxYD1022303 for ; Mon, 03 Oct 2005 14:59:34 -0400 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j93J0stP534558 for ; Mon, 03 Oct 2005 13:00:54 -0600 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j93J0rVK012503 for ; Mon, 03 Oct 2005 13:00:54 -0600 
Received: from d03nm116.boulder.ibm.com (d03nm116.boulder.ibm.com [9.17.195.142]) by d03av02.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j93J0r1r012492; Mon, 03 Oct 2005 13:00:53 -0600 Date: Mon, 03 Oct 2005 13:00:46 -0600 From: Ly Vu Subject: standard STREAM on IBM System p5 550 (4CPUs 1.9 GHz) To: mccalpin@cs.virginia.edu Cc: john@mccalpin.com, jacobt@us.ibm.com Message-id: MIME-version: 1.0 Content-type: multipart/alternative; Boundary="0__=08BBFA1CDFFBCD4D8f9e8a93df938690918c08BBFA1CDFFBCD4D" Content-disposition: inline Importance: Normal X-MIMETrack: Serialize by Router on D03NM116/03/M/IBM(Release 6.53HF654 | July 22, 2005) at 10/03/2005 13:00:53 Sensitivity: Delivered-to: mccalpin.com-john@mccalpin.com X-Virus-Scanned: by Catalog.com Spam Firewall at catalog.com Original-recipient: rfc822;jmccalpin@austin.rr.com X-NAS-Classification: 0 X-NAS-MessageID: 13799 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} --0__=08BBFA1CDFFBCD4D8f9e8a93df938690918c08BBFA1CDFFBCD4D Content-type: text/plain; charset=US-ASCII These are standard STREAM results on an IBM System p5 550 with four 1.9 GHz cpus. This is a POWER5+ SMP machine. Large pages were used in all cases. 
--0__=08BBFA1CDFFBCD4D8f9e8a93df938690918c08BBFA1CDFFBCD4D
Content-type: text/html; charset=US-ASCII
Content-Disposition: inline




These are standard STREAM results on an IBM System p5 550
with four 1.9 GHz CPUs. This is a POWER5+ SMP machine.
Large pages were used in all cases.

Function   Rate (MB/s)   Avg time   Min time   Max time
Copy:       16074.3482      .0669      .0668      .0672
Scale:      15747.5122      .0682      .0682      .0684
Add:        18497.5093      .0871      .0870      .0872
Triad:      19043.7449      .0853      .0845      .0867

Here is the full output file:
-----------------------------

Requesting Large Pages
Setting up for 2 CPUs per module
Number of segments per array = 2
CPU binding list : 0 2
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403158802366464
Shared Segment Pointer = 504403159339237376
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 536870912 (MB = 512 )
Array Size (DW) = 67108864
Num_threads = 4
Num_threads = 4
Num_threads = 4
Num_threads = 4
rebind: num_parthds is 4
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000

Incremental Offset = 1536
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67075072
The total memory requirement is 1535 MB
You are running each test 5 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function   Rate (MB/s)   Avg time   Min time   Max time
Copy:       16074.3482      .0669      .0668      .0672
Scale:      15747.5122      .0682      .0682      .0684
Add:        18497.5093      .0871      .0870      .0872
Triad:      19043.7449      .0853      .0845      .0867
----------------------------------------------------
Solution Validates!
----------------------------------------------------
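
The rates in the table above can be tied back to the array size and the minimum times. The Python sketch below is an illustration, not code from the benchmark. It assumes STREAM's usual byte counting (Copy and Scale touch two arrays, Add and Triad touch three), 8-byte words, and 1 MB = 10**6 bytes for the rate column. The names implied_time_s and ARRAYS_TOUCHED are invented for this example.

```python
# Sketch: recover the per-kernel time implied by a reported STREAM rate.
# Assumptions: Copy and Scale touch two arrays, Add and Triad touch three,
# words are 8 bytes, and the rate column uses 1 MB = 10**6 bytes.

BYTES_PER_WORD = 8
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def implied_time_s(kernel: str, array_size: int, rate_mb_s: float) -> float:
    """Seconds per pass implied by the reported rate for one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * array_size
    return bytes_moved / (rate_mb_s * 1e6)


ARRAY_SIZE = 67075072  # from the run above
REPORTED = {  # kernel: (rate in MB/s, reported min time in s)
    "Copy": (16074.3482, 0.0668),
    "Scale": (15747.5122, 0.0682),
    "Add": (18497.5093, 0.0870),
    "Triad": (19043.7449, 0.0845),
}

for kernel, (rate, min_time) in REPORTED.items():
    t = implied_time_s(kernel, ARRAY_SIZE, rate)
    print(f"{kernel:6s} implied {t:.4f} s, reported {min_time:.4f} s")
```

Under these assumptions the implied times agree with the reported minimum times to within rounding, which suggests the rate is computed from the best (minimum) time for each kernel.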

______________________________________________
Ly Vu
IBM Corp. - Austin, Texas.
RS/6000 Performance Analysis.

Phone : (512) 838-8228
Email : lyvu@us.ibm.com --0__=08BBFA1CDFFBCD4D8f9e8a93df938690918c08BBFA1CDFFBCD4D-- From - Mon Oct 3 17:18:46 2005 X-Account-Key: account2 X-UIDL: 95696-1059743608 X-Mozilla-Status: 0005 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-03 (ms-mta-03-smtp.texas.rr.com [10.93.38.33]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INS00CHVSQF9R@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Mon, 03 Oct 2005 13:59:03 -0500 (CDT) Received: from clmboh-mx-07.mgw.rr.com (clmboh-mx-07.mgw.rr.com [65.24.7.61]) by ms-mta-03.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INS00KF3SQF2C@ms-mta-03.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Mon, 03 Oct 2005 13:59:03 -0500 (CDT) Received: from fife-smtp.catalog.com (HELO fife.catalog.com) (209.217.36.155) by clmboh-mx-07.mgw.rr.com with SMTP; Mon, 03 Oct 2005 14:59:04 -0400 Received: (qmail 28369 invoked by uid 10000); Mon, 03 Oct 2005 19:01:48 +0000 Received: (qmail 28355 invoked from network); Mon, 03 Oct 2005 19:01:48 +0000 Received: from unknown (HELO spamfilter.catalog.com) (209.217.36.24) by smtp3.catalog.com with SMTP; Mon, 03 Oct 2005 19:01:48 +0000 Received: from e32.co.us.ibm.com (e32.co.us.ibm.com [32.97.110.150]) by spamfilter.catalog.com (Spam Firewall) with ESMTP id AA5E6D038271 for ; Mon, 03 Oct 2005 13:58:59 -0500 (CDT) Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e32.co.us.ibm.com (8.12.11/8.12.11) with ESMTP id j93Iw7si020439 for ; Mon, 03 Oct 2005 14:58:07 -0400 Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j93IwxtP526464 for ; Mon, 03 Oct 2005 12:58:59 -0600 Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1]) by d03av01.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j93IwwmQ027373 for ; Mon, 03 Oct 2005 
12:58:58 -0600 Received: from d03nm116.boulder.ibm.com (d03nm116.boulder.ibm.com [9.17.195.142]) by d03av01.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j93IwwXA027361; Mon, 03 Oct 2005 12:58:58 -0600 Date: Mon, 03 Oct 2005 12:58:57 -0600 From: Ly Vu Subject: tuned STREAM on IBM System p5 550 (4CPUs 1.9 GHz) To: mccalpin@cs.virginia.edu Cc: john@mccalpin.com, jacobt@us.ibm.com Message-id: MIME-version: 1.0 Content-type: multipart/alternative; Boundary="0__=08BBFA1CDFFBA58C8f9e8a93df938690918c08BBFA1CDFFBA58C" Content-disposition: inline Importance: Normal X-MIMETrack: Serialize by Router on D03NM116/03/M/IBM(Release 6.53HF654 | July 22, 2005) at 10/03/2005 12:58:58 Sensitivity: Delivered-to: mccalpin.com-john@mccalpin.com X-Virus-Scanned: by Catalog.com Spam Firewall at catalog.com Original-recipient: rfc822;jmccalpin@austin.rr.com X-NAS-Classification: 0 X-NAS-MessageID: 13797 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} --0__=08BBFA1CDFFBA58C8f9e8a93df938690918c08BBFA1CDFFBA58C Content-type: text/plain; charset=US-ASCII These are tuned STREAM results on an IBM System p5 550 with four 1.9 GHz cpus. This is a POWER5+ SMP machine. Large pages were used in all cases. 
--0__=08BBFA1CDFFBA58C8f9e8a93df938690918c08BBFA1CDFFBA58C
Content-type: text/html; charset=US-ASCII
Content-Disposition: inline



These are tuned STREAM results on an IBM System p5 550
with four 1.9 GHz CPUs. This is a POWER5+ SMP machine.
Large pages were used in all cases.


Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:         16757.95        .07        .06        .07
Scale:        16339.98        .07        .07        .07
Add:          20061.16        .08        .08        .08
Triad:        20403.48        .08        .08        .08


Here is the full output file:
-----------------------------

Requesting Large Pages
Setting up for 2 CPUs per module
Number of segments per array = 2
CPU binding list : 0 2
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403158802366464
Shared Segment Pointer = 504403159339237376
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 536870912 (MB = 512 )
Array Size (DW) = 67108864
Num_threads = 4
Num_threads = 4
Num_threads = 4
Num_threads = 4
rebind: num_parthds is 4
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000

Incremental Offset = 2560
Number of Threads = 4
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 67077120
Offset = 0
The total memory requirement is 1535 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 65374 microseconds
(= 65374 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:         16757.95        .07        .06        .07
Scale:        16339.98        .07        .07        .07
Add:          20061.16        .08        .08        .08
Triad:        20403.48        .08        .08        .08
Sum of a is = 101873376000000.000
Sum of b is = 20374675200000.0000
Sum of c is = 27166233600000.0000
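
For a quick comparison with the standard run on the same p5 550 (four 1.9 GHz CPUs), the Python sketch below computes the percentage gain of the tuned rates over the standard rates. It is only an illustration: the two runs use slightly different array sizes and timing rules, so the percentages are approximate. The helper name percent_gain is made up for this example.

```python
# Sketch: percentage gain of the tuned STREAM rates over the standard rates
# on the p5 550 with four CPUs. Rates (MB/s) are copied from the two reports.
# The runs differ slightly in array size and timing rules, so treat the
# results as approximate.

STANDARD = {"Copy": 16074.3482, "Scale": 15747.5122,
            "Add": 18497.5093, "Triad": 19043.7449}
TUNED = {"Copy": 16757.95, "Scale": 16339.98,
         "Add": 20061.16, "Triad": 20403.48}


def percent_gain(base: float, new: float) -> float:
    """Relative improvement of new over base, in percent."""
    return (new / base - 1.0) * 100.0


for kernel in STANDARD:
    gain = percent_gain(STANDARD[kernel], TUNED[kernel])
    print(f"{kernel:6s} tuned is {gain:.1f}% faster")
```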



______________________________________________
Ly Vu
IBM Corp. - Austin, Texas.
RS/6000 Performance Analysis.

Phone : (512) 838-8228
Email : lyvu@us.ibm.com --0__=08BBFA1CDFFBA58C8f9e8a93df938690918c08BBFA1CDFFBA58C-- From - Mon Oct 3 17:18:47 2005 X-Account-Key: account2 X-UIDL: 95702-1059743608 X-Mozilla-Status: 0005 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-01 (ms-mta-01-smtp [10.93.38.11]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INS00CLCSY49R@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Mon, 03 Oct 2005 14:03:40 -0500 (CDT) Received: from clmboh-mx-05.mgw.rr.com (clmboh-mx-05.mgw.rr.com [65.24.7.19]) by ms-mta-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INS005H9SY06E@ms-mta-01.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Mon, 03 Oct 2005 15:03:40 -0400 (EDT) Received: from fife-smtp.catalog.com (HELO fife.catalog.com) (209.217.36.155) by clmboh-mx-05.mgw.rr.com with SMTP; Mon, 03 Oct 2005 15:03:40 -0400 Received: (qmail 23837 invoked by uid 10000); Mon, 03 Oct 2005 19:06:25 +0000 Received: (qmail 23786 invoked from network); Mon, 03 Oct 2005 19:06:25 +0000 Received: from unknown (HELO spamfilter.catalog.com) (209.217.36.24) by smtp3.catalog.com with SMTP; Mon, 03 Oct 2005 19:06:25 +0000 Received: from e32.co.us.ibm.com (e32.co.us.ibm.com [32.97.110.150]) by spamfilter.catalog.com (Spam Firewall) with ESMTP id 7DF6CD03B035 for ; Mon, 03 Oct 2005 14:03:33 -0500 (CDT) Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e32.co.us.ibm.com (8.12.11/8.12.11) with ESMTP id j93J2e0P023761 for ; Mon, 03 Oct 2005 15:02:40 -0400 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j93J3WtP524096 for ; Mon, 03 Oct 2005 13:03:32 -0600 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j93J3WuE021526 for ; Mon, 03 Oct 2005 13:03:32 -0600 
Received: from d03nm116.boulder.ibm.com (d03nm116.boulder.ibm.com [9.17.195.142]) by d03av02.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j93J3WT2021516; Mon, 03 Oct 2005 13:03:32 -0600 Date: Mon, 03 Oct 2005 13:03:30 -0600 From: Ly Vu Subject: standard STREAM on IBM System p5 550Q (8CPUs 1.5 GHz) To: mccalpin@cs.virginia.edu Cc: john@mccalpin.com, jacobt@us.ibm.com Message-id: MIME-version: 1.0 Content-type: multipart/alternative; Boundary="0__=08BBFA1CDFFB13438f9e8a93df938690918c08BBFA1CDFFB1343" Content-disposition: inline Importance: Normal X-MIMETrack: Serialize by Router on D03NM116/03/M/IBM(Release 6.53HF654 | July 22, 2005) at 10/03/2005 13:03:31 Sensitivity: Delivered-to: mccalpin.com-john@mccalpin.com X-Virus-Scanned: by Catalog.com Spam Firewall at catalog.com Original-recipient: rfc822;jmccalpin@austin.rr.com X-NAS-Classification: 0 X-NAS-MessageID: 13803 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} --0__=08BBFA1CDFFB13438f9e8a93df938690918c08BBFA1CDFFB1343 Content-type: text/plain; charset=US-ASCII These are standard STREAM results on an IBM System p5 550Q with eight 1.5 GHz cpus. This is a POWER5+ SMP machine. Large pages were used in all cases. 
--0__=08BBFA1CDFFB13438f9e8a93df938690918c08BBFA1CDFFB1343
Content-type: text/html; charset=US-ASCII
Content-Disposition: inline



These are standard STREAM results on an IBM System p5 550Q
with eight 1.5 GHz CPUs. This is a POWER5+ SMP machine.
Large pages were used in all cases.

Function   Rate (MB/s)   Avg time   Min time   Max time
Copy:       15717.9163      .0682      .0682      .0682
Scale:      15282.5470      .0702      .0701      .0702
Add:        16917.2731      .0950      .0950      .0951
Triad:      17330.8483      .0928      .0928      .0928

Here is the full output file:
-----------------------------

Requesting Large Pages
Setting up for 8 CPUs per module
Number of segments per array = 2
CPU binding list : 0 8
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403158802366464
Shared Segment Pointer = 504403159339237376
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 536870912 (MB = 512 )
Array Size (DW) = 67108864
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
rebind: num_parthds is 16
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000

Incremental Offset = 512
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 66980864
The total memory requirement is 1533 MB
You are running each test 5 times
--
The *best* time for each test is used
*EXCLUDING* the first and last iterations
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
----------------------------------------------------
Function   Rate (MB/s)   Avg time   Min time   Max time
Copy:       15717.9163      .0682      .0682      .0682
Scale:      15282.5470      .0702      .0701      .0702
Add:        16917.2731      .0950      .0950      .0951
Triad:      17330.8483      .0928      .0928      .0928
----------------------------------------------------
Solution Validates!
----------------------------------------------------


______________________________________________
Ly Vu
IBM Corp. - Austin, Texas.
RS/6000 Performance Analysis.

Phone : (512) 838-8228
Email : lyvu@us.ibm.com --0__=08BBFA1CDFFB13438f9e8a93df938690918c08BBFA1CDFFB1343-- From - Mon Oct 3 17:18:47 2005 X-Account-Key: account2 X-UIDL: 95700-1059743608 X-Mozilla-Status: 0005 X-Mozilla-Status2: 04000000 Return-path: Received: from ms-mta-02 (ms-mta-02-smtp [10.93.38.12]) by ms-mss-01.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INS00CDESVN9R@ms-mss-01.texas.rr.com> for jmccalpin@austin.rr.com; Mon, 03 Oct 2005 14:02:12 -0500 (CDT) Received: from orngca-mx-10.mgw.rr.com (orngca-mx-10.mgw.rr.com [66.75.160.144]) by ms-mta-02.texas.rr.com (iPlanet Messaging Server 5.2 HotFix 2.04 (built Feb 8 2005)) with ESMTP id <0INS00K40SVKVD@ms-mta-02.texas.rr.com> for jmccalpin@austin.rr.com (ORCPT jmccalpin@austin.rr.com); Mon, 03 Oct 2005 15:02:11 -0400 (EDT) Received: from fife-smtp.catalog.com (HELO fife.catalog.com) (209.217.36.155) by orngca-mx-10.mgw.rr.com with SMTP; Mon, 03 Oct 2005 15:02:11 -0400 Received: (qmail 27545 invoked by uid 10000); Mon, 03 Oct 2005 19:04:56 +0000 Received: (qmail 27430 invoked from network); Mon, 03 Oct 2005 19:04:56 +0000 Received: from unknown (HELO spamfilter.catalog.com) (209.217.36.24) by smtp3.catalog.com with SMTP; Mon, 03 Oct 2005 19:04:56 +0000 Received: from e32.co.us.ibm.com (e32.co.us.ibm.com [32.97.110.150]) by spamfilter.catalog.com (Spam Firewall) with ESMTP id D5CCCD03B035 for ; Mon, 03 Oct 2005 14:02:06 -0500 (CDT) Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.17.195.11]) by e32.co.us.ibm.com (8.12.11/8.12.11) with ESMTP id j93J1ERj022517 for ; Mon, 03 Oct 2005 15:01:14 -0400 Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by westrelay02.boulder.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id j93J26tP472752 for ; Mon, 03 Oct 2005 13:02:06 -0600 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.12.11/8.13.3) with ESMTP id j93J25Ma002770 for ; Mon, 03 Oct 2005 13:02:06 
-0600 Received: from d03nm116.boulder.ibm.com (d03nm116.boulder.ibm.com [9.17.195.142]) by d03av03.boulder.ibm.com (8.12.11/8.12.11) with ESMTP id j93J25Gh002767; Mon, 03 Oct 2005 13:02:05 -0600 Date: Mon, 03 Oct 2005 13:02:04 -0600 From: Ly Vu Subject: tuned STREAM on IBM System p5 550Q (8CPUs 1.5 GHz) To: mccalpin@cs.virginia.edu Cc: john@mccalpin.com, jacobt@us.ibm.com Message-id: MIME-version: 1.0 Content-type: multipart/alternative; Boundary="0__=08BBFA1CDFFBF2668f9e8a93df938690918c08BBFA1CDFFBF266" Content-disposition: inline Importance: Normal X-MIMETrack: Serialize by Router on D03NM116/03/M/IBM(Release 6.53HF654 | July 22, 2005) at 10/03/2005 13:02:05 Sensitivity: Delivered-to: mccalpin.com-john@mccalpin.com X-Virus-Scanned: by Catalog.com Spam Firewall at catalog.com Original-recipient: rfc822;jmccalpin@austin.rr.com X-NAS-Classification: 0 X-NAS-MessageID: 13801 X-NAS-Validation: {7D898514-4303-4FE9-94E2-60980EE63DAC} --0__=08BBFA1CDFFBF2668f9e8a93df938690918c08BBFA1CDFFBF266 Content-type: text/plain; charset=US-ASCII These are tuned STREAM results on an IBM System p5 550Q with eight 1.5 GHz cpus. This is a POWER5+ SMP machine. Large pages were used in all cases. 
--0__=08BBFA1CDFFBF2668f9e8a93df938690918c08BBFA1CDFFBF266
Content-type: text/html; charset=US-ASCII
Content-Disposition: inline



These are tuned STREAM results on an IBM System p5 550Q
with eight 1.5 GHz CPUs. This is a POWER5+ SMP machine.
Large pages were used in all cases.


Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:         16678.73        .07        .06        .08
Scale:        16176.72        .07        .07        .07
Add:          18145.39        .09        .09        .09
Triad:        18755.78        .09        .09        .09


Here is the full output file:
-----------------------------

Requesting Large Pages
Setting up for 8 CPUs per module
Number of segments per array = 2
CPU binding list : 0 8
Shared Segment Pointer = 504403158265495552
Shared Segment Pointer = 504403158802366464
Shared Segment Pointer = 504403159339237376
Segment Size (B) = 268435456 (MB = 256 )
Array Size (B) = 536870912 (MB = 512 )
Array Size (DW) = 67108864
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
Num_threads = 16
rebind: num_parthds is 16
Starting Initialization
Done With Initialization
a(1) 1.00000000000000000
b(M) 1.00000000000000000
c(M) 1.00000000000000000

Incremental Offset = 512
Number of Threads = 16
----------------------------------------------
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
Array size = 66976768
Offset = 0
The total memory requirement is 1532 MB
You are running each test 5 times
The *best* time for each test is used
----------------------------------------------------
Your clock granularity appears to be less than one microsecond
Your clock granularity/precision appears to be 1 microseconds
The tests below will each take a time on the order
of 65764 microseconds
(= 65764 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
----------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
----------------------------------------------------
Function   Rate (MB/s)   RMS time   Min time   Max time
Copy:         16678.73        .07        .06        .08
Scale:        16176.72        .07        .07        .07
Add:          18145.39        .09        .09        .09
Triad:        18755.78        .09        .09        .09
Sum of a is = 101720966400000.000
Sum of b is = 20344193280000.0000
Sum of c is = 27125591040000.0000



______________________________________________
Ly Vu
IBM Corp. - Austin, Texas.
RS/6000 Performance Analysis.

Phone : (512) 838-8228
Email : lyvu@us.ibm.com
--0__=08BBFA1CDFFBF2668f9e8a93df938690918c08BBFA1CDFFBF266--