From - Fri Apr 25 13:58:08 2014
X-Account-Key: account3
X-UIDL: AExpimIAABJiU1k10QAAAFIf0nM
X-Mozilla-Status: 0005
X-Mozilla-Status2: 00000000
X-Mozilla-Keys: $label2                                                                         
X-Apparently-To: john.mccalpin@att.net via 98.138.197.219; Thu, 24 Apr 2014 16:03:28 +0000
Received-SPF: softfail (transitioning domain of oracle.com does not designate 128.143.137.19 as permitted sender)
X-Originating-IP: [128.143.137.19]
Authentication-Results: mta1058.sbc.mail.bf1.yahoo.com  from=oracle.com; domainkeys=neutral (no sig);  from=oracle.com; dkim=neutral (no sig)
Received: from 12.102.252.82  (EHLO alnwmxc02.att.net) (12.102.252.82)
  by mta1058.sbc.mail.bf1.yahoo.com with SMTP; Thu, 24 Apr 2014 16:03:27 +0000
Received: from ares.cs.virginia.edu ([128.143.137.19])
          by att.net (alnwmxc02) with ESMTP
          id <20140424160327a0200cmurje>; Thu, 24 Apr 2014 16:03:27 +0000
X-Originating-IP: [128.143.137.19]
Received: from aserp1050.oracle.com (aserp1050.oracle.com [141.146.126.70])
	by ares.cs.Virginia.EDU (8.13.8/8.13.8/UVACS-2013030801) with ESMTP id s3OG3GKw016487
	for <mccalpin@cs.virginia.edu>; Thu, 24 Apr 2014 09:03:17 -0700 (PDT)
Received: from aserp1040.oracle.com (aserp1040.oracle.com [141.146.126.69])
	by aserp1050.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id s3OG3BQa007927
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for <mccalpin@cs.virginia.edu>; Thu, 24 Apr 2014 16:03:11 GMT
Received: from ucsinet22.oracle.com (ucsinet22.oracle.com [156.151.31.94])
	by aserp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id s3OG345U011700
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for <mccalpin@cs.virginia.edu>; Thu, 24 Apr 2014 16:03:05 GMT
Received: from aserz7022.oracle.com (aserz7022.oracle.com [141.146.126.231])
	by ucsinet22.oracle.com (8.14.5+Sun/8.14.5) with ESMTP id s3OG33Mr010096
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO)
	for <mccalpin@cs.virginia.edu>; Thu, 24 Apr 2014 16:03:04 GMT
Received: from abhmp0015.oracle.com (abhmp0015.oracle.com [141.146.116.21])
	by aserz7022.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id s3OG33sp003760
	for <mccalpin@cs.virginia.edu>; Thu, 24 Apr 2014 16:03:03 GMT
Received: from [10.137.232.40] (/10.137.232.40)
	by default (Oracle Beehive Gateway v4.0)
	with ESMTP ; Thu, 24 Apr 2014 09:03:03 -0700
Message-ID: <535935B0.5090108@oracle.com>
Date: Thu, 24 Apr 2014 09:02:56 -0700
From: Brian Whitney <brian.whitney@oracle.com>
Reply-To: brian.whitney@oracle.com
User-Agent: Mozilla/5.0 (X11; SunOS sun4u; rv:24.0) Gecko/20100101 Thunderbird/24.3.0
MIME-Version: 1.0
To: mccalpin@cs.virginia.edu
CC: Brian Whitney <brian.whitney@oracle.com>
Subject: Oracle STREAM submission for Sun Server X4-4
References: <53580EB9.3040709@oracle.com>
In-Reply-To: <53580EB9.3040709@oracle.com>
X-Forwarded-Message-Id: <53580EB9.3040709@oracle.com>
Content-Type: multipart/alternative;
 boundary="------------020203010301050907050806"
X-Source-IP: aserp1040.oracle.com [141.146.126.69]
Received: from 98.138.197.219 (98.138.197.219) by deli10204.mail.ne1.yahoo.com(10.222.13.75); Thu, 24 Apr 2014 16:03:29 +0000

This is a multi-part message in MIME format.
--------------020203010301050907050806
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Dear Dr. McCalpin,

Please find below results for Oracle's Sun Server X4-4 system.
Please include them in your list of results.

Thank you,

Brian Whitney
Oracle

=====
Model Name: Sun Server X4-4
CPU Name: Intel Xeon E7-8895 v2 (2.8 GHz)
CPU Characteristics: Intel Turbo Boost Technology up to 3.60 GHz
CPU(s) enabled: 60 cores, 4 chips, 15 cores/chip, 2 threads/core
Primary Cache: 32 KB I + 32 KB D on chip per core
Secondary Cache: 256 KB I+D on chip per core
L3 Cache: 37.5 MB I+D on chip per chip
Memory: 1 TB (64 x 16 GB 2Rx4 PC3L-12800R-11, ECC)
Operating System: Oracle Solaris 11.1.12.5.0
Compiler: Oracle Solaris Studio dev bld cia_20140408
Compiler Flags: -m64 -fast -xopenmp -xmodel=medium -xpagesize=2M
                -xprefetch=no -xtarget=nehalem -Wu,-nontemporal
                -DNTIMES=40 -DSTREAM_ARRAY_SIZE=1000000000l
Environment Settings: OMP_NUM_THREADS=60
                      SUNW_MP_PROCBIND=true
                      SUNW_MP_THR_IDLE=spin
BIOS:  Release 24.02.01.00  (default settings)
====

-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 1000000000 (elements), Offset = 0 (elements)
Memory per array = 7629.4 MiB (= 7.5 GiB).
Total memory required = 22888.2 MiB (= 22.4 GiB).
Each kernel will be executed 40 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 60
Number of Threads counted = 60
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 69843 microseconds.
    (= 69843 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          221370.3     0.073119     0.072277     0.081731
Scale:         221944.3     0.073072     0.072090     0.080903
Add:           244588.4     0.099979     0.098124     0.112186
Triad:         245067.8     0.099303     0.097932     0.112634
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
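
The rates in the table above can be cross-checked against the minimum
times. The sketch below assumes STREAM's usual byte counting (Copy and
Scale touch two arrays per iteration, Add and Triad touch three) and
1 MB = 10^6 bytes. The function and variable names are illustrative
and are not part of STREAM itself.

```python
# Sketch: recompute STREAM "Best Rate MB/s" from the minimum kernel times.
# Assumptions (not stated in the output itself): Copy and Scale move
# 2 arrays per iteration, Add and Triad move 3, and 1 MB = 10**6 bytes.

BYTES_PER_ELEMENT = 8          # "This system uses 8 bytes per array element."
ARRAY_SIZE = 1_000_000_000     # "Array size = 1000000000 (elements)"

ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# (reported best rate in MB/s, minimum time in seconds), copied from the table
REPORTED = {
    "Copy": (221370.3, 0.072277),
    "Scale": (221944.3, 0.072090),
    "Add": (244588.4, 0.098124),
    "Triad": (245067.8, 0.097932),
}


def best_rate_mb_s(kernel, min_time_s):
    """Bytes moved by one kernel pass divided by its best time, in MB/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_SIZE
    return bytes_moved / min_time_s / 1.0e6


def memory_per_array_mib():
    """Size of one array in MiB (2**20 bytes)."""
    return BYTES_PER_ELEMENT * ARRAY_SIZE / 2**20


if __name__ == "__main__":
    print(f"Memory per array: {memory_per_array_mib():.1f} MiB")
    for kernel, (reported, min_time) in REPORTED.items():
        recomputed = best_rate_mb_s(kernel, min_time)
        print(f"{kernel:6s} recomputed {recomputed:10.1f}  reported {reported:10.1f}")
```

Small differences in the last digit are expected, because the printed
minimum times are rounded to six decimal places.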





--------------020203010301050907050806--

From - Tue Jan 26 11:40:17 2016
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00800000
X-Mozilla-Keys:                                                                                 
Reply-To: john@mccalpin.com
To: John McCalpin <john@mccalpin.com>
From: "John D. McCalpin, Ph.D." <john.mccalpin@att.net>
Subject: STREAM results from HP Moonshot M300
Message-ID: <56A7AF7E.7060300@att.net>
Date: Tue, 26 Jan 2016 11:40:14 -0600
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0)
 Gecko/20100101 Thunderbird/38.5.1
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="------------090200060800060207020204"

This is a multi-part message in MIME format.
--------------090200060800060207020204
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit

These tests were run on 2014-06-19, but were never submitted.

The STREAM code was compiled with the Intel 14 compiler, targeting the
Atom C2750 (Silvermont) processor.
The best results were obtained using 4 threads with
KMP_AFFINITY=scatter, so that I used only one core of each pair.
Performance degraded by roughly 6% to 8% when using all 8 cores.  I did
not attempt to determine whether this was due to contention for the
shared L2 caches or contention at the DRAMs.
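
As a rough illustration of the placement described above, the sketch
below reproduces the thread-to-processor order that appears in the
"Internal thread N bound to OS proc set" lines of the logs. It is a
simplified model, not the Intel runtime's actual algorithm, and it
assumes that adjacent OS procs (0,1), (2,3), ... are the core pairs
sharing an L2 cache. The function name scatter_order is hypothetical.

```python
# Simplified model of a "scatter" placement: hand out one OS proc from
# each core pair before using the second proc of any pair.
# Assumption: OS procs (0,1), (2,3), ... form the L2-sharing pairs.

def scatter_order(num_procs, procs_per_pair=2):
    """Return OS proc ids in the order threads would be bound to them."""
    order = []
    for slot in range(procs_per_pair):
        for first in range(0, num_procs, procs_per_pair):
            proc = first + slot
            if proc < num_procs:   # guard for a partially filled last pair
                order.append(proc)
    return order


if __name__ == "__main__":
    full = scatter_order(8)
    print("8 threads:", full)
    print("4 threads:", full[:4])
```

With 4 threads only the first half of the order is used, which is why
each thread has a core pair to itself in the four-thread run.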

The results included here are the median of three tests run on an idle 
system, with KMP_AFFINITY used to prevent thread migration.
Turbo mode was enabled, and the results were about 1% higher than 
results with the frequency pinned to the nominal value of 2.4 GHz.
==================================================================
Single-thread results:
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 
11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 
{0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 8 cores/pkg x 1 threads/core 
(8 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 5
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 6
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 7
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 1
Number of Threads counted = 1
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 22813 microseconds.
    (= 22813 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:        7056.2919       0.0227       0.0227       0.0229
Scale:       6920.3659       0.0232       0.0231       0.0234
Add:         7054.2892       0.0340       0.0340       0.0341
Triad:       6935.5998       0.0346       0.0346       0.0347
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

==================================================================
Four-thread results:
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 
11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 
{0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 8 cores/pkg x 1 threads/core 
(8 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 5
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 6
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 7
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {6}
-------------------------------------------------------------
STREAM version $Revision: 1.5 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 13098 microseconds.
    (= 13098 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:       14891.9013       0.0108       0.0107       0.0110
Scale:      14813.6647       0.0108       0.0108       0.0109
Add:        15313.9665       0.0157       0.0157       0.0159
Triad:      15487.8523       0.0156       0.0155       0.0156
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------

==================================================================
Eight-thread results:
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 
11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 
{0,1,2,3,4,5,6,7}
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 8 cores/pkg x 1 threads/core 
(8 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 5
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 6
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 7
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {4}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {6}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {1}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {3}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {5}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {7}
-------------------------------------------------------------
STREAM version $Revision: 1.5 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
  The *best* time for each kernel (excluding the first iteration)
  will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads counted = 8
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 12964 microseconds.
    (= 12964 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:       13684.5155       0.0118       0.0117       0.0118
Scale:      13699.8804       0.0117       0.0117       0.0117
Add:        14356.6798       0.0168       0.0167       0.0168
Triad:      14430.7724       0.0167       0.0166       0.0167
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
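
The three result tables above can be compared directly. The sketch
below uses only the "Best Rate MB/s" values reported for 1, 4, and 8
threads and computes the speedup over one thread and the drop from 4
to 8 threads. The helper names are illustrative.

```python
# Sketch: compare the reported best rates (MB/s) across thread counts.
# All numbers are copied from the three result tables above.

RATES = {
    1: {"Copy": 7056.2919, "Scale": 6920.3659, "Add": 7054.2892, "Triad": 6935.5998},
    4: {"Copy": 14891.9013, "Scale": 14813.6647, "Add": 15313.9665, "Triad": 15487.8523},
    8: {"Copy": 13684.5155, "Scale": 13699.8804, "Add": 14356.6798, "Triad": 14430.7724},
}


def speedup(kernel, threads, baseline=1):
    """Ratio of the best rate at `threads` to the best rate at `baseline`."""
    return RATES[threads][kernel] / RATES[baseline][kernel]


def drop_percent(kernel, from_threads=4, to_threads=8):
    """Percentage loss in best rate going from one thread count to another."""
    return 100.0 * (1.0 - RATES[to_threads][kernel] / RATES[from_threads][kernel])


if __name__ == "__main__":
    for kernel in ("Copy", "Scale", "Add", "Triad"):
        print(
            f"{kernel:6s} 4-thread speedup {speedup(kernel, 4):.2f}x, "
            f"8-thread speedup {speedup(kernel, 8):.2f}x, "
            f"4->8 drop {drop_percent(kernel):.1f}%"
        )
```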


--------------090200060800060207020204--
