FW: STREAM Results for HP ProLiant SL250s Xeon E5-2680 v2 (Ivy Bridge EP)

From: John McCalpin <mccalpin_at_tacc.utexas.edu>
Date: Thu, 5 Nov 2015 00:00:03 +0000

[STREAM Editor's Note: The summary below is correct, but I originally cut-n-pasted the wrong output file! This is now corrected.]

From: John McCalpin <mccalpin_at_tacc.utexas.edu>
Date: Tuesday, October 13, 2015 at 6:21 PM
To: John McCalpin <john_at_mccalpin.com>
Subject: STREAM Results for HP ProLiant SL250s Xeon E5-2680 v2 (Ivy Bridge EP)

Run on a compute node of the TACC Maverick system.
HP ProLiant SL250s with two Xeon E5-2680 v2 processors (10 cores each, 2.8 GHz, Ivy Bridge EP) and 256 GiB of DDR3/1866 DRAM.

Compiled with icc 14.0.1: "icc -O3 -openmp -ffreestanding -DSTREAM_ARRAY_SIZE=80000000 -xSSE4.1 -opt-streaming-stores always"
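
As a quick sanity check on the array size chosen above, the memory footprint can be worked out by hand. The following is only a sketch: the variable names are mine, the 8-byte element size is the one STREAM reports in its output, and the count of three arrays comes from the validation line in that output.

```python
# Sketch: memory footprint implied by -DSTREAM_ARRAY_SIZE=80000000.
# Names are illustrative; they do not come from the STREAM source.
ARRAY_ELEMENTS = 80_000_000   # from the compile line
BYTES_PER_ELEMENT = 8         # "This system uses 8 bytes per array element."
NUM_ARRAYS = 3                # STREAM validates "all three arrays"

MIB = 2**20
per_array_mib = ARRAY_ELEMENTS * BYTES_PER_ELEMENT / MIB
total_mib = NUM_ARRAYS * per_array_mib

print(f"Memory per array      = {per_array_mib:.1f} MiB")
print(f"Total memory required = {total_mib:.1f} MiB")
```

The two values should agree with the "Memory per array" and "Total memory required" lines that STREAM prints at startup.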

Summary:
Copy 88087 MB/s
Scale 88416 MB/s
Add 90536 MB/s
Triad 90784 MB/s
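
To put these numbers in context, they can be compared against the nominal peak DRAM bandwidth of the node. This is a rough sketch under assumptions that are not stated in this post: 4 memory channels per socket, all populated and running at 1866 MT/s, each 64 bits (8 bytes) wide. STREAM's MB/s means 10^6 bytes per second, so the peak is computed in the same unit.

```python
# Sketch: summary rates as a fraction of nominal peak DRAM bandwidth.
# ASSUMPTIONS (not from the post): 4 channels per socket, all populated,
# all running at 1866 MT/s, 8 bytes per transfer.
SOCKETS = 2
CHANNELS_PER_SOCKET = 4       # assumption
TRANSFERS_PER_SEC = 1866e6    # DDR3/1866
BYTES_PER_TRANSFER = 8        # assumption: 64-bit channel

peak_mb_s = (SOCKETS * CHANNELS_PER_SOCKET
             * TRANSFERS_PER_SEC * BYTES_PER_TRANSFER) / 1e6

summary_mb_s = {"Copy": 88087, "Scale": 88416, "Add": 90536, "Triad": 90784}

print(f"Nominal peak: {peak_mb_s:.0f} MB/s")
for kernel, rate in summary_mb_s.items():
    print(f"{kernel:6s} {rate:6d} MB/s  {100 * rate / peak_mb_s:5.1f}% of nominal peak")
```

If the channel assumptions do not hold for this node, the percentages change accordingly; the measured rates themselves do not depend on them.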

Notes:

  * The best performance was obtained using 16 cores (8 per socket) and the SSE4.1 instruction set.
     * Using all 20 cores reduced the performance by a microscopic amount (much less than 1%).
     * Using AVX instructions reduced performance by 0.5% to 0.8%.
     * Both of these are repeatable, but not worth worrying about.
[...An incorrect output file was originally appended here, but I removed it and put the correct output below....]
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19}
OMP: Info #156: KMP_AFFINITY: 20 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 2 packages x 10 cores/pkg x 1 threads/core (20 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 8
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 9
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 10
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 core 11
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 12
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 1 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 1 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 12 maps to package 1 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 13 maps to package 1 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 14 maps to package 1 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 15 maps to package 1 core 8
OMP: Info #171: KMP_AFFINITY: OS proc 16 maps to package 1 core 9
OMP: Info #171: KMP_AFFINITY: OS proc 17 maps to package 1 core 10
OMP: Info #171: KMP_AFFINITY: OS proc 18 maps to package 1 core 11
OMP: Info #171: KMP_AFFINITY: OS proc 19 maps to package 1 core 12
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {10}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {1}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {11}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {12}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {13}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {3}
OMP: Info #147: KMP_AFFINITY: Internal thread 8 bound to OS proc set {4}
OMP: Info #147: KMP_AFFINITY: Internal thread 9 bound to OS proc set {14}
OMP: Info #147: KMP_AFFINITY: Internal thread 10 bound to OS proc set {5}
OMP: Info #147: KMP_AFFINITY: Internal thread 11 bound to OS proc set {15}
OMP: Info #147: KMP_AFFINITY: Internal thread 12 bound to OS proc set {6}
OMP: Info #147: KMP_AFFINITY: Internal thread 13 bound to OS proc set {16}
OMP: Info #147: KMP_AFFINITY: Internal thread 14 bound to OS proc set {7}
OMP: Info #147: KMP_AFFINITY: Internal thread 15 bound to OS proc set {17}
OMP: Info #147: KMP_AFFINITY: Internal thread 16 bound to OS proc set {8}
OMP: Info #147: KMP_AFFINITY: Internal thread 17 bound to OS proc set {18}
OMP: Info #147: KMP_AFFINITY: Internal thread 18 bound to OS proc set {9}
OMP: Info #147: KMP_AFFINITY: Internal thread 19 bound to OS proc set {19}
-------------------------------------------------------------
STREAM version $Revision: 1.7 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 80000000 (elements), Offset = 0 (elements)
Memory per array = 610.4 MiB (= 0.6 GiB).
Total memory required = 1831.1 MiB (= 1.8 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 20
Number of Threads counted = 20
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 13602 microseconds.
   (= 13602 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time  Min time  Max time
Copy:           87929.4613    0.0146    0.0146    0.0147
Scale:          88354.9055    0.0145    0.0145    0.0146
Add:            90527.5996    0.0213    0.0212    0.0214
Triad:          90776.5906    0.0212    0.0212    0.0215
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
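
For readers who want to check how the "Best Rate" column relates to the "Min time" column: STREAM counts the bytes each kernel moves (two arrays for Copy and Scale, three for Add and Triad) and divides by the best time, reporting 10^6 bytes per second. The sketch below inverts that for the run above. The dictionary and variable names are illustrative, and the reported times are only printed to four decimals, so agreement is expected only to within rounding.

```python
# Sketch: recover the minimum time implied by each reported rate.
# rate [MB/s] = 1e-6 * bytes_moved / min_time, with MB = 10^6 bytes.
N = 80_000_000          # array elements
WORD = 8                # bytes per element
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# (Best Rate MB/s, printed Min time) from the run above
reported = {
    "Copy":  (87929.4613, 0.0146),
    "Scale": (88354.9055, 0.0145),
    "Add":   (90527.5996, 0.0212),
    "Triad": (90776.5906, 0.0212),
}

implied_time = {}
for kernel, (rate_mb_s, printed_min) in reported.items():
    bytes_moved = ARRAYS_TOUCHED[kernel] * WORD * N
    implied_time[kernel] = bytes_moved / (rate_mb_s * 1e6)
    print(f"{kernel:6s} bytes={bytes_moved:.3e}  "
          f"implied min time={implied_time[kernel]:.6f} s  printed={printed_min:.4f} s")
```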
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19}
OMP: Info #156: KMP_AFFINITY: 20 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 2 packages x 10 cores/pkg x 1 threads/core (20 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 8
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 9
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 10
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 core 11
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 12
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 1 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 1 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 12 maps to package 1 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 13 maps to package 1 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 14 maps to package 1 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 15 maps to package 1 core 8
OMP: Info #171: KMP_AFFINITY: OS proc 16 maps to package 1 core 9
OMP: Info #171: KMP_AFFINITY: OS proc 17 maps to package 1 core 10
OMP: Info #171: KMP_AFFINITY: OS proc 18 maps to package 1 core 11
OMP: Info #171: KMP_AFFINITY: OS proc 19 maps to package 1 core 12
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {10}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {1}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {11}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {12}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {13}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {3}
OMP: Info #147: KMP_AFFINITY: Internal thread 8 bound to OS proc set {4}
OMP: Info #147: KMP_AFFINITY: Internal thread 9 bound to OS proc set {14}
OMP: Info #147: KMP_AFFINITY: Internal thread 10 bound to OS proc set {5}
OMP: Info #147: KMP_AFFINITY: Internal thread 11 bound to OS proc set {15}
OMP: Info #147: KMP_AFFINITY: Internal thread 12 bound to OS proc set {6}
OMP: Info #147: KMP_AFFINITY: Internal thread 13 bound to OS proc set {16}
OMP: Info #147: KMP_AFFINITY: Internal thread 14 bound to OS proc set {7}
OMP: Info #147: KMP_AFFINITY: Internal thread 15 bound to OS proc set {17}
OMP: Info #147: KMP_AFFINITY: Internal thread 16 bound to OS proc set {8}
OMP: Info #147: KMP_AFFINITY: Internal thread 18 bound to OS proc set {9}
OMP: Info #147: KMP_AFFINITY: Internal thread 17 bound to OS proc set {18}
OMP: Info #147: KMP_AFFINITY: Internal thread 19 bound to OS proc set {19}
-------------------------------------------------------------
STREAM version $Revision: 1.7 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 80000000 (elements), Offset = 0 (elements)
Memory per array = 610.4 MiB (= 0.6 GiB).
Total memory required = 1831.1 MiB (= 1.8 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 20
Number of Threads counted = 20
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 13668 microseconds.
   (= 13668 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time  Min time  Max time
Copy:           87774.2029    0.0147    0.0146    0.0147
Scale:          88135.8821    0.0146    0.0145    0.0147
Add:            90446.2603    0.0213    0.0212    0.0213
Triad:          90801.1555    0.0213    0.0211    0.0215
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
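
The KMP_AFFINITY lines are easier to read once each thread is mapped to its package. The sketch below parses a short excerpt copied from the log above. The regular expressions and function name are mine, and only the first four threads are included to keep it short. The alternating placement it shows is consistent with a scatter-style binding, although this post does not say which KMP_AFFINITY setting was used.

```python
import re

# Excerpt copied from the verbose affinity output above (first four threads only).
LOG_EXCERPT = """\
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 1 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 1 core 1
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {10}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {1}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {11}
"""

PROC_RE = re.compile(r"OS proc (\d+) maps to package (\d+) core (\d+)")
BIND_RE = re.compile(r"Internal thread (\d+) bound to OS proc set \{(\d+)\}")

def thread_packages(log_text):
    """Return {thread id: package id}, ordered by thread id."""
    proc_to_pkg = {int(p): int(pkg) for p, pkg, _core in PROC_RE.findall(log_text)}
    bindings = {int(t): int(p) for t, p in BIND_RE.findall(log_text)}
    return {t: proc_to_pkg[p] for t, p in sorted(bindings.items())}

placement = thread_packages(LOG_EXCERPT)
for thread, package in placement.items():
    print(f"thread {thread} -> package {package}")
```

Sorting by thread id matters because the runtime does not always print the binding lines in order, as a few swapped lines in the logs show.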
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19}
OMP: Info #156: KMP_AFFINITY: 20 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 2 packages x 10 cores/pkg x 1 threads/core (20 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 8
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 9
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 10
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 core 11
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 12
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 1 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 1 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 12 maps to package 1 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 13 maps to package 1 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 14 maps to package 1 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 15 maps to package 1 core 8
OMP: Info #171: KMP_AFFINITY: OS proc 16 maps to package 1 core 9
OMP: Info #171: KMP_AFFINITY: OS proc 17 maps to package 1 core 10
OMP: Info #171: KMP_AFFINITY: OS proc 18 maps to package 1 core 11
OMP: Info #171: KMP_AFFINITY: OS proc 19 maps to package 1 core 12
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}
OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {10}
OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {1}
OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {2}
OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {12}
OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {11}
OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {3}
OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {13}
OMP: Info #147: KMP_AFFINITY: Internal thread 8 bound to OS proc set {4}
OMP: Info #147: KMP_AFFINITY: Internal thread 9 bound to OS proc set {14}
OMP: Info #147: KMP_AFFINITY: Internal thread 10 bound to OS proc set {5}
OMP: Info #147: KMP_AFFINITY: Internal thread 11 bound to OS proc set {15}
OMP: Info #147: KMP_AFFINITY: Internal thread 12 bound to OS proc set {6}
OMP: Info #147: KMP_AFFINITY: Internal thread 13 bound to OS proc set {16}
OMP: Info #147: KMP_AFFINITY: Internal thread 14 bound to OS proc set {7}
OMP: Info #147: KMP_AFFINITY: Internal thread 15 bound to OS proc set {17}
OMP: Info #147: KMP_AFFINITY: Internal thread 16 bound to OS proc set {8}
OMP: Info #147: KMP_AFFINITY: Internal thread 17 bound to OS proc set {18}
OMP: Info #147: KMP_AFFINITY: Internal thread 18 bound to OS proc set {9}
OMP: Info #147: KMP_AFFINITY: Internal thread 19 bound to OS proc set {19}
-------------------------------------------------------------
STREAM version $Revision: 1.7 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 80000000 (elements), Offset = 0 (elements)
Memory per array = 610.4 MiB (= 0.6 GiB).
Total memory required = 1831.1 MiB (= 1.8 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 20
Number of Threads counted = 20
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 13776 microseconds.
   (= 13776 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time  Min time  Max time
Copy:           88086.7152    0.0146    0.0145    0.0146
Scale:          88416.0195    0.0146    0.0145    0.0146
Add:            90535.7416    0.0213    0.0212    0.0213
Triad:          90783.7540    0.0212    0.0211    0.0214
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
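
With all three runs in hand, it is straightforward to check which one the Summary was taken from and how much the runs differ from one another. This is a small sketch using the "Best Rate" values printed above; the variable names are illustrative.

```python
# Sketch: compare the three runs above with the Summary values.
runs = [
    {"Copy": 87929.4613, "Scale": 88354.9055, "Add": 90527.5996, "Triad": 90776.5906},
    {"Copy": 87774.2029, "Scale": 88135.8821, "Add": 90446.2603, "Triad": 90801.1555},
    {"Copy": 88086.7152, "Scale": 88416.0195, "Add": 90535.7416, "Triad": 90783.7540},
]
summary = {"Copy": 88087, "Scale": 88416, "Add": 90536, "Triad": 90784}

spread_pct = {}
matching_runs = {}
for kernel, summary_rate in summary.items():
    rates = [run[kernel] for run in runs]
    # Run-to-run spread, relative to the fastest run for this kernel.
    spread_pct[kernel] = 100 * (max(rates) - min(rates)) / max(rates)
    # 1-based indices of the runs whose rounded rate equals the Summary value.
    matching_runs[kernel] = [i + 1 for i, r in enumerate(rates)
                             if round(r) == summary_rate]
    print(f"{kernel:6s} spread={spread_pct[kernel]:.2f}%  "
          f"summary matches run(s) {matching_runs[kernel]}")
```

The Summary values are the third run rounded to the nearest MB/s, and the run-to-run spread stays below half a percent for every kernel.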
--
John D. McCalpin, Ph.D.
Texas Advanced Computing Center
University of Texas at Austin
https://www.tacc.utexas.edu/about/directory/john-mccalpin

Received on Thu Nov 05 2015 - 06:29:26 CST
