-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 80000000 (elements), Offset = 0 (elements)
Memory per array = 610.4 MiB (= 0.596 GiB).
 (Array is sized for systems with cache sizes up to 152.59 MiB)
 (Results are acceptable for Cache sizes up to 160.00 MiB)
Total memory required = 1831.1 MiB (= 1.788 GiB).
-------------------------------------------------------------
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads counted = 8
DEBUG: The system has 32 logical processors
DEBUG: Manually setting cpu affinity for each thread
DEBUG:    thread 1 setting affinity for core 4
DEBUG:    thread 2 setting affinity for core 8
DEBUG:    thread 7 setting affinity for core 28
DEBUG:    thread 3 setting affinity for core 12
DEBUG:    thread 4 setting affinity for core 16
DEBUG:    thread 6 setting affinity for core 24
DEBUG:    thread 5 setting affinity for core 20
DEBUG:    thread 0 setting affinity for core 0
DEBUG: The system has 32 logical processors
Affinity Mask for Thread 3 includes Processor 12
Affinity Mask for Thread 2 includes Processor 8
Affinity Mask for Thread 5 includes Processor 20
Affinity Mask for Thread 0 includes Processor 0
Affinity Mask for Thread 1 includes Processor 4
Affinity Mask for Thread 7 includes Processor 28
Affinity Mask for Thread 4 includes Processor 16
Affinity Mask for Thread 6 includes Processor 24
-------------------------------------------------------------
Your timer granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 44127 microseconds.
   (= 44127 timer ticks)
-------------------------------------------------------------
----- Rates based on fastest execution of each kernel -----
(delta) is avg performance relative to best iteration for each kernel
Function    Best Rate MB/s  Min time     Avg time  (delta)      Max time
Copy:        32310.867      0.039615     0.040047 (-1.08%)      0.040745
Scale:       31500.593      0.040634     0.040730 (-0.24%)      0.040885
Add:         29829.034      0.064367     0.064728 (-0.56%)      0.065039
Triad:       30487.400      0.062977     0.063344 (-0.58%)      0.063666
-------------------------------------------------------------
----- Alternate Selection of Results ------------
----- Rates based on fastest full iteration (sum of 4 kernel times) -----
  Fastest (overall) iteration was iteration 1 of 10
  (delta) is the performance in iteration 1 relative to the fastest individual kernel timing
Function    Best Rate MB/s   time     (delta)
Copy:        32310.867      0.039615 ( 0.00%)
Scale:       31446.532      0.040704 (-0.17%)
Add:         29829.034      0.064367 ( 0.00%)
Triad:       30250.452      0.063470 (-0.78%)
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
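The figures reported above can be cross-checked from the log's own inputs. The sketch below is not part of the benchmark. It assumes the stock STREAM byte-counting convention (Copy and Scale touch two arrays per element, Add and Triad touch three) and decimal megabytes (1 MB = 10^6 bytes). It also assumes that (delta) is computed as min time / avg time - 1, which is inferred from the printed values rather than taken from the source of this build. All function and variable names are illustrative.

```python
# Sketch: recompute sizes, rates, and (delta) from values printed in the log.
# Assumption: stock STREAM byte counting (2 arrays for Copy/Scale, 3 for
# Add/Triad) and MB = 10**6 bytes. Names here are illustrative only.

ARRAY_ELEMENTS = 80_000_000   # from "Array size = 80000000 (elements)"
BYTES_PER_ELEMENT = 8         # from "8 bytes per array element"

ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Min and avg times (seconds) copied from the "Rates based on fastest
# execution of each kernel" table.
MIN_TIME_S = {"Copy": 0.039615, "Scale": 0.040634,
              "Add": 0.064367, "Triad": 0.062977}
AVG_TIME_S = {"Copy": 0.040047, "Scale": 0.040730,
              "Add": 0.064728, "Triad": 0.063344}


def array_mib(elements: int, bytes_per_element: int) -> float:
    """Memory for one array in MiB (2**20 bytes)."""
    return elements * bytes_per_element / 2**20


def best_rate_mb_s(kernel: str) -> float:
    """Bandwidth in MB/s (10**6 bytes/s) from the kernel's minimum time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return 1.0e-6 * bytes_moved / MIN_TIME_S[kernel]


def delta_percent(kernel: str) -> float:
    """Average performance relative to the best iteration, in percent.

    Assumed formula: rate scales as 1/time, so avg/best = min_time/avg_time.
    """
    return 100.0 * (MIN_TIME_S[kernel] / AVG_TIME_S[kernel] - 1.0)


if __name__ == "__main__":
    per_array = array_mib(ARRAY_ELEMENTS, BYTES_PER_ELEMENT)
    print(f"Memory per array: {per_array:.1f} MiB")
    print(f"Total (3 arrays): {3 * per_array:.1f} MiB")
    for kernel in ARRAYS_TOUCHED:
        print(f"{kernel:6s} {best_rate_mb_s(kernel):10.1f} MB/s "
              f"({delta_percent(kernel):+.2f}%)")
```

The recomputed rates differ from the printed ones by a fraction of 1 MB/s, because the log prints the minimum time rounded to six decimal places while the benchmark divides by the unrounded value.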