-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 80000000 (elements), Offset = 0 (elements)
Memory per array = 610.4 MiB (= 0.596 GiB).
(Array is sized for systems with cache sizes up to 152.59 MiB)
(Results are acceptable for Cache sizes up to 160.00 MiB)
Total memory required = 1831.1 MiB (= 1.788 GiB).
-------------------------------------------------------------
Each kernel will be executed 10 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 12
Number of Threads counted = 12
DEBUG: The system has 32 logical processors
DEBUG: Manually setting cpu affinity for each thread
DEBUG: thread 11 setting affinity for core 14
DEBUG: thread 1 setting affinity for core 4
DEBUG: thread 2 setting affinity for core 8
DEBUG: thread 7 setting affinity for core 28
DEBUG: thread 4 setting affinity for core 16
DEBUG: thread 5 setting affinity for core 20
DEBUG: thread 3 setting affinity for core 12
DEBUG: thread 8 setting affinity for core 2
DEBUG: thread 0 setting affinity for core 0
DEBUG: thread 10 setting affinity for core 10
DEBUG: thread 9 setting affinity for core 6
DEBUG: thread 6 setting affinity for core 24
DEBUG: The system has 32 logical processors
Affinity Mask for Thread 3 includes Processor 12
Affinity Mask for Thread 2 includes Processor 8
Affinity Mask for Thread 1 includes Processor 4
Affinity Mask for Thread 8 includes Processor 2
Affinity Mask for Thread 9 includes Processor 6
Affinity Mask for Thread 0 includes Processor 0
Affinity Mask for Thread 6 includes Processor 24
Affinity Mask for Thread 7 includes Processor 28
Affinity Mask for Thread 5 includes Processor 20
Affinity Mask for Thread 4 includes Processor 16
Affinity Mask for Thread 10 includes Processor 10
Affinity Mask for Thread 11 includes Processor 14
-------------------------------------------------------------
Your timer granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 29372 microseconds.
   (= 29372 timer ticks)
-------------------------------------------------------------
----- Rates based on fastest execution of each kernel -----
(delta) is avg performance relative to best iteration for each kernel
Function    Best Rate MB/s   Min time   Avg time   (delta)    Max time
Copy:          47246.455     0.027092   0.027200   (-0.40%)   0.027317
Scale:         46633.333     0.027448   0.027587   (-0.50%)   0.027673
Add:           44461.114     0.043184   0.043450   (-0.61%)   0.043602
Triad:         45831.040     0.041893   0.042511   (-1.45%)   0.042819
-------------------------------------------------------------
----- Alternate Selection of Results ------------
----- Rates based on fastest full iteration (sum of 4 kernel times) -----
Fastest (overall) iteration was iteration 1 of 10
(delta) is the performance in iteration 1 relative to the fastest individual kernel timing
Function    Best Rate MB/s   time       (delta)
Copy:          47246.455     0.027092   ( 0.00%)
Scale:         46371.520     0.027603   (-0.56%)
Add:           44461.114     0.043184   ( 0.00%)
Triad:         45831.040     0.041893   ( 0.00%)
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
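
The log says the best (minimum) time per kernel is used to compute the reported bandwidth but does not show the arithmetic. The sketch below reproduces the "Best Rate MB/s" column from the "Min time" column. It assumes the usual STREAM convention, which this output does not state: Copy and Scale move two arrays per iteration, Add and Triad move three, and 1 MB = 10^6 bytes. The names ARRAYS_TOUCHED and best_rate_mb_s are illustrative and not part of STREAM.

```python
# Values copied from the log above.
ARRAY_SIZE = 80_000_000      # elements per array
BYTES_PER_ELEMENT = 8        # "This system uses 8 bytes per array element."
MIN_TIME = {                 # seconds, "Min time" column
    "Copy": 0.027092,
    "Scale": 0.027448,
    "Add": 0.043184,
    "Triad": 0.041893,
}

# Assumption: arrays read plus written per kernel (standard STREAM counting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def best_rate_mb_s(kernel: str) -> float:
    """Bandwidth in MB/s (1 MB = 1e6 bytes) from the kernel's minimum time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_SIZE
    return 1.0e-6 * bytes_moved / MIN_TIME[kernel]


def mib(n_bytes: int) -> float:
    """Convert bytes to MiB (2**20 bytes)."""
    return n_bytes / 2**20


if __name__ == "__main__":
    for kernel in MIN_TIME:
        print(f"{kernel + ':':<8}{best_rate_mb_s(kernel):12.1f} MB/s")
    per_array = ARRAY_SIZE * BYTES_PER_ELEMENT
    print(f"Memory per array = {mib(per_array):.1f} MiB")
    print(f"Total memory     = {mib(3 * per_array):.1f} MiB")
```

Because the log prints times rounded to six decimal places, the recomputed rates agree with the reported ones only to within about 1 MB/s rather than exactly.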