-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 80000000 (elements), Offset = 0 (elements)
Memory per array = 610.4 MiB (= 0.596 GiB).
(Array is sized for systems with cache sizes up to 152.59 MiB)
(Results are acceptable for Cache sizes up to 160.00 MiB)
Total memory required = 1831.1 MiB (= 1.788 GiB).
-------------------------------------------------------------
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 28
Number of Threads counted = 28
DEBUG: The system has 32 logical processors
DEBUG: Manually setting cpu affinity for each thread
DEBUG: thread 24 setting affinity for core 1
DEBUG: thread 22 setting affinity for core 27
DEBUG: thread 6 setting affinity for core 24
DEBUG: thread 23 setting affinity for core 31
DEBUG: thread 26 setting affinity for core 9
DEBUG: thread 18 setting affinity for core 11
DEBUG: thread 1 setting affinity for core 4
DEBUG: thread 4 setting affinity for core 16
DEBUG: thread 7 setting affinity for core 28
DEBUG: thread 3 setting affinity for core 12
DEBUG: thread 0 setting affinity for core 0
DEBUG: thread 2 setting affinity for core 8
DEBUG: thread 8 setting affinity for core 2
DEBUG: thread 10 setting affinity for core 10
DEBUG: thread 25 setting affinity for core 5
DEBUG: thread 5 setting affinity for core 20
DEBUG: thread 14 setting affinity for core 26
DEBUG: thread 9 setting affinity for core 6
DEBUG: thread 16 setting affinity for core 3
DEBUG: thread 11 setting affinity for core 14
DEBUG: thread 17 setting affinity for core 7
DEBUG: thread 21 setting affinity for core 23
DEBUG: thread 13 setting affinity for core 22
DEBUG: thread 15 setting affinity for core 30
DEBUG: thread 19 setting affinity for core 15
DEBUG: thread 27 setting affinity for core 13
DEBUG: thread 12 setting affinity for core 18
DEBUG: thread 20 setting affinity for core 19
DEBUG: The system has 32 logical processors
Affinity Mask for Thread 1 includes Processor 4
Affinity Mask for Thread 3 includes Processor 12
Affinity Mask for Thread 2 includes Processor 8
Affinity Mask for Thread 5 includes Processor 20
Affinity Mask for Thread 4 includes Processor 16
Affinity Mask for Thread 6 includes Processor 24
Affinity Mask for Thread 7 includes Processor 28
Affinity Mask for Thread 11 includes Processor 14
Affinity Mask for Thread 9 includes Processor 6
Affinity Mask for Thread 8 includes Processor 2
Affinity Mask for Thread 17 includes Processor 7
Affinity Mask for Thread 16 includes Processor 3
Affinity Mask for Thread 10 includes Processor 10
Affinity Mask for Thread 18 includes Processor 11
Affinity Mask for Thread 13 includes Processor 22
Affinity Mask for Thread 12 includes Processor 18
Affinity Mask for Thread 19 includes Processor 15
Affinity Mask for Thread 24 includes Processor 1
Affinity Mask for Thread 25 includes Processor 5
Affinity Mask for Thread 15 includes Processor 30
Affinity Mask for Thread 0 includes Processor 0
Affinity Mask for Thread 27 includes Processor 13
Affinity Mask for Thread 26 includes Processor 9
Affinity Mask for Thread 14 includes Processor 26
Affinity Mask for Thread 20 includes Processor 19
Affinity Mask for Thread 22 includes Processor 27
Affinity Mask for Thread 23 includes Processor 31
Affinity Mask for Thread 21 includes Processor 23
-------------------------------------------------------------
Your timer granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 12463 microseconds.
   (= 12463 timer ticks)
-------------------------------------------------------------
----- Rates based on fastest execution of each kernel -----
(delta) is avg performance relative to best iteration for each kernel
Function    Best Rate MB/s   Min time    Avg time  (delta)    Max time
Copy:           100257.878   0.012767    0.012815 (-0.37%)    0.012879
Scale:          103937.992   0.012315    0.012410 (-0.77%)    0.012590
Add:             99807.447   0.019237    0.019425 (-0.97%)    0.019534
Triad:          103711.106   0.018513    0.018603 (-0.48%)    0.018692
-------------------------------------------------------------
----- Alternate Selection of Results ------------
----- Rates based on fastest full iteration (sum of 4 kernel times) -----
Fastest (overall) iteration was iteration 2 of 10
(delta) is the performance in iteration 2 relative to the fastest individual kernel timing
Function    Best Rate MB/s       time  (delta)
Copy:           100164.352   0.012779 (-0.09%)
Scale:          102736.650   0.012459 (-1.16%)
Add:             99193.985   0.019356 (-0.61%)
Triad:          103711.106   0.018513 ( 0.00%)
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
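
The reported rates can be cross-checked from the array size and the minimum times in the log. The sketch below is in Python and is not part of STREAM. It assumes the usual STREAM byte accounting: Copy and Scale touch two arrays per element, Add and Triad touch three, and 1 MB is 10^6 bytes. The names `stream_rate_mb_s` and `memory_per_array_mib` are hypothetical helpers introduced only for this illustration. Because the log prints times rounded to six decimals, the recomputed rates should agree with the log only to within a few MB/s.

```python
# Recompute STREAM "Best Rate MB/s" from the minimum times printed in the log.
# Assumptions (not stated in the log itself): Copy/Scale touch 2 arrays per
# element, Add/Triad touch 3, and 1 MB = 1e6 bytes.

ARRAY_ELEMENTS = 80_000_000   # "Array size" from the log
BYTES_PER_ELEMENT = 8         # "8 bytes per array element" from the log

# Number of arrays read or written per element by each kernel (assumed).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column in seconds, copied from the results table.
MIN_TIME_S = {
    "Copy": 0.012767,
    "Scale": 0.012315,
    "Add": 0.019237,
    "Triad": 0.018513,
}


def stream_rate_mb_s(kernel: str, min_time_s: float) -> float:
    """Return bandwidth in MB/s (1 MB = 1e6 bytes) for one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return bytes_moved / min_time_s / 1e6


def memory_per_array_mib() -> float:
    """Return the size of one array in MiB (1 MiB = 2**20 bytes)."""
    return ARRAY_ELEMENTS * BYTES_PER_ELEMENT / 2**20


if __name__ == "__main__":
    print(f"Memory per array: {memory_per_array_mib():.1f} MiB")
    for kernel, t in MIN_TIME_S.items():
        print(f"{kernel:6s} {stream_rate_mb_s(kernel, t):12.1f} MB/s")
```

Under these assumptions, the recomputed values land within a few MB/s of the "Best Rate MB/s" column, and the per-array size matches the 610.4 MiB printed in the header.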