-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 80000000 (elements), Offset = 0 (elements)
Memory per array = 610.4 MiB (= 0.596 GiB).
(Array is sized for systems with cache sizes up to 152.59 MiB)
(Results are acceptable for Cache sizes up to 160.00 MiB)
Total memory required = 1831.1 MiB (= 1.788 GiB).
-------------------------------------------------------------
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 4
Number of Threads counted = 4
DEBUG: The system has 32 logical processors
DEBUG: Manually setting cpu affinity for each thread
DEBUG: thread 0 setting affinity for core 0
DEBUG: thread 1 setting affinity for core 4
DEBUG: thread 2 setting affinity for core 8
DEBUG: thread 3 setting affinity for core 12
DEBUG: The system has 32 logical processors
Affinity Mask for Thread 0 includes Processor 0
Affinity Mask for Thread 1 includes Processor 4
Affinity Mask for Thread 3 includes Processor 12
Affinity Mask for Thread 2 includes Processor 8
-------------------------------------------------------------
Your timer granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 88303 microseconds.
   (= 88303 timer ticks)
-------------------------------------------------------------
----- Rates based on fastest execution of each kernel -----
(delta) is avg performance relative to best iteration for each kernel
Function    Best Rate MB/s   Min time   Avg time   (delta)    Max time
Copy:          16168.522     0.079166   0.079552   (-0.49%)   0.079779
Scale:         15611.841     0.081989   0.082148   (-0.19%)   0.082263
Add:           14792.440     0.129796   0.130116   (-0.25%)   0.130564
Triad:         15054.594     0.127536   0.127862   (-0.25%)   0.128243
-------------------------------------------------------------
----- Alternate Selection of Results ------------
----- Rates based on fastest full iteration (sum of 4 kernel times) -----
Fastest (overall) iteration was iteration 6 of 10
(delta) is the performance in iteration 6 relative to the fastest individual kernel timing
Function    Best Rate MB/s       time   (delta)
Copy:          16168.522     0.079166   ( 0.00%)
Scale:         15600.998     0.082046   (-0.07%)
Add:           14746.986     0.130196   (-0.31%)
Triad:         15034.779     0.127704   (-0.13%)
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
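
The figures in the log above can be cross-checked with a little arithmetic. The sketch below is not part of the benchmark. It assumes the usual STREAM byte-counting convention (Copy and Scale touch two arrays per element, Add and Triad touch three, and MB means 10^6 bytes), and it assumes the (delta) column is the average rate relative to the best rate, which is how the log describes it. The function names are hypothetical, chosen only for this illustration. Results should agree with the log only up to the rounding of the printed times.

```python
# Sketch: recompute the reported STREAM figures from the array size and timings
# printed in the log. Constants are copied from the log; the helper names are
# illustrative and do not come from the benchmark source.

BYTES_PER_ELEMENT = 8          # "This system uses 8 bytes per array element."
ARRAY_SIZE = 80_000_000        # "Array size = 80000000 (elements)"

# Assumption: standard STREAM byte counting (arrays read + arrays written).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# (min time, avg time) in seconds, from the "fastest execution" table.
TIMES_S = {
    "Copy": (0.079166, 0.079552),
    "Scale": (0.081989, 0.082148),
    "Add": (0.129796, 0.130116),
    "Triad": (0.127536, 0.127862),
}


def array_mib() -> float:
    """Memory used by one array, in MiB (2**20 bytes)."""
    return BYTES_PER_ELEMENT * ARRAY_SIZE / 2**20


def rate_mb_per_s(kernel: str, seconds: float) -> float:
    """Bandwidth in MB/s (10**6 bytes per second) for one kernel execution."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_SIZE
    return 1.0e-6 * bytes_moved / seconds


def avg_delta_percent(min_time: float, avg_time: float) -> float:
    """Average rate relative to best rate, in percent (negative = slower)."""
    return (min_time / avg_time - 1.0) * 100.0


if __name__ == "__main__":
    print(f"Memory per array: {array_mib():.1f} MiB")
    print(f"Total for three arrays: {3 * array_mib():.1f} MiB")
    for kernel, (t_min, t_avg) in TIMES_S.items():
        rate = rate_mb_per_s(kernel, t_min)
        delta = avg_delta_percent(t_min, t_avg)
        print(f"{kernel + ':':<7} {rate:12.3f} MB/s  ({delta:+.2f}%)")
```

Because the log prints times to six decimal places, the recomputed rates can differ from the reported ones in the last digits; a mismatch much larger than that would suggest a different array size or element width than the one stated in the header.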