-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 80000000 (elements), Offset = 0 (elements)
Memory per array = 610.4 MiB (= 0.596 GiB).
(Array is sized for systems with cache sizes up to 152.59 MiB)
(Results are acceptable for Cache sizes up to 160.00 MiB)
Total memory required = 1831.1 MiB (= 1.788 GiB).
-------------------------------------------------------------
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Number of Threads requested = 8
Number of Threads counted = 8
DEBUG: The system has 32 logical processors
DEBUG: Manually setting cpu affinity for each thread
DEBUG: thread 1 setting affinity for core 4
DEBUG: thread 2 setting affinity for core 8
DEBUG: thread 7 setting affinity for core 28
DEBUG: thread 3 setting affinity for core 12
DEBUG: thread 4 setting affinity for core 16
DEBUG: thread 6 setting affinity for core 24
DEBUG: thread 5 setting affinity for core 20
DEBUG: thread 0 setting affinity for core 0
DEBUG: The system has 32 logical processors
Affinity Mask for Thread 3 includes Processor 12
Affinity Mask for Thread 2 includes Processor 8
Affinity Mask for Thread 5 includes Processor 20
Affinity Mask for Thread 0 includes Processor 0
Affinity Mask for Thread 1 includes Processor 4
Affinity Mask for Thread 7 includes Processor 28
Affinity Mask for Thread 4 includes Processor 16
Affinity Mask for Thread 6 includes Processor 24
-------------------------------------------------------------
Your timer granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 44127 microseconds.
   (= 44127 timer ticks)
-------------------------------------------------------------
----- Rates based on fastest execution of each kernel -----
(delta) is avg performance relative to best iteration for each kernel
Function    Best Rate MB/s  Min time   Avg time  (delta)   Max time
Copy:           32310.867   0.039615   0.040047  (-1.08%)  0.040745
Scale:          31500.593   0.040634   0.040730  (-0.24%)  0.040885
Add:            29829.034   0.064367   0.064728  (-0.56%)  0.065039
Triad:          30487.400   0.062977   0.063344  (-0.58%)  0.063666
-------------------------------------------------------------
----- Alternate Selection of Results ------------
----- Rates based on fastest full iteration (sum of 4 kernel times) -----
Fastest (overall) iteration was iteration 1 of 10
(delta) is the performance in iteration 1 relative to the fastest individual kernel timing
Function    Best Rate MB/s  time       (delta)
Copy:           32310.867   0.039615   ( 0.00%)
Scale:          31446.532   0.040704   (-0.17%)
Add:            29829.034   0.064367   ( 0.00%)
Triad:          30250.452   0.063470   (-0.78%)
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
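
The reported figures can be cross-checked from the array size and the minimum times printed above. The sketch below is not part of the STREAM output; it assumes the standard STREAM byte accounting (Copy and Scale touch two arrays per element, Add and Triad touch three) and that "MB/s" means 10^6 bytes per second. The names ARRAYS_TOUCHED, best_rate_mb_s and memory_per_array_mib are illustrative only. Because the log prints times rounded to six decimals, the recomputed rates should agree with the log to within about 1 MB/s rather than exactly.

```python
# Cross-check of the bandwidth and memory figures in the log above.
# Assumption: standard STREAM accounting, i.e. bytes moved per kernel =
# (number of arrays touched) * (bytes per element) * (array size).

ARRAY_ELEMENTS = 80_000_000      # "Array size" from the log
BYTES_PER_ELEMENT = 8            # "This system uses 8 bytes per array element"

# Arrays read or written per element by each kernel (assumed, see lead-in).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column from the log, in seconds.
MIN_TIME_S = {
    "Copy": 0.039615,
    "Scale": 0.040634,
    "Add": 0.064367,
    "Triad": 0.062977,
}


def best_rate_mb_s(kernel: str) -> float:
    """Bandwidth in MB/s (1 MB = 10**6 bytes) from the kernel's minimum time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * ARRAY_ELEMENTS
    return 1.0e-6 * bytes_moved / MIN_TIME_S[kernel]


def memory_per_array_mib() -> float:
    """Size of one array in MiB (1 MiB = 2**20 bytes)."""
    return ARRAY_ELEMENTS * BYTES_PER_ELEMENT / 2**20


if __name__ == "__main__":
    for kernel in MIN_TIME_S:
        print(f"{kernel:6s} {best_rate_mb_s(kernel):12.3f} MB/s")
    print(f"Memory per array: {memory_per_array_mib():.1f} MiB")
    print(f"Total (3 arrays): {3 * memory_per_array_mib():.1f} MiB")
```

Add and Triad take longer per iteration than Copy and Scale because they move half again as many bytes, which is why their rates are comparable even though their minimum times are about 1.6 times larger.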