STREAM results for Core 2 CPUs

From: Romain Dolbeau <romain.dolbeau@caps-entreprise.com>
Date: Tue Sep 25 2007 - 06:19:35 CDT



Hello,

Here are some results for various Core 2 family CPUs & chipsets,
as I haven't seen many of those in the tables. The OS is Linux 2.6.18
or 2.6.21 on x86_64.

Binaries were generated (a while ago) with PGI compilers, release 6.2-3:
#####
pgcc -c -O3 mysecond.c -DUNDERSCORE
pgf90 -O3 -Mvect=assoc,sse,cachesize:524288,nosizelimit,prefetch -Minfo \
    -Mneginfo -tp k8-64 -Mprefetch=plain,nta,t0,w -Mscalarsse -Msmart \
    -Munroll -Mkeepasm stream.f mysecond.o -o streams
pgf90 -O3 -Mvect=assoc,sse,cachesize:524288,nosizelimit,prefetch -Minfo \
    -Mneginfo -tp k8-64 -Mprefetch=plain,nta,t0,w -Mscalarsse -Msmart \
    -Munroll -Mkeepasm stream.f mysecond.o -mp=numa -o streams_omp
#####
Those flags weren't particularly optimized for Intel: they're known to
give decent results on F90 code on AMD systems, so they're probably
suboptimal on Intel systems. Any suggestions for a better way of
compiling the STREAM benchmark on Linux/x86_64 are welcome, as I
haven't had time to investigate.

The only change to the source code is to push the problem
size to n=20000000.
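
As a quick sanity check on that problem size, the 457 MB that STREAM reports in the runs below can be reproduced from n. This is only a sketch, not part of the benchmark: it assumes three arrays, the 8 bytes per word that the output itself states, and a "MB" of 2**20 bytes truncated to an integer. The variable names are illustrative.

```python
# Sketch: reproduce the "total memory requirement" line for n=20000000.
# Assumptions: three arrays, 8 bytes per DOUBLE PRECISION word (as printed
# in the STREAM output), and "MB" meaning 2**20 bytes, truncated.
n = 20_000_000
bytes_per_word = 8
num_arrays = 3

total_bytes = num_arrays * bytes_per_word * n
total_mib = total_bytes / 2**20

print(f"total bytes: {total_bytes}")
print(f"total MiB  : {total_mib:.2f} (truncated: {int(total_mib)})")
```

Under those assumptions the truncated value matches the 457 MB printed by each run.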


***** Core 2 Duo 6320 on Asus P5B (Intel® P965 Chipset)
### no openmp
----------------------------------------------
  Double precision appears to have 16 digits of accuracy
  Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
  ----------------------------------------------
  STREAM Version $Revision: 5.6 $
  ----------------------------------------------
  Array size = 20000000
  Offset = 0
  The total memory requirement is 457 MB
  You are running each test 10 times
  --
  The *best* time for each test is used
  *EXCLUDING* the first and last iterations
  ----------------------------------------------
  ----------------------------------------------
  Printing one line per active thread....
  ----------------------------------------------------
  Your clock granularity/precision appears to be 1 microseconds
  ----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:          5124.4961     0.0630     0.0624     0.0638
Scale:         5135.5353     0.0634     0.0623     0.0712
Add:           5330.6695     0.0908     0.0900     0.0939
Triad:         5317.2102     0.0912     0.0903     0.0948
  ----------------------------------------------------
  Solution Validates!
  ----------------------------------------------------
### OMP_NUM_THREADS=2
----------------------------------------------
  Double precision appears to have 16 digits of accuracy
  Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
  ----------------------------------------------
  STREAM Version $Revision: 5.6 $
  ----------------------------------------------
  Array size = 20000000
  Offset = 0
  The total memory requirement is 457 MB
  You are running each test 10 times
  --
  The *best* time for each test is used
  *EXCLUDING* the first and last iterations
  ----------------------------------------------
  Number of Threads = 2
  ----------------------------------------------
  Printing one line per active thread....
  Printing one line per active thread....
  ----------------------------------------------------
  Your clock granularity/precision appears to be 1 microseconds
  ----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:          4919.5899     0.0652     0.0650     0.0653
Scale:         3207.3134     0.1076     0.0998     0.1365
Add:           3839.5093     0.1253     0.1250     0.1257
Triad:         3838.3967     0.1299     0.1251     0.1475
  ----------------------------------------------------
  Solution Validates!
  ----------------------------------------------------
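
For readers checking the numbers, the rates in the single-threaded P965 run above can be approximately recovered from the minimum times. The sketch below assumes the usual STREAM accounting (2 words moved per iteration for Copy and Scale, 3 for Add and Triad) and a rate of 1e-6 * bytes / min time, so "MB" here means 10**6 bytes. Small differences are expected because the printed times are rounded to four decimals.

```python
# Sketch: recompute the reported rates from the printed minimum times.
# Assumption: STREAM counts 2 words per iteration for Copy/Scale and 3 for
# Add/Triad, and reports rate = 1e-6 * bytes / min_time (MB = 10**6 bytes).
n = 20_000_000
bytes_per_word = 8
words_per_iter = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# (reported rate in MB/s, min time in s) from the P965 "no openmp" run
reported = {
    "Copy": (5124.4961, 0.0624),
    "Scale": (5135.5353, 0.0623),
    "Add": (5330.6695, 0.0900),
    "Triad": (5317.2102, 0.0903),
}

recomputed = {}
for name, (rate, min_time) in reported.items():
    nbytes = words_per_iter[name] * bytes_per_word * n
    recomputed[name] = 1e-6 * nbytes / min_time
    rel_err = abs(recomputed[name] - rate) / rate
    print(f"{name:6s} reported {rate:10.4f}  recomputed {recomputed[name]:10.4f}"
          f"  rel. diff {rel_err:.4%}")
```

The recomputed values agree with the printed rates to within the rounding of the time columns, which suggests the accounting assumed here is the one in use.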

***** Core 2 Duo 6320 on Supermicro PDSMi+ (Intel® 3000 (Mukilteo-2) Chipset)
### no openmp
----------------------------------------------
  Double precision appears to have 16 digits of accuracy
  Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
  ----------------------------------------------
  STREAM Version $Revision: 5.6 $
  ----------------------------------------------
  Array size = 20000000
  Offset = 0
  The total memory requirement is 457 MB
  You are running each test 10 times
  --
  The *best* time for each test is used
  *EXCLUDING* the first and last iterations
  ----------------------------------------------
  ----------------------------------------------
  Printing one line per active thread....
  ----------------------------------------------------
  Your clock granularity/precision appears to be 1 microseconds
  ----------------------------------------------------
./streams
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:          4414.2292     0.0726     0.0725     0.0727
Scale:         4414.7664     0.0725     0.0725     0.0726
Add:           4356.8697     0.1102     0.1102     0.1105
Triad:         4356.1909     0.1102     0.1102     0.1104
  ----------------------------------------------------
  Solution Validates!
  ----------------------------------------------------
### OMP_NUM_THREADS=2
----------------------------------------------
  Double precision appears to have 16 digits of accuracy
  Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
  ----------------------------------------------
  STREAM Version $Revision: 5.6 $
  ----------------------------------------------
  Array size = 20000000
  Offset = 0
  The total memory requirement is 457 MB
  You are running each test 10 times
  --
  The *best* time for each test is used
  *EXCLUDING* the first and last iterations
  ----------------------------------------------
  Number of Threads = 2
  ----------------------------------------------
  Printing one line per active thread....
  Printing one line per active thread....
  ----------------------------------------------------
  Your clock granularity/precision appears to be 1 microseconds
  ----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:          4378.7592     0.0732     0.0731     0.0733
Scale:         2777.5123     0.1161     0.1152     0.1168
Add:           3237.4906     0.1486     0.1483     0.1490
Triad:         3240.1584     0.1486     0.1481     0.1488
  ----------------------------------------------------
  Solution Validates!
  ----------------------------------------------------


***** Core 2 Duo 6550 on Asus P5K-SE (Intel® P35 Chipset)
### no openmp
----------------------------------------------
  Double precision appears to have 16 digits of accuracy
  Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
  ----------------------------------------------
  STREAM Version $Revision: 5.6 $
  ----------------------------------------------
  Array size = 20000000
  Offset = 0
  The total memory requirement is 457 MB
  You are running each test 10 times
  --
  The *best* time for each test is used
  *EXCLUDING* the first and last iterations
  ----------------------------------------------
  ----------------------------------------------
  Printing one line per active thread....
  ----------------------------------------------------
  Your clock granularity/precision appears to be 1 microseconds
  ----------------------------------------------------
./streams
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:          5743.9253     0.0566     0.0557     0.0620
Scale:         5742.5982     0.0572     0.0557     0.0683
Add:           5802.8388     0.0829     0.0827     0.0832
Triad:         5813.6133     0.0833     0.0826     0.0881
  ----------------------------------------------------
  Solution Validates!
  ----------------------------------------------------
### OMP_NUM_THREADS=2
----------------------------------------------
  Double precision appears to have 16 digits of accuracy
  Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
  ----------------------------------------------
  STREAM Version $Revision: 5.6 $
  ----------------------------------------------
  Array size = 20000000
  Offset = 0
  The total memory requirement is 457 MB
  You are running each test 10 times
  --
  The *best* time for each test is used
  *EXCLUDING* the first and last iterations
  ----------------------------------------------
  Number of Threads = 2
  ----------------------------------------------
  Printing one line per active thread....
  Printing one line per active thread....
  ----------------------------------------------------
  Your clock granularity/precision appears to be 1 microseconds
  ----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:          5703.3837     0.0592     0.0561     0.0799
Scale:         3420.9283     0.0947     0.0935     0.0961
Add:           4073.3180     0.1257     0.1178     0.1654
Triad:         4071.8680     0.1222     0.1179     0.1509
  ----------------------------------------------------
  Solution Validates!
  ----------------------------------------------------


***** Dual Xeon 5130 on Supermicro X7DBE (Intel® 5000P (Blackford) Chipset)
### no openmp
----------------------------------------------
  Double precision appears to have 16 digits of accuracy
  Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
  ----------------------------------------------
  STREAM Version $Revision: 5.6 $
  ----------------------------------------------
  Array size = 20000000
  Offset = 0
  The total memory requirement is 457 MB
  You are running each test 10 times
  --
  The *best* time for each test is used
  *EXCLUDING* the first and last iterations
  ----------------------------------------------
  ----------------------------------------------
  Printing one line per active thread....
  ----------------------------------------------------
  Your clock granularity/precision appears to be 1 microseconds
  ----------------------------------------------------
./streams
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:          3987.9879     0.0805     0.0802     0.0811
Scale:         3925.0344     0.0817     0.0815     0.0823
Add:           4093.1520     0.1175     0.1173     0.1181
Triad:         4100.0791     0.1173     0.1171     0.1179
  ----------------------------------------------------
  Solution Validates!
  ----------------------------------------------------
### OMP_NUM_THREADS=2
----------------------------------------------
  Double precision appears to have 16 digits of accuracy
  Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
  ----------------------------------------------
  STREAM Version $Revision: 5.6 $
  ----------------------------------------------
  Array size = 20000000
  Offset = 0
  The total memory requirement is 457 MB
  You are running each test 10 times
  --
  The *best* time for each test is used
  *EXCLUDING* the first and last iterations
  ----------------------------------------------
  Number of Threads = 2
  ----------------------------------------------
  Printing one line per active thread....
  Printing one line per active thread....
  ----------------------------------------------------
  Your clock granularity/precision appears to be 1 microseconds
  ----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:          5858.2477     0.0547     0.0546     0.0549
Scale:         4799.7643     0.0673     0.0667     0.0715
Add:           5195.9809     0.0929     0.0924     0.0965
Triad:         5202.9635     0.0930     0.0923     0.0976
  ----------------------------------------------------
  Solution Validates!
  ----------------------------------------------------
### OMP_NUM_THREADS=4
----------------------------------------------
  Double precision appears to have 16 digits of accuracy
  Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
  ----------------------------------------------
  STREAM Version $Revision: 5.6 $
  ----------------------------------------------
  Array size = 20000000
  Offset = 0
  The total memory requirement is 457 MB
  You are running each test 10 times
  --
  The *best* time for each test is used
  *EXCLUDING* the first and last iterations
  ----------------------------------------------
  Number of Threads = 4
  ----------------------------------------------
  Printing one line per active thread....
  Printing one line per active thread....
  Printing one line per active thread....
  Printing one line per active thread....
  ----------------------------------------------------
  Your clock granularity/precision appears to be 1 microseconds
  ----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:          6194.1117     0.0517     0.0517     0.0519
Scale:         4893.0285     0.0687     0.0654     0.0943
Add:           5271.2542     0.0923     0.0911     0.1017
Triad:         5279.5896     0.0944     0.0909     0.1216
  ----------------------------------------------------
  Solution Validates!
  ----------------------------------------------------
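
To compare the runs at a glance, the Triad rates above can be turned into a ratio of multi-threaded to single-threaded bandwidth. This is only a sketch over the numbers in this post: it treats each "no openmp" run as the 1-thread baseline, and the dictionary labels are shorthand chosen for illustration.

```python
# Sketch: Triad bandwidth of the OpenMP runs relative to the "no openmp" run.
# Assumption: the "no openmp" binary is used as the 1-thread baseline.
# Rates (MB/s) are copied from the tables above; keys are thread counts.
triad = {
    "Core 2 Duo 6320 / P965": {1: 5317.2102, 2: 3838.3967},
    "Core 2 Duo 6320 / Intel 3000": {1: 4356.1909, 2: 3240.1584},
    "Core 2 Duo 6550 / P35": {1: 5813.6133, 2: 4071.8680},
    "Dual Xeon 5130 / 5000P": {1: 4100.0791, 2: 5202.9635, 4: 5279.5896},
}

scaling = {}
for system, rates in triad.items():
    baseline = rates[1]
    scaling[system] = {
        threads: rate / baseline
        for threads, rate in rates.items()
        if threads != 1
    }
    for threads, ratio in scaling[system].items():
        print(f"{system:30s} {threads} threads: {ratio:.2f}x of 1 thread")
```

With these numbers the ratio comes out below 1 on the three single-socket boards and above 1 on the dual Xeon system, with the caveat that the compile flags were not tuned for Intel.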

--
Romain Dolbeau
<romain@dolbeau.org>

Received on Tue Sep 25 07:12:46 2007
