More STREAM results for Core 2 CPUs

From: Romain Dolbeau <romain@dolbeau.org>
Date: Thu Jul 17 2008 - 03:09:14 CDT

[Originally sent to stream@postofc.corp.sgi.com, but I got a "550
Blocked" from the mail server. Is that still the correct address for the
mailing list?]

Hello,

Here are some results for various Core 2 family CPUs and chipsets,
as I haven't seen many of those in the tables. The OS is Linux 2.6.22
or 2.6.24 on x86_64.

Binaries were generated (a while ago) with PGI compilers, release 6.2-3:
#####
pgcc -c -O3 mysecond.c -DUNDERSCORE
pgf90 -O3 -Mvect=assoc,sse,cachesize:524288,nosizelimit,prefetch -Minfo \
  -Mneginfo -tp k8-64 -Mprefetch=plain,nta,t0,w -Mscalarsse -Msmart \
  -Munroll -Mkeepasm stream.f mysecond.o -o streams
pgf90 -O3 -Mvect=assoc,sse,cachesize:524288,nosizelimit,prefetch -Minfo \
  -Mneginfo -tp k8-64 -Mprefetch=plain,nta,t0,w -Mscalarsse -Msmart \
  -Munroll -Mkeepasm stream.f mysecond.o -mp=numa -o streams_omp
#####
Those flags weren't particularly tuned for Intel. They are known to
give decent results for F90 code on AMD systems (note the -tp k8-64
target), so they are probably suboptimal on Intel systems. Any
suggestions for a better way of compiling the STREAM benchmark on
Linux/x86_64 are welcome, as I haven't had time to investigate.

The only change to the source code is to push the problem
size to n=20000000.
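
As a quick check of the memory footprint this problem size implies, here
is a minimal Python sketch. It assumes the three arrays and the 8-byte
words that the STREAM output reports; the variable names are
illustrative and not taken from stream.f.

```python
# Footprint of a STREAM run, assuming three arrays of 8-byte words.
N = 20_000_000          # problem size used for these runs
BYTES_PER_WORD = 8      # "Assuming 8 bytes per DOUBLE PRECISION word"
NUM_ARRAYS = 3          # assumption: the three STREAM work arrays

total_bytes = NUM_ARRAYS * BYTES_PER_WORD * N
total_mib = total_bytes // (1024 * 1024)   # truncate to whole units of 2**20 bytes

print(f"{total_bytes} bytes, about {total_mib} MB in units of 2**20 bytes")
```

The truncated value agrees with the "total memory requirement is 457 MB"
line printed by each run.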

Some motherboard/CPU combinations appear multiple times, as bandwidth
seems very dependent on the exact memory used.

***** Core 2 Duo 6550 on Supermicro C2SBE (Intel® P35 Chipset),
Kingston 1GB DDR2 DIMMs
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size = 20000000
 Offset = 0
 The total memory requirement is 457 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 ----------------------------------------------
 Printing one line per active thread....
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:           5630.8122     0.0574     0.0568     0.0616
Scale:          5642.9331     0.0569     0.0567     0.0578
Add:            5529.5072     0.0873     0.0868     0.0911
Triad:          5525.6979     0.0874     0.0869     0.0910
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
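
For readers who want to relate the Rate column to the Min time column,
here is a rough Python sketch. It assumes the usual STREAM byte
counting (two words moved per element for Copy and Scale, three for Add
and Triad) and 1 MB = 10^6 bytes. The reported times are rounded to four
decimals, so the recomputed rates only approximate the printed ones.

```python
# Recompute STREAM rates from the rounded "Min time" column of the run above.
N = 20_000_000
BYTES_PER_WORD = 8

# Assumption: words moved per array element for each kernel.
WORDS_PER_ELEMENT = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# (reported rate in MB/s, reported min time in seconds), copied from the run above.
reported = {
    "Copy": (5630.8122, 0.0568),
    "Scale": (5642.9331, 0.0567),
    "Add": (5529.5072, 0.0868),
    "Triad": (5525.6979, 0.0869),
}

recomputed = {}
for name, (rate, min_time) in reported.items():
    bytes_moved = WORDS_PER_ELEMENT[name] * BYTES_PER_WORD * N
    recomputed[name] = bytes_moved / min_time / 1e6   # MB/s, 1 MB = 10**6 bytes
    print(f"{name:6s} reported {rate:10.4f}  recomputed {recomputed[name]:10.4f}")
```

The small remaining differences are consistent with the rounding of the
printed times.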


***** Core 2 Duo E8200 on Supermicro C2SBE (Intel® P35 Chipset),
Kingston 2GB DDR2 DIMMs
### no openmp
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size = 20000000
 Offset = 0
 The total memory requirement is 457 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 ----------------------------------------------
 Printing one line per active thread....
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:           5420.3324     0.0596     0.0590     0.0615
Scale:          5386.9385     0.0601     0.0594     0.0619
Add:            5597.6765     0.0862     0.0857     0.0880
Triad:          5593.9437     0.0863     0.0858     0.0879
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------

***** Core 2 Duo E8200 on Supermicro C2SBE (Intel® P35 Chipset),
Samsung 1GB DDR2 DIMMs
### no openmp
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size = 20000000
 Offset = 0
 The total memory requirement is 457 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 ----------------------------------------------
 Printing one line per active thread....
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:           4980.3051     0.0645     0.0643     0.0656
Scale:          4898.0640     0.0658     0.0653     0.0672
Add:            5194.0773     0.0926     0.0924     0.0935
Triad:          5165.5050     0.0933     0.0929     0.0954
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------

***** Core 2 Duo 6850 on Asus P5E (Intel® X38 Chipset),
unknown 1GB DDR2 DIMMs
### no openmp
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size = 20000000
 Offset = 0
 The total memory requirement is 457 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 ----------------------------------------------
 Printing one line per active thread....
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:           6398.6026     0.0504     0.0500     0.0530
Scale:          6457.7118     0.0502     0.0496     0.0552
Add:            6806.9552     0.0726     0.0705     0.0792
Triad:          6827.1065     0.0710     0.0703     0.0755
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------

***** Core 2 Duo E8200 on Supermicro C2SBX (Intel® X38 Chipset),
unknown 2GB DDR3 DIMMs
### no openmp
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size = 20000000
 Offset = 0
 The total memory requirement is 457 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 ----------------------------------------------
 Printing one line per active thread....
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:           7249.1346     0.0449     0.0441     0.0470
Scale:          7262.8248     0.0444     0.0441     0.0463
Add:            7861.4027     0.0614     0.0611     0.0622
Triad:          7863.1835     0.0617     0.0610     0.0644
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------

--
Romain Dolbeau
<romain@dolbeau.org>


Received on Thu Jul 17 07:53:25 2008
