Stream Results

From: <mike@musl.org>
Date: Thu Sep 07 2006 - 00:07:49 CST

Mr McCalpin,

You've written some pretty spiffy C code, so I feel obligated to pass
along the results my machine coughed up. N is set to 10^7 so that the
arrays use (in my humble opinion) enough memory, and NTIMES is set to 100.
Hopefully these values do not hurt the accuracy of the test.

Also note that increasing N too far (to increase memory usage) yields this
error at compile time. I assume this is due to an overflow:

/tmp/ccxlIme2.s: Assembler messages:
/tmp/ccxlIme2.s:1084: Error: attempt to move .org backwards

Anyhoo, here's the Hardware, OS & compiler info, compiler args, and
output, as you've requested in stream.c:

Hardware used in the test:

Processor: Pentium D 950 (2 Cores, 1 Die)
RAM: 2 GB of DDR2 667
Motherboard: Intel D975XBX
Clock Frequencies:
    FSB: 800 MHz
    Memory: 800 MHz
    CPU: 3400 MHz
Memory Timings: 4-5-5-12 (T1)

[mike@janus stream]$ uname -r -p -o
2.6.17-1.2174_FC5smp i686 GNU/Linux

[mike@janus stream]$ gcc --version
gcc (GCC) 4.1.1 20060525 (Red Hat 4.1.1-1)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[mike@janus stream]$ gcc -O6 -o stream stream.c

[mike@janus stream]$ ./stream
-------------------------------------------------------------
STREAM version $Revision: 5.6 $
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 10000000, Offset = 0
Total memory required = 228.9 MB.
Each test is run 100 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Printing one line per active thread....
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 34473 microseconds.
   (= 34473 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        3387.1753       0.0474       0.0472       0.0484
Scale:       3373.6058       0.0476       0.0474       0.0483
Add:         3857.8386       0.0624       0.0622       0.0631
Triad:       3870.4059       0.0622       0.0620       0.0630
-------------------------------------------------------------
Solution Validates
-------------------------------------------------------------
[mike@janus stream]$

The results fall right in line with other memory benchmarks I've seen with
this hardware.

Thanks for placing the software online,

Mike
Received on Thu Sep 07 15:11:59 2006
