STREAM: Result set - Dell PowerEdge 1950 III - 2 x Intel Xeon Quad Core E5440

From: Chris Youb <cyoub@yottayotta.com>
Date: Thu May 01 2008 - 18:47:20 CDT

Here are my results from running STREAM on the configuration listed
below. For the most part the results are tightly grouped with little
variation. I noticed that the results improved as I increased
OMP_NUM_THREADS, until they peaked at 8. Incidentally, this is the
number of cores in my system (2 x quad core).

However, on my dual-CPU AMD 2212HE-based system the results are not
consistent at all. Using the same software and configuration as listed
below, my results range from 3500 MB/s to 7700 MB/s. Do you have any
advice or suggestions about what could be wrong?

--------------------------------------------------------------

Hardware:
- Dell PowerEdge 1950 III
- Two (2) Intel Xeon Quad Core E5440
- Memory stick configuration (from factory): 2G x 0 x 2G x 0 x 2G x 0 x
2G x 0 = 8G total memory

--------------------------------------------------------------

Operating System:
- Ubuntu 8.04 AMD64 Desktop

Compiled as follows:
    gcc -O3 -fopenmp -D_OPENMP stream.c -o stream

Run as follows:
    export OMP_NUM_THREADS=8
    ./stream
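
The thread-count sweep mentioned at the top of this message could be
automated roughly as follows. This is only a sketch: the helper names
(parse_rates, run_stream) and the ./stream path are assumptions made
for illustration, and the regular expression assumes the
"Function Rate (MB/s)" table layout that appears in the results in this
message.

```python
import os
import re
import subprocess

STREAM_BINARY = "./stream"  # assumed location of the compiled benchmark

# Matches table rows such as "Copy: 6341.4944 0.0051 0.0050 0.0051"
# and captures the kernel name and its best rate in MB/s.
RATE_LINE = re.compile(r"^\s*(Copy|Scale|Add|Triad):\s+([0-9.]+)", re.MULTILINE)


def parse_rates(output):
    """Return {kernel name: best rate in MB/s} from STREAM's text output."""
    return {name: float(rate) for name, rate in RATE_LINE.findall(output)}


def run_stream(threads, binary=STREAM_BINARY):
    """Run STREAM once with OMP_NUM_THREADS set to `threads`."""
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    result = subprocess.run(
        [binary], env=env, capture_output=True, text=True, check=True
    )
    return parse_rates(result.stdout)


# Only attempt the sweep when the benchmark binary is actually present.
if __name__ == "__main__" and os.access(STREAM_BINARY, os.X_OK):
    for threads in (1, 2, 4, 8):
        print(threads, run_stream(threads))
```

Each call starts a fresh process, so every thread count gets its own
array initialization, the same as running the two commands above by
hand.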

--------------------------------------------------------------

Results:

    [cyoub@cyoub.edmonton.yottayotta.com]$ cat results
    -------------------------------------------------------------
    STREAM version $Revision: 5.8 $
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 2000000, Offset = 0
    Total memory required = 45.8 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Number of Threads requested = 8
    -------------------------------------------------------------
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 3380 microseconds.
       (= 3380 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6341.4944 0.0051 0.0050 0.0051
    Scale: 5992.3979 0.0054 0.0053 0.0054
    Add: 5572.2832 0.0086 0.0086 0.0086
    Triad: 5771.3161 0.0083 0.0083 0.0083
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
    -------------------------------------------------------------
    STREAM version $Revision: 5.8 $
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 2000000, Offset = 0
    Total memory required = 45.8 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Number of Threads requested = 8
    -------------------------------------------------------------
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 3644 microseconds.
       (= 3644 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6282.1310 0.0051 0.0051 0.0051
    Scale: 5982.2485 0.0054 0.0053 0.0056
    Add: 5561.3544 0.0087 0.0086 0.0087
    Triad: 5761.5715 0.0083 0.0083 0.0084
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
    -------------------------------------------------------------
    STREAM version $Revision: 5.8 $
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 2000000, Offset = 0
    Total memory required = 45.8 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Number of Threads requested = 8
    -------------------------------------------------------------
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 5143 microseconds.
       (= 5143 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6268.3415 0.0054 0.0051 0.0071
    Scale: 5839.3617 0.0056 0.0055 0.0060
    Add: 5910.5922 0.0084 0.0081 0.0085
    Triad: 5944.9754 0.0082 0.0081 0.0083
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
    -------------------------------------------------------------
    STREAM version $Revision: 5.8 $
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 2000000, Offset = 0
    Total memory required = 45.8 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Number of Threads requested = 8
    -------------------------------------------------------------
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 2902 microseconds.
       (= 2902 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6566.7463 0.0051 0.0049 0.0053
    Scale: 6043.3936 0.0054 0.0053 0.0056
    Add: 5772.8055 0.0086 0.0083 0.0087
    Triad: 5993.2898 0.0082 0.0080 0.0084
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
    -------------------------------------------------------------
    STREAM version $Revision: 5.8 $
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 2000000, Offset = 0
    Total memory required = 45.8 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Number of Threads requested = 8
    -------------------------------------------------------------
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    Printing one line per active thread....
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 5073 microseconds.
       (= 5073 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6210.0462 0.0058 0.0052 0.0074
    Scale: 5911.6335 0.0061 0.0054 0.0078
    Add: 5853.0276 0.0090 0.0082 0.0104
    Triad: 5756.6292 0.0087 0.0083 0.0106
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
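
To put a number on "tightly grouped", the five runs above can be
summarized per kernel. The sketch below copies the Rate (MB/s) column
from the output above by hand; the summarize helper and the choice of
(max - min) / mean as the measure of spread are my own assumptions, not
anything STREAM reports.

```python
from statistics import mean

# Best rates in MB/s, transcribed from the five runs above.
RATES_MB_S = {
    "Copy":  [6341.4944, 6282.1310, 6268.3415, 6566.7463, 6210.0462],
    "Scale": [5992.3979, 5982.2485, 5839.3617, 6043.3936, 5911.6335],
    "Add":   [5572.2832, 5561.3544, 5910.5922, 5772.8055, 5853.0276],
    "Triad": [5771.3161, 5761.5715, 5944.9754, 5993.2898, 5756.6292],
}


def summarize(rates):
    """Return (min, max, mean, relative spread) for a list of rates."""
    lo, hi, avg = min(rates), max(rates), mean(rates)
    return lo, hi, avg, (hi - lo) / avg


for name, rates in RATES_MB_S.items():
    lo, hi, avg, spread = summarize(rates)
    print(f"{name:6s} min={lo:8.1f} max={hi:8.1f} mean={avg:8.1f} spread={spread:.1%}")
```

On this data the spread is a few percent for every kernel, whereas the
3500 MB/s to 7700 MB/s range on the AMD system is more than a factor of
two.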


Received on Sat May 03 10:06:03 2008
