STREAM results for the HP ProLiant BL25p, BL45p, DL385, and DL585 with AMD 2.8GHz Opteron CPUs

From: Schmidt, David (Performance Eng.) (D.Schmidt@hp.com)
Date: Wed Sep 28 2005 - 18:46:47 CDT

    John,
    Below are standard STREAM results for the HP ProLiant BL25p (2 CPUs),
    the HP ProLiant BL45p (4 CPUs), the HP ProLiant DL385 (2 CPUs), and the
    HP ProLiant DL585 (4 CPUs) using 2.8GHz AMD Opteron processors. The
    configurations are described below with the results. Please post them to
    the STREAM website. Thanks,

    HP ProLiant BL25p
    2x2.8GHz/1MB L2 Opteron 254 processors
    16GB memory (8x2GB DIMMs)
    SuSE Linux Enterprise Server 9 (x86_64) SP1 - kernel 2.6.5-7.139-smp

    I used Revision 5.3 of the stream code and compiled with PathScale EKO
    Compiler Suite 2.1 and used the following:
    /opt/pathscale/bin/pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4
    -mp -o ompstream stream_omp.c

    For 2x2.8GHz Opteron 254 processors:

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 2500000, Offset = 26112
    Total memory required = 57.2 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Number of Threads requested = 2
    Number of Threads requested = 2
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 5116 microseconds.
       (= 5116 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:        9368.0362       0.0039       0.0043       0.0043
    Scale:       9206.1106       0.0040       0.0043       0.0047
    Add:         9457.6361       0.0057       0.0063       0.0064
    Triad:       9499.4051       0.0057       0.0063       0.0064
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
    -------------------------------------------------------------------------------
    HP ProLiant BL45p
    4x2.8GHz/1MB L2 854 Opteron processors
    32GB PC3200 memory (16x2GB DIMMs)
    SuSE Linux Enterprise Server 9 (x86_64) SP1

    I used Revision 5.3 of the stream code and compiled with PathScale EKO
    C++ compiler v.2.1:

       pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream
    stream_omp.c

    For 4x2.8GHz Opteron 854 processors:
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 8000000, Offset = 49152
    Total memory required = 183.1 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Number of Threads requested = 4
    Number of Threads requested = 4
    Number of Threads requested = 4
    Number of Threads requested = 4
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 8164 microseconds.
       (= 8164 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:       18515.9825       0.0062       0.0069       0.0070
    Scale:      17664.8760       0.0066       0.0072       0.0074
    Add:        18595.7227       0.0093       0.0103       0.0104
    Triad:      18689.8062       0.0093       0.0103       0.0103
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
    -------------------------------------------------------------------------------
    HP ProLiant DL385
    2x2.8GHz/1MB L2 254 Opteron processors
    16GB PC3200 DDR memory (8x2GB DIMMs)
    SuSE Linux Enterprise Server 9 (x86_64)

    I used Revision 5.3 of the stream code and compiled with PathScale C/C++
    for Linux v.2.1:

       pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream
    stream_omp.c

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 1000000, Offset = 59392
    Total memory required = 22.9 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Number of Threads requested = 2
    Number of Threads requested = 2
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 2033 microseconds.
       (= 2033 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:        9400.3171       0.0015       0.0017       0.0017
    Scale:       9205.6055       0.0016       0.0017       0.0017
    Add:         9638.3853       0.0022       0.0025       0.0025
    Triad:       9721.2261       0.0022       0.0025       0.0025
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
    -------------------------------------------------------------------------------
    HP ProLiant DL585
    4x2.8GHz/1MB L2 854 Opteron processors
    32GB PC3200 memory (16x2GB DIMMs)
    SuSE Linux Enterprise Server 9 (x86_64) SP1

    I used Revision 5.3 of the stream code and compiled with PathScale EKO
    C++ compiler v.2.1:

       pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream
    stream_omp.c

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 7000000, Offset = 58880
    Total memory required = 160.2 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Number of Threads requested = 4
    Number of Threads requested = 4
    Number of Threads requested = 4
    Number of Threads requested = 4
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 7424 microseconds.
       (= 7424 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   Avg time     Min time     Max time
    Copy:       18285.7940       0.0055       0.0061       0.0062
    Scale:      18223.3706       0.0055       0.0061       0.0062
    Add:        18346.7355       0.0082       0.0092       0.0092
    Triad:      18386.9497       0.0082       0.0091       0.0093
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------

    David Schmidt
    Hewlett-Packard Company
    (281) 514-5039
    D.Schmidt@hp.com


