STREAM Results for the HP ProLiant DL585 and HP ProLiant BL45p with 875 Opteron CPUs

From: Schmidt, David (Performance Eng.) (d.schmidt@hp.com)
Date: Mon Jun 06 2005 - 14:30:02 CDT

    John,
    Below are standard STREAM results for the HP ProLiant DL585 (4 CPU) and
    the HP ProLiant BL45p (4 CPU), both using 2.2GHz 875 Opteron processors.
    Each configuration is listed below, followed by its results:

    HP ProLiant DL585
    4x2.2GHz/1MB L2 875 Opteron processors
    32GB PC3200 memory (16x2GB DIMMs)
    SuSE Linux Enterprise Server 9 (x86_64) SP1

    I used Revision 5.3 of the STREAM code, compiled with the PathScale EKO
    C compiler (pathcc) v2.1:

       pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 8000000, Offset = 41472
    Total memory required = 183.1 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 8168 microseconds.
       (= 8168 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   Avg time   Min time   Max time
    Copy:         15950.2930     0.0072     0.0080     0.0080
    Scale:        16031.7401     0.0072     0.0080     0.0081
    Add:          16206.6083     0.0107     0.0118     0.0120
    Triad:        16256.0078     0.0107     0.0118     0.0120
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
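
As a rough cross-check of the DL585 figures (this is not part of the STREAM output), the sketch below recomputes the "Total memory required" line and the minimum time implied by each reported rate. It assumes STREAM's usual byte counting: Copy and Scale touch two arrays per iteration, Add and Triad touch three, rates use 1 MB = 10^6 bytes, and the memory line uses 2^20 bytes. The function and variable names are illustrative only and do not come from the STREAM source.

```python
# Cross-check of the DL585 figures, assuming STREAM's usual byte counting.
# All names here are hypothetical helpers, not taken from stream_omp.c.

N = 8_000_000        # array size reported in the output
BYTES_PER_WORD = 8   # "8 bytes per DOUBLE PRECISION word"

# Arrays read or written per loop iteration (assumed STREAM convention).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# (Rate in MB/s, Min time in seconds) as printed for the DL585.
DL585 = {
    "Copy": (15950.2930, 0.0080),
    "Scale": (16031.7401, 0.0080),
    "Add": (16206.6083, 0.0118),
    "Triad": (16256.0078, 0.0118),
}


def total_memory_mb(n, bytes_per_word=BYTES_PER_WORD):
    """Memory for the three STREAM arrays, in units of 2**20 bytes."""
    return 3 * bytes_per_word * n / 2**20


def implied_min_time(kernel, rate_mb_s, n=N):
    """Seconds per pass implied by a rate, taking 1 MB = 10**6 bytes."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n
    return bytes_moved / (rate_mb_s * 1e6)


if __name__ == "__main__":
    print(f"Total memory: {total_memory_mb(N):.1f} MB")
    for kernel, (rate, printed) in DL585.items():
        implied = implied_min_time(kernel, rate)
        print(f"{kernel:6s} implied {implied:.4f} s, printed {printed:.4f} s")
```

Under these assumptions the implied times agree with the printed "Min time" column to the four decimal places shown, which is what one would expect if the rate is derived from the best time of the ten runs.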

    =============================================================

    HP ProLiant BL45p
    4x2.2GHz/1MB L2 875 Opteron processors
    32GB PC3200 memory (16x2GB DIMMs)
    SuSE Linux Enterprise Server 9 (x86_64) SP1

    I used Revision 5.3 of the STREAM code, compiled with the PathScale EKO
    C compiler (pathcc) v2.1:

       pathcc -O3 -CG:use_prefetchnta -LNO:prefetch_ahead=4 -mp -o ompstream stream_omp.c

    Running Test: N = 8000000 Offset = 49152
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 8000000, Offset = 49152
    Total memory required = 183.1 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    Number of Threads requested = 8
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 7992 microseconds.
       (= 7992 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   Avg time   Min time   Max time
    Copy:         16171.7848     0.0071     0.0079     0.0081
    Scale:        16322.2337     0.0071     0.0078     0.0080
    Add:          16022.4900     0.0108     0.0120     0.0122
    Triad:        16089.7158     0.0108     0.0119     0.0120
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
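
For a quick side-by-side reading of the two result tables, the following sketch computes the percentage difference of the BL45p rates relative to the DL585 rates. The numbers are copied from the output above; the helper name `percent_difference` is purely illustrative.

```python
# Rates in MB/s copied from the two STREAM result tables.
DL585 = {"Copy": 15950.2930, "Scale": 16031.7401,
         "Add": 16206.6083, "Triad": 16256.0078}
BL45P = {"Copy": 16171.7848, "Scale": 16322.2337,
         "Add": 16022.4900, "Triad": 16089.7158}


def percent_difference(reference, other):
    """Per-kernel difference of `other` relative to `reference`, in percent."""
    return {k: 100.0 * (other[k] - reference[k]) / reference[k]
            for k in reference}


if __name__ == "__main__":
    for kernel, pct in percent_difference(DL585, BL45P).items():
        print(f"{kernel:6s} BL45p vs DL585: {pct:+.2f}%")
```

On these figures the BL45p is slightly ahead on Copy and Scale and slightly behind on Add and Triad, with every kernel within about 2% of the DL585.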

    David Schmidt
    Hewlett-Packard Company
    (281) 514-5039
    D.Schmidt@hp.com


