STREAM results for the HP ProLiant BL20p G3, DL140 G2, and ML350 G4p with Intel 3.6GHz Xeon CPUs

From: Schmidt, David (Performance Eng.) (D.Schmidt@hp.com)
Date: Wed Sep 28 2005 - 18:23:33 CDT

    John,
    Below are standard STREAM results for the HP ProLiant BL20p G3 (1 CPU),
    the HP ProLiant DL140 G2 (1 CPU), and the HP ProLiant ML350 G4p (1 CPU)
    using 3.6GHz Intel Xeon processors. The configurations are described
    below with the results. Please post them to the STREAM website. Thanks,

    -----------------------------------------------------------------------
    [Editor's Note: In a separate e-mail, the submitter clarified that
    all results presented here were run with a single thread specified
    using the "OMP_NUM_THREADS" environment variable.]
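The single-thread setting described in the note can be reproduced when launching the benchmark from a script. The following is a minimal Python sketch, not the submitter's procedure: the helper names `single_thread_env` and `run_stream` are hypothetical, and the default path `./ompstream` is an assumption based on the output name in the compile lines of this message.

```python
import os
import subprocess


def single_thread_env(base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS forced to 1."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = "1"
    return env


def run_stream(binary="./ompstream"):
    """Run the benchmark binary with one thread and return its text output.

    The binary path is an assumption; adjust it to wherever the
    compiled STREAM executable actually lives.
    """
    result = subprocess.run(
        [binary],
        env=single_thread_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_stream()` on a machine where the binary exists would return the same kind of report shown in this message; the environment is copied, so the calling shell's own settings are left untouched.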

    -----------------------------------------------------------------------
    HP ProLiant BL20p G3
    1x3.6GHz/2MB L2 Xeon processor
    8GB PC3200 memory (4x2GB DIMMs)
    SuSE Linux Enterprise Server 9 (x86_64) SP1

    Intel(R) C++ Compiler for Intel(R) EM64T-based Applications, Version
    8.1, Build 20040812
    icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 3500000, Offset = 5120
    Total memory required = 80.1 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 15221 microseconds.
       (= 15221 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          3764.4809     0.0150     0.0149     0.0155
    Scale:         3723.8961     0.0151     0.0150     0.0151
    Add:           4028.6037     0.0209     0.0209     0.0209
    Triad:         3893.5719     0.0216     0.0216     0.0217
    -------------------------------------------------------------
    Solution Validates
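The figures in the report above can be cross-checked by hand. The sketch below assumes the usual STREAM byte-counting convention (Copy and Scale touch two arrays per element, Add and Triad touch three, with 1 MB = 10^6 bytes for rates and 2^20 bytes for the memory total); the function names are illustrative only. Because the printed minimum times are rounded to four decimals, the recomputed rates agree with the reported ones only approximately.

```python
BYTES_PER_WORD = 8          # "8 bytes per DOUBLE PRECISION word"
ARRAY_SIZE = 3_500_000      # "Array size = 3500000"

# Number of arrays read or written per element, per kernel (assumed convention).
ARRAYS_MOVED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Minimum times in seconds, as printed in the BL20p G3 report.
MIN_TIME = {"Copy": 0.0149, "Scale": 0.0150, "Add": 0.0209, "Triad": 0.0216}


def total_memory_mib(n, bytes_per_word=BYTES_PER_WORD):
    """Memory for the three work arrays, in units of 2**20 bytes."""
    return 3 * n * bytes_per_word / 2**20


def rate_mb_per_s(kernel, n, min_time, bytes_per_word=BYTES_PER_WORD):
    """Bandwidth in 10**6 bytes per second from the best (minimum) time."""
    bytes_moved = ARRAYS_MOVED[kernel] * bytes_per_word * n
    return 1.0e-6 * bytes_moved / min_time


if __name__ == "__main__":
    print(f"Total memory: {total_memory_mib(ARRAY_SIZE):.1f} MB")
    for kernel, t in MIN_TIME.items():
        print(f"{kernel:6s} {rate_mb_per_s(kernel, ARRAY_SIZE, t):10.1f} MB/s")
```

The same functions can be applied to the other two systems by substituting their array sizes and minimum times.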

    HP ProLiant DL140 G2
    1x3.6GHz/2MB L2 Xeon processor
    8GB memory (8x1024MB)
    SuSE Linux Enterprise Server 9 (x86_64) SP1

    Intel(R) C++ Compiler for Intel(R) EM64T-based Applications, Version
    8.1, Build 20040812
    icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 9000000, Offset = 512
    Total memory required = 206.0 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 41546 microseconds.
       (= 41546 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          3800.3837     0.0380     0.0379     0.0386
    Scale:         3657.1367     0.0395     0.0394     0.0398
    Add:           3994.7514     0.0541     0.0541     0.0542
    Triad:         3914.8118     0.0552     0.0552     0.0553
    -------------------------------------------------------------
    Solution Validates

    HP ProLiant ML350 G4p
    1x3.6GHz/2MB L2 Xeon processor
    8GB PC3200 memory (4x2GB DIMMs)
    SuSE Linux Enterprise Server 9 (x86_64) SP1

    Intel(R) C++ Compiler for Intel(R) EM64T-based Applications, Version
    8.1, Build 20040812
    icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 4500000, Offset = 16896
    Total memory required = 103.0 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 19776 microseconds.
       (= 19776 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          3834.5001     0.0189     0.0188     0.0195
    Scale:         3756.0464     0.0192     0.0192     0.0192
    Add:           4100.0781     0.0264     0.0263     0.0264
    Triad:         3968.6773     0.0273     0.0272     0.0273
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
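For readers comparing the three systems, the reported rates can be placed side by side. The following Python sketch only restates the numbers from the three reports in this message; the helper names `fastest` and `relative_spread` are hypothetical and chosen for illustration.

```python
# Reported rates in MB/s, copied from the three reports in this message.
RATES = {
    "BL20p G3":  {"Copy": 3764.4809, "Scale": 3723.8961, "Add": 4028.6037, "Triad": 3893.5719},
    "DL140 G2":  {"Copy": 3800.3837, "Scale": 3657.1367, "Add": 3994.7514, "Triad": 3914.8118},
    "ML350 G4p": {"Copy": 3834.5001, "Scale": 3756.0464, "Add": 4100.0781, "Triad": 3968.6773},
}
KERNELS = ("Copy", "Scale", "Add", "Triad")


def fastest(kernel):
    """Name of the system with the highest reported rate for a kernel."""
    return max(RATES, key=lambda system: RATES[system][kernel])


def relative_spread(kernel):
    """Highest rate divided by lowest rate, minus one, for a kernel."""
    values = [RATES[system][kernel] for system in RATES]
    return max(values) / min(values) - 1.0


if __name__ == "__main__":
    for kernel in KERNELS:
        print(f"{kernel:6s} fastest: {fastest(kernel):10s} "
              f"spread: {100 * relative_spread(kernel):.2f}%")
```

On these figures the three single-processor configurations differ by less than 3 percent on every kernel, which is consistent with all of them using the same 3.6GHz processor.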

    David Schmidt
    Hewlett-Packard Company
    (281) 514-5039
    D.Schmidt@hp.com


