STREAM results for the HP ProLiant BL20p G3, DL140 G2, DL360 G4p, DL380 G4, and ML370 G4 with Intel 3.8GHz Xeon CPUs

From: Schmidt, David (Performance Eng.) (D.Schmidt@hp.com)
Date: Wed Sep 28 2005 - 18:32:21 CDT


    John,
    Below are standard STREAM results for the HP ProLiant BL20p G3 (1 CPU),
    the HP ProLiant DL140 G2 (1 CPU), the HP ProLiant DL360 G4p (1 CPU), the
    HP ProLiant DL380 G4 (1 CPU), and the HP ProLiant ML370 G4 (4 CPU) using
    3.8GHz Intel Xeon processors. The configurations are described below
    with the results. Please post them to the STREAM website. Thanks,
    -------------------------------------------------------------
    Editor's Note: In a separate note, the submitter clarified that
    all results presented below were run with a single thread (set by
    the "OMP_NUM_THREADS" environment variable).

    -------------------------------------------------------------
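A minimal Python sketch of how a single-threaded run like the one described in the editor's note could be launched. The binary name `ompstream` comes from the compile lines in this message; the helper names `single_thread_env` and `run_stream` are hypothetical, and the sketch assumes the binary sits in the current directory.

```python
import os
import subprocess


def single_thread_env(base_env=None):
    """Return a copy of the environment with OMP_NUM_THREADS forced to 1."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = "1"
    return env


def run_stream(binary="./ompstream"):
    """Run the benchmark binary single-threaded and return its stdout.

    Assumption: `binary` is the executable produced by the icc command
    quoted in this message; it is not shipped with this sketch.
    """
    result = subprocess.run(
        [binary],
        env=single_thread_env(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Copying the environment instead of modifying `os.environ` keeps the thread setting local to the benchmark process.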

    HP ProLiant BL20p G3
    2x3.8GHz/2MB L2 Xeon processors
    8GB PC3200 memory (4x2GB DIMMs)
    SuSE Linux Enterprise Server 9 (x86_64) SP1

    Intel(R) C++ Compiler for Intel(R) EM64T-based Applications, Version
    8.1, Build 20040812
    icc -O3 -xP -parallel -o ompstream stream_d.c second_wall.c
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 3500000, Offset = 14336
    Total memory required = 80.1 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 15160 microseconds.
       (= 15160 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   RMS time     Min time     Max time
    Copy:           3767.2584     0.0150       0.0149       0.0156
    Scale:          3730.8759     0.0151       0.0150       0.0152
    Add:            4020.1456     0.0209       0.0209       0.0209
    Triad:          3899.9074     0.0216       0.0215       0.0216
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
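The figures in a block like the one above can be cross-checked by hand. The sketch below assumes the accounting used by the standard STREAM source: three arrays of 8-byte words for the memory footprint (with 1 MB taken as 2^20 bytes), two words moved per element for Copy and Scale, three for Add and Triad, and rates in units of 10^6 bytes per second. The function names are illustrative only.

```python
BYTES_PER_WORD = 8  # "This system uses 8 bytes per DOUBLE PRECISION word."

# Words moved per array element for each kernel (reads + writes).
WORDS_PER_ELEMENT = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def total_memory_mb(array_size):
    """Footprint of the three arrays, in units of 2**20 bytes."""
    return 3 * BYTES_PER_WORD * array_size / 2**20


def rate_mb_per_s(kernel, array_size, seconds):
    """Bandwidth in units of 10**6 bytes per second."""
    bytes_moved = WORDS_PER_ELEMENT[kernel] * BYTES_PER_WORD * array_size
    return 1e-6 * bytes_moved / seconds


if __name__ == "__main__":
    n = 3500000  # array size of the BL20p G3 run
    print(f"Total memory: {total_memory_mb(n):.1f} MB")
    # Min times are printed to four decimals, so the recomputed rates
    # only approximate the reported ones.
    for kernel, min_time in [("Copy", 0.0149), ("Scale", 0.0150),
                             ("Add", 0.0209), ("Triad", 0.0215)]:
        print(f"{kernel}: {rate_mb_per_s(kernel, n, min_time):.1f} MB/s")
```

With an array size of 3500000 the footprint comes out at 80.1 MB, and with 9000000 at 206.0 MB, matching the "Total memory required" lines in this message. The recomputed rates agree with the reported ones only to within rounding of the printed minimum times.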
    ---------------------------------------------------------------------------------------
    HP ProLiant DL140 G2
    2x3.8GHz/2MB L2 Xeon processors
    8GB memory (8x1024MB)
    SuSE Enterprise Linux 9 (x86_64) SP1

    Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1
    Build 20040812
    icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 9000000, Offset = 20480
    Total memory required = 206.0 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 44713 microseconds.
       (= 44713 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   RMS time     Min time     Max time
    Copy:           3917.2916     0.0369       0.0368       0.0375
    Scale:          3760.3727     0.0383       0.0383       0.0384
    Add:            4086.4114     0.0529       0.0529       0.0533
    Triad:          4004.8345     0.0540       0.0539       0.0540
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
    ---------------------------------------------------------------------------------------
    HP ProLiant DL360 G4p
    2x3.8GHz/2MB L2 Xeon processors
    8GB memory (4x2048MB Dual Rank DIMMs)
    SuSE Enterprise Linux 9 (x86_64) SP1

    Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1
    Build 20040812
    icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 9000000, Offset = 16896
    Total memory required = 206.0 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 40524 microseconds.
       (= 40524 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   RMS time     Min time     Max time
    Copy:           3912.0902     0.0369       0.0368       0.0375
    Scale:          3768.9375     0.0382       0.0382       0.0383
    Add:            4149.2957     0.0521       0.0521       0.0521
    Triad:          4045.5549     0.0534       0.0534       0.0535
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
    ---------------------------------------------------------------------------------------
    HP ProLiant DL380 G4
    2x3.8GHz/2MB L2 Xeon processors
    8GB memory (4x2048MB Dual Rank DIMMs)
    SuSE Enterprise Linux 9 (x86_64) SP1

    Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1
    Build 20040812
    icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 9000000, Offset = 24576
    Total memory required = 206.0 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 40484 microseconds.
       (= 40484 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   RMS time     Min time     Max time
    Copy:           3919.6559     0.0369       0.0367       0.0375
    Scale:          3775.2747     0.0382       0.0381       0.0382
    Add:            4151.7676     0.0521       0.0520       0.0521
    Triad:          4049.7142     0.0534       0.0533       0.0535
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
    ---------------------------------------------------------------------------------------
    HP ProLiant ML370 G4
    1x3.8GHz/2MB L2 Xeon processor
    8GB memory (8x1024MB)
    SuSE Enterprise Linux 9 (x86_64) SP1

    Intel(R) C++ Compiler for Intel(R) EM64T-based applications, Version 8.1
    Build 20040812
    icc -O3 -xP -parallel -ip -o ompstream stream_d.c second_wall.c

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 9000000, Offset = 2048
    Total memory required = 206.0 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 40999 microseconds.
       (= 40999 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   RMS time     Min time     Max time
    Copy:           3893.0521     0.0371       0.0370       0.0377
    Scale:          3750.0064     0.0384       0.0384       0.0384
    Add:            4119.2423     0.0525       0.0524       0.0525
    Triad:          4013.6703     0.0539       0.0538       0.0539
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
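For side-by-side comparison, the five result tables can be pulled into a small data structure. The sketch below is one possible parser, assuming each result row has the form `Name: rate rms min max` as in the tables above; `parse_stream_table` is a hypothetical helper, and the Triad figures are copied from this message.

```python
import re

# One result row: kernel name, colon, then four decimal numbers.
ROW = re.compile(
    r"^\s*(Copy|Scale|Add|Triad):\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)


def parse_stream_table(text):
    """Map kernel name -> (rate MB/s, RMS time, min time, max time)."""
    results = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            name = match.group(1)
            results[name] = tuple(float(x) for x in match.groups()[1:])
    return results


# Triad rates (MB/s) as reported above, one entry per system.
TRIAD = {
    "BL20p G3": 3899.9074,
    "DL140 G2": 4004.8345,
    "DL360 G4p": 4045.5549,
    "DL380 G4": 4049.7142,
    "ML370 G4": 4013.6703,
}

if __name__ == "__main__":
    sample = """
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 3893.0521 0.0371 0.0370 0.0377
    Triad: 4013.6703 0.0539 0.0538 0.0539
    """
    print(parse_stream_table(sample))
    best = max(TRIAD, key=TRIAD.get)
    spread = (max(TRIAD.values()) - min(TRIAD.values())) / min(TRIAD.values())
    print(f"Highest Triad: {best}; spread across systems: {spread:.1%}")
```

The header row and separator lines do not match the pattern and are skipped, so a whole results block can be passed in unmodified. On the Triad figures above, the five systems fall within about 4% of one another.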

    David Schmidt
    Hewlett-Packard Company
    (281) 514-5039
    D.Schmidt@hp.com


