[Fwd: Pentium4 STREAM benchmarks posted (finally)]

From: John McCalpin (mccalpin@austin.ibm.com)
Date: Mon May 14 2001 - 15:14:58 CDT


    -- 
    John D. McCalpin, Ph.D.           mccalpin@austin.ibm.com
    Senior Technical Staff Member     IBM POWER Microprocessor Development
        "I am willing to make mistakes as long as
         someone else is willing to learn from them."
    
    

    attached mail follows:


    On Wed, 31 Jan 2001 17:38:51 -0600, John McCalpin <mccalpin@austin.ibm.com> said:

    > Sure -- I am always interested in results!

    Okay, here goes.

    I tested with both the Fortran code and the C code from the source section of the web page.

    There seems to be a problem with g77 and the printout of the clock granularity/precision, but the Fortran results aren't too different from the C results. I'm including results from three consecutive runs each for Fortran and C, all run in single-user mode so that as few other processes as possible disturb the measurements.

    Please let me know if there's any additional information needed.

    ======================================================================
    System information:

    Linux shoggelilla.rlyeh.net 2.2.18 #13 Mon Jan 22 01:30:02 UTC 2001 i586 unknown

    Toshiba Satellite 2545 XCDT
    AMD K6-2 3D (64 KB L2 cache)
    64 MB PC100 SDRAM (SO-DIMM)

    ======================================================================
    Fortran:

    g77 version 2.95.2 20000220 (Debian GNU/Linux) (from FSF-g77 version 0.5.25 19991030 (prerelease))

    g77 -O2 -fomit-frame-pointer -fno-strength-reduce -lm -o stream_d second_wall.o stream_d.f

    ----------------------------------------------
    Double precision appears to have 16 digits of accuracy
    Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
    Array size = 2000000
    Offset = 0
    The total memory requirement is 45 MB
    You are running each test 10 times
    --
    The *best* time for each test is used
    *EXCLUDING* the first and last iterations
    ----------------------------------------------------
    Your clock granularity/precision appears to be 10000 microseconds
    ----------------------------------------------------
    Function     Rate (MB/s)   Avg time   Min time   Max time
    Copy:          44.4445      0.7250     0.7200     0.7300
    Scale:         43.8356      0.7375     0.7300     0.7400
    Add:           47.0588      1.0275     1.0200     1.0300
    Triad:         50.5264      0.9587     0.9500     0.9600
    ----------------------------------------------------
    Solution Validates!
    ----------------------------------------------------

    ----------------------------------------------
    Double precision appears to have 16 digits of accuracy
    Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
    Array size = 2000000
    Offset = 0
    The total memory requirement is 45 MB
    You are running each test 10 times
    --
    The *best* time for each test is used
    *EXCLUDING* the first and last iterations
    ----------------------------------------------------
    Your clock granularity/precision appears to be 10000 microseconds
    ----------------------------------------------------
    Function     Rate (MB/s)   Avg time   Min time   Max time
    Copy:          44.4445      0.7250     0.7200     0.7300
    Scale:         43.8356      0.7388     0.7300     0.7400
    Add:           47.0588      1.0275     1.0200     1.0300
    Triad:         50.5263      0.9575     0.9500     0.9600
    ----------------------------------------------------
    Solution Validates!
    ----------------------------------------------------

    ----------------------------------------------
    Double precision appears to have 16 digits of accuracy
    Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
    Array size = 2000000
    Offset = 0
    The total memory requirement is 45 MB
    You are running each test 10 times
    --
    The *best* time for each test is used
    *EXCLUDING* the first and last iterations
    ----------------------------------------------------
    Your clock granularity/precision appears to be 10000 microseconds
    ----------------------------------------------------
    Function     Rate (MB/s)   Avg time   Min time   Max time
    Copy:          44.4445      0.7250     0.7200     0.7300
    Scale:         43.8356      0.7363     0.7300     0.7400
    Add:           46.6020      1.0300     1.0300     1.0300
    Triad:         50.5263      0.9575     0.9500     0.9600
    ----------------------------------------------------
    Solution Validates!
    ----------------------------------------------------
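As a cross-check on the Fortran figures, the reported rates can be recomputed from the minimum times. The sketch below assumes the usual STREAM byte-counting convention (Copy and Scale touch two arrays per iteration, Add and Triad touch three, and 1 MB is taken as 10^6 bytes); the function and variable names are illustrative and do not come from the benchmark source.

```python
# Assumed STREAM convention: number of arrays read or written per kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

ARRAY_SIZE = 2_000_000   # "Array size = 2000000" from the output
BYTES_PER_WORD = 8       # "8 bytes per DOUBLE PRECISION word"


def stream_rate_mb_s(kernel, min_time_s):
    """Return the rate in MB/s (1 MB = 1e6 bytes) for one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * ARRAY_SIZE
    return 1.0e-6 * bytes_moved / min_time_s


# Minimum times (seconds) from the first Fortran run.
fortran_min_times = {"Copy": 0.72, "Scale": 0.73, "Add": 1.02, "Triad": 0.95}

rates = {k: stream_rate_mb_s(k, t) for k, t in fortran_min_times.items()}

for kernel, rate in rates.items():
    print(f"{kernel + ':':<8}{rate:10.4f}")
```

Under these assumptions the recomputed values agree with the reported Rate column to within rounding in the last digit.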

    ======================================================================
    C:

    gcc version 2.95.2 20000220 (Debian GNU/Linux)

    gcc -O2 -fomit-frame-pointer -fno-strength-reduce -lm -o stream_d second_wall.o stream_d.c

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 2000000, Offset = 0
    Total memory required = 45.8 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 2 microseconds.
    Each test below will take on the order of 523253 microseconds.
       (= 261626 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          44.2880      0.7229     0.7225     0.7246
    Scale:         44.2507      0.7237     0.7232     0.7250
    Add:           49.2808      0.9743     0.9740     0.9747
    Triad:         50.2443      0.9556     0.9553     0.9564

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 2000000, Offset = 0
    Total memory required = 45.8 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 2 microseconds.
    Each test below will take on the order of 529303 microseconds.
       (= 264651 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          44.1983      0.7243     0.7240     0.7248
    Scale:         44.2068      0.7245     0.7239     0.7261
    Add:           49.6747      0.9667     0.9663     0.9675
    Triad:         50.7506      0.9460     0.9458     0.9461

    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 2000000, Offset = 0
    Total memory required = 45.8 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 2 microseconds.
    Each test below will take on the order of 529150 microseconds.
       (= 264575 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          44.1879      0.7245     0.7242     0.7253
    Scale:         44.1997      0.7241     0.7240     0.7242
    Add:           49.6741      0.9667     0.9663     0.9669
    Triad:         50.7761      0.9457     0.9453     0.9470
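The Fortran build reports a clock granularity of 10000 microseconds, and its minimum times are all multiples of 0.01 s, whereas the C build resolves about 2 microseconds. One rough way to compare the two sets of numbers is to ask which kernels agree to within a single coarse timer tick. The sketch below is only an illustration: the one-tick tolerance is an assumption about how far the coarse timer may be off, and the names are hypothetical.

```python
# Fortran timer granularity as reported in the output: 10000 microseconds.
TICK_S = 0.010

# Minimum times (seconds) from the first Fortran run and the first C run.
fortran_min = {"Copy": 0.7200, "Scale": 0.7300, "Add": 1.0200, "Triad": 0.9500}
c_min = {"Copy": 0.7225, "Scale": 0.7232, "Add": 0.9740, "Triad": 0.9553}


def within_one_tick(t_coarse, t_fine, tick=TICK_S):
    """True if the two timings differ by no more than one coarse tick."""
    return abs(t_coarse - t_fine) <= tick


agreement = {k: within_one_tick(fortran_min[k], c_min[k]) for k in fortran_min}

for kernel, ok in agreement.items():
    diff_ms = 1000.0 * abs(fortran_min[kernel] - c_min[kernel])
    verdict = "within one tick" if ok else "more than one tick apart"
    print(f"{kernel + ':':<8}{diff_ms:6.1f} ms  {verdict}")
```

With the first-run numbers, Copy, Scale, and Triad fall within one tick of each other, while Add differs by roughly 46 ms, which timer resolution alone would not account for under this assumption.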



    This archive was generated by hypermail 2b29 : Mon May 21 2001 - 08:38:46 CDT