STREAM results Athlon, two different compilers on same machine

From: Norbert Juffa (norbert@pbc.com)
Date: Thu Dec 21 2000 - 14:20:48 CST


    John:

    I figured it might be useful to have STREAM numbers for
    the same machine, but generated with executables from 2
    different compilers.

    1. The following results are from my home system. It has
       an Asus K7V motherboard with Via KX133 chipset, a
       750 MHz Athlon Classic, and 256MB of Crucial PC133-CAS2
       SDRAM. It is running WinNT4/SP6 and I used Compaq
       Visual Fortran 6.5 as the compiler. Compiler was invoked
       like so: df -unroll:8 -arch:k7

    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 9012345
     Offset = 0
     The total memory requirement is 206 MB
     You are running each test 10 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity/precision appears to be 10000 microseconds
     The tests below will each take a time on the order
     of 210000 microseconds
        (= 21 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function   Rate (MB/s)   RMS time   Min time   Max time
    Copy:         576.7901     0.2580     0.2500     0.2600
    Scale:        576.7901     0.2540     0.2500     0.2600
    Add:          655.4433     0.3310     0.3300     0.3400
    Triad:        636.1655     0.3430     0.3400     0.3500
     Sum of a is = 1.039394452812601E+019
     Sum of b is = 2.078788905585281E+018
     Sum of c is = 2.771718540568361E+018
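
For readers who want to check how the rates in the table above follow from the array size and the minimum times, here is a minimal sketch. It assumes the usual STREAM accounting: Copy and Scale touch two arrays per element, Add and Triad touch three, words are 8 bytes, and rates use 1 MB = 10**6 bytes, while the "total memory requirement" line uses units of 2**20 bytes. The function and variable names are hypothetical and are not taken from the benchmark source.

```python
# Sketch: reproduce the reported STREAM rates from array size and min time.
# Assumptions (not stated in the post): bytes moved per kernel follow the
# usual STREAM convention, and "MB/s" means 10**6 bytes per second.
BYTES_PER_WORD = 8  # "Assuming 8 bytes per DOUBLE PRECISION word"
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, n, min_time_s):
    """Rate in MB/s for one kernel, given array length and best time."""
    bytes_moved = BYTES_PER_WORD * ARRAYS_TOUCHED[kernel] * n
    return bytes_moved / min_time_s / 1.0e6


def total_memory_mib(n, arrays=3):
    """Memory for the three work arrays, in units of 2**20 bytes."""
    return arrays * BYTES_PER_WORD * n / 2**20


N = 9012345  # "Array size" from the output above
cvf_min_times = {"Copy": 0.25, "Scale": 0.25, "Add": 0.33, "Triad": 0.34}

for kernel, t in cvf_min_times.items():
    print(f"{kernel:6s} {stream_rate_mb_s(kernel, N, t):9.4f} MB/s")
print(f"memory: {total_memory_mib(N):.0f} MB")
```

Under these assumptions the computed rates agree with the CVF table to the printed precision, which suggests the rates are derived directly from the minimum times.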

    2. The following results are also from my home system.
       Again, Asus K7V motherboard with Via KX133 chipset
       and 750 MHz Athlon Classic, 256MB of Crucial PC133-CAS2
       SDRAM. Running WinNT4/SP6 and I used Lahey Fujitsu Fortran
       95 5.6 as the compiler. Compiler was invoked like so:
       lf95 -o1 -tpp

    ---------------------------------------------
    Double precision appears to have 16 digits of accuracy
    Assuming 8 bytes per DOUBLE PRECISION word
    ---------------------------------------------
    Array size = 9012345
    Offset = 0
    The total memory requirement is 206 MB
    You are running each test 10 times
    The *best* time for each test is used
    ----------------------------------------------------
    Your clock granularity/precision appears to be 10014 microseconds
    The tests below will each take a time on the order
    of 280403 microseconds
       (= 28 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    ----------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    ----------------------------------------------------
    Function   Rate (MB/s)   RMS time   Min time   Max time
    Copy:         496.5194     0.2914     0.2904     0.3004
    Scale:        553.8091     0.2614     0.2604     0.2704
    Add:          654.5015     0.3365     0.3305     0.3405
    Triad:        635.2507     0.3475     0.3405     0.3505
    Sum of a is = 1.039394452812601E+19
    Sum of b is = 2.078788905585281E+18
    Sum of c is = 2.771718540568361E+18

    As one can see, the results match nicely with the exception of
    "Copy", where the LF95 executable is about 14% slower (496.5 vs.
    576.8 MB/s). LF95 doesn't offer an assembly language listing, so
    I have not figured out why its copy loop is slower than CVF's.
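
To put the differences between the two tables in perspective, the following sketch computes how much slower each LF95 result is relative to CVF, and how large a single timer tick is relative to a 0.25 s measurement. The rates are copied from the two tables above; the 10 ms tick is the reported clock granularity. The helper names are hypothetical, and reading the Scale gap as a timer-resolution effect is an interpretation, not something the post states.

```python
# Sketch: compare the two compilers' STREAM rates and relate the gaps
# to the ~10 ms timer granularity reported in both runs.
cvf = {"Copy": 576.7901, "Scale": 576.7901, "Add": 655.4433, "Triad": 636.1655}
lf95 = {"Copy": 496.5194, "Scale": 553.8091, "Add": 654.5015, "Triad": 635.2507}


def percent_slower(reference, other):
    """How much lower `other` is than `reference`, in percent."""
    return 100.0 * (reference - other) / reference


def one_tick_percent(tick_s, min_time_s):
    """Relative size of one timer tick compared with a measured time."""
    return 100.0 * tick_s / min_time_s


for kernel in cvf:
    gap = percent_slower(cvf[kernel], lf95[kernel])
    print(f"{kernel:6s} LF95 slower by {gap:5.2f}%")

# One 10 ms tick on a 0.25 s measurement
print(f"one tick at 0.25 s: {one_tick_percent(0.010, 0.25):.1f}%")
```

Read this way, the Scale difference is about the size of one clock tick (0.2500 s vs. 0.2604 s), whereas the Copy difference spans roughly four ticks (0.2500 s vs. 0.2904 s) and so is unlikely to be a timer artifact.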

    -- Norbert


