PathScale STREAM Submission

From: Kevin Ball (kball@pathscale.com)
Date: Fri Feb 04 2005 - 18:00:53 UTC

    Hi John,

      I'm a benchmark engineer at PathScale, working under Tom Elken. We'd
    like to make a STREAM submission using the latest released compiler
    (EKOPath 2.0) on AMD Opteron. We have serial results from one machine
    and OpenMP results with up to 4 threads from another. Details are
    below. Thanks much!

    -Kevin Ball

    System info:
                 Compiler: PathScale EKO Compiler Suite, Release 2.0
               Model Name: ASUS SK8N Motherboard, AMD Opteron (TM) Model 248
                      CPU: AMD Opteron 248
                  CPU MHz: 2200
                      FPU: Integrated
           CPU(s) enabled: 1 core, 1 chip, 1 core/chip
                   Memory: 4x512MB, DDR400, PC3200, Corsair, CL2
         Operating System: SuSE Linux 9.0 (AMD64) 2.4.21-102-default

    > pathcc -Ofast -LNO:prefetch_ahead=10 -lm stream.c mysecond.c -o path2.0

    stream.c:
    mysecond.c:
    > ./path2.0
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 2000000, Offset = 0
    Total memory required = 45.8 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 9317 microseconds.
       (= 9317 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   Avg time   Min time   Max time
    Copy:           4811.3611     0.0060     0.0067     0.0067
    Scale:          4782.3883     0.0060     0.0067     0.0068
    Add:            4684.7375     0.0092     0.0102     0.0103
    Triad:          4681.5783     0.0092     0.0103     0.0103
    -------------------------------------------------------------
    Solution Validates
    -------------------------------------------------------------
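
The reported footprint and rates can be cross-checked from the array size and the printed minimum times. The sketch below is not part of the submission; it assumes the usual STREAM accounting (8-byte words, two arrays touched by Copy and Scale, three by Add and Triad, MB = 10^6 bytes for rates, and 2^20 bytes for the memory footprint). The function names are illustrative only. Because the log prints times to four decimals, the recomputed rates can only agree with the reported ones to within about 1%.

```python
# Rough cross-check of the serial STREAM numbers above (illustrative sketch).
WORD_BYTES = 8  # from "8 bytes per DOUBLE PRECISION word"

# Assumed STREAM byte accounting: arrays read plus arrays written per kernel.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def memory_footprint_mib(n):
    """Size of the three STREAM arrays in units of 2**20 bytes."""
    return 3 * WORD_BYTES * n / 2**20


def rate_mb_per_s(kernel, n, min_time_s):
    """Bandwidth in units of 10**6 bytes per second, from the best time."""
    return ARRAYS_TOUCHED[kernel] * WORD_BYTES * n / min_time_s / 1e6


if __name__ == "__main__":
    n = 2_000_000  # "Array size = 2000000"
    # Min times as printed in the log (rounded to 4 decimals).
    min_times = {"Copy": 0.0067, "Scale": 0.0067, "Add": 0.0102, "Triad": 0.0103}

    print(f"footprint: {memory_footprint_mib(n):.1f} MB")
    for kernel, t in min_times.items():
        print(f"{kernel:6s} ~{rate_mb_per_s(kernel, n, t):.0f} MB/s")
```

The footprint comes out at 45.8, matching the log, and each recomputed rate lands within rounding error of the reported value.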

    System info:
                 Compiler: PathScale EKO Compiler Suite, Release 2.0
               Model Name: 4-way, 2.2 GHz AMD Opteron (TM) Model 848
                      CPU: AMD Opteron 848
                  CPU MHz: 2200
                      FPU: Integrated
           CPU(s) enabled: 4 cores, 4 chips, 1 core/chip
                   Memory: 16x1024MB, DDR400
         Operating System: Fedora Core 2; 2.6.8-1.521smp kernel

    > pathcc -Ofast -LNO:prefetch_ahead=10 -DUNDERSCORE -c mysecond.c
    > pathf90 -Ofast -LNO:prefetch_ahead=10 -DUNDERSCORE -mp -c stream.f
    > pathf90 -Ofast -LNO:prefetch_ahead=10 -DUNDERSCORE -mp mysecond.o stream.o -o path2.0mp

    > export OMP_NUM_THREADS=1
    > ./path2.0mp
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 4000000
     Offset = 0
     The total memory requirement is 91 MB
     You are running each test 10 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     Number of Threads = 1
     ----------------------------------------------------
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function      Rate (MB/s)   Avg time   Min time   Max time
    Copy:           4455.9519     0.0144     0.0144     0.0144
    Scale:          4503.5728     0.0142     0.0142     0.0143
    Add:            4326.2548     0.0222     0.0222     0.0222
    Triad:          4401.4471     0.0218     0.0218     0.0219
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------

    > export OMP_NUM_THREADS=2
    > ./path2.0mp
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 4000000
     Offset = 0
     The total memory requirement is 91 MB
     You are running each test 10 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     Number of Threads = 2
     ----------------------------------------------------
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function      Rate (MB/s)   Avg time   Min time   Max time
    Copy:           8736.1427     0.0073     0.0073     0.0073
    Scale:          8876.5403     0.0072     0.0072     0.0072
    Add:            8521.9409     0.0113     0.0113     0.0113
    Triad:          8660.5120     0.0111     0.0111     0.0111
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------

    > export OMP_NUM_THREADS=4
    > ./path2.0mp
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 4000000
     Offset = 0
     The total memory requirement is 91 MB
     You are running each test 10 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     Number of Threads = 4
     ----------------------------------------------------
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function      Rate (MB/s)   Avg time   Min time   Max time
    Copy:          15377.8332     0.0042     0.0042     0.0042
    Scale:         15845.3135     0.0040     0.0040     0.0040
    Add:           15617.6086     0.0062     0.0061     0.0062
    Triad:         15920.8091     0.0061     0.0060     0.0061
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
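
To summarize how the OpenMP results scale, the three runs can be reduced to speedup and parallel efficiency relative to the 1-thread run. The sketch below is an illustration rather than part of the submission; it uses the Triad rates exactly as reported above, and the `scaling` helper is a hypothetical name.

```python
# Speedup and parallel efficiency of the OpenMP runs (illustrative sketch).

# Triad rates in MB/s, copied from the 1-, 2- and 4-thread logs above.
TRIAD_MB_PER_S = {1: 4401.4471, 2: 8660.5120, 4: 15920.8091}


def scaling(rates):
    """Map thread count -> (speedup, efficiency) relative to the 1-thread rate."""
    base = rates[1]
    return {
        threads: (rate / base, rate / base / threads)
        for threads, rate in rates.items()
    }


if __name__ == "__main__":
    for threads, (speedup, efficiency) in sorted(scaling(TRIAD_MB_PER_S).items()):
        print(f"{threads} thread(s): speedup {speedup:.2f}x, "
              f"efficiency {efficiency:.0%}")
```

For Triad this gives roughly 1.97x on 2 threads and 3.62x on 4 threads, that is, about 98% and 90% parallel efficiency. The same helper can be applied to the Copy, Scale, and Add rates.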


