Stream results for dual PIII and dual Athlon

From: James Cook (cook@ned1.sims.nrc.ca)
Date: Fri Aug 09 2002 - 10:21:27 CDT

    I have some interesting STREAM results for dual-processor machines. You
    don't seem to have many results for 2-CPU machines, so I thought I would
    send these in.

    First machine is

    Dual PIII 933 MHz.
    1 GB of PC133 RAM.
    Motherboard: ABIT VP6
    Running Linux kernel 2.4.19-rc3 (SMP of course)
    Compiler: PGI 3.2-4

    For a single processor the compiler options are
    pgcc -fast -tp p6 -Minline -Mvect=assoc,sse,cachesize:327680
    -Mcache_align second_wall_pg.c -c

    pgf77 -fast -tp p6 -Minline -Mvect=assoc,prefetch,cachesize:327680
    -Mcache_align second_wall_pg.o stream_d.f -o stream_pgi

    Results for 1 CPU:
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 1000000
     Offset = 0
     The total memory requirement is 22 MB
     You are running each test 10 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function      Rate (MB/s)   Avg time   Min time   Max time
    Copy:            583.7067     0.0274     0.0274     0.0275
    Scale:           669.3169     0.0239     0.0239     0.0240
    Add:             652.0326     0.0368     0.0368     0.0369
    Triad:           716.8468     0.0335     0.0335     0.0336
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
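
    As a cross-check on the table above, the reported rates can be
    reproduced from the array size and the minimum times. This is a
    minimal sketch, assuming the usual STREAM byte counting (Copy and
    Scale touch two arrays per iteration, Add and Triad touch three,
    with 1 MB = 10^6 bytes). The function names are illustrative and not
    part of STREAM; the small differences from the printed rates come
    from the times being rounded to four decimals.

```python
# Bytes moved per loop iteration by each kernel, assuming 8-byte words:
# Copy and Scale read one array and write one; Add and Triad read two and write one.
BYTES_PER_ITER = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}


def stream_rate_mb_s(kernel, array_size, min_time_s):
    """Rate in MB/s (1 MB = 10**6 bytes) derived from the best time."""
    return BYTES_PER_ITER[kernel] * array_size / min_time_s / 1e6


def footprint_mib(array_size, n_arrays=3, word_bytes=8):
    """Memory taken by the three arrays, in units of 2**20 bytes."""
    return n_arrays * word_bytes * array_size / 2**20


N = 1_000_000
# Minimum times from the single-CPU PIII table in this message.
min_times = {"Copy": 0.0274, "Scale": 0.0239, "Add": 0.0368, "Triad": 0.0335}

for kernel, t in min_times.items():
    print(f"{kernel:6s} {stream_rate_mb_s(kernel, N, t):9.1f} MB/s")
print(f"footprint: {footprint_mib(N):.1f} MiB")
```

    The footprint comes out just under 23, which matches the "22 MB"
    line in the output if that figure is truncated to a whole number.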

    Next: 2 CPUs. I added the compiler option -mp to the above
    commands to enable the OpenMP directives.

    The results for OpenMP:
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 1000000
     Offset = 0
     The total memory requirement is 22 MB
     You are running each test 10 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function      Rate (MB/s)   Avg time   Min time   Max time
    Copy:            643.3466     0.0249     0.0249     0.0250
    Scale:           685.2555     0.0234     0.0233     0.0234
    Add:             641.6766     0.0374     0.0374     0.0375
    Triad:           611.3099     0.0393     0.0393     0.0394
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------

    OpenMP doesn't seem to be any help: Copy and Scale improve slightly,
    while Add and Triad are slower than on one CPU.

    Next machine is a Dual Athlon 1900+ (1.6 GHz)
    1 GB of RAM (266 MHz)
    Motherboard: Asus A7M266-D
    Running SuSE SMP Linux kernel 2.4.18
    Compiler: PGI 3.2-4

    The compiler options I used:
    pgcc -fast -tp athlon -Minline -Mvect=assoc,sse,cachesize:393216
    -Mcache_align second_wall_pg.c -c

    pgf77 -fast -tp athlon -Minline -Mvect=assoc,sse,cachesize:393216
    -Mcache_align second_wall_pg.o stream_d.f -o stream_pgi

    Without OpenMP

    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 1000000
     Offset = 0
     The total memory requirement is 22 MB
     You are running each test 10 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function      Rate (MB/s)   Avg time   Min time   Max time
    Copy:            719.6844     0.0223     0.0222     0.0225
    Scale:           766.1756     0.0209     0.0209     0.0209
    Add:             844.4471     0.0285     0.0284     0.0286
    Triad:           849.6478     0.0283     0.0282     0.0283
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------

    With OpenMP

    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 1000000
     Offset = 0
     The total memory requirement is 22 MB
     You are running each test 10 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function      Rate (MB/s)   Avg time   Min time   Max time
    Copy:            712.4733     0.0225     0.0225     0.0225
    Scale:           741.1513     0.0216     0.0216     0.0216
    Add:             867.1146     0.0277     0.0277     0.0277
    Triad:           863.7760     0.0278     0.0278     0.0278
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------

    I can't seem to achieve good results with the OpenMP code. On the Athlon
    every rate is within about 3% of the single-CPU run, so it doesn't seem
    to be doing anything at all.
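
    To put a number on "not doing anything", the OpenMP rates can be
    divided by the single-CPU rates for each kernel. The sketch below
    only re-uses the figures from the four tables in this message; the
    dictionary layout and the function name are illustrative.

```python
# Rates in MB/s copied from the four result tables in this message.
RATES = {
    "Dual PIII 933": {
        "1 CPU": {"Copy": 583.7067, "Scale": 669.3169, "Add": 652.0326, "Triad": 716.8468},
        "OpenMP": {"Copy": 643.3466, "Scale": 685.2555, "Add": 641.6766, "Triad": 611.3099},
    },
    "Dual Athlon 1900+": {
        "1 CPU": {"Copy": 719.6844, "Scale": 766.1756, "Add": 844.4471, "Triad": 849.6478},
        "OpenMP": {"Copy": 712.4733, "Scale": 741.1513, "Add": 867.1146, "Triad": 863.7760},
    },
}


def speedups(machine):
    """Ratio of the OpenMP rate to the single-CPU rate for each kernel."""
    base = RATES[machine]["1 CPU"]
    omp = RATES[machine]["OpenMP"]
    return {kernel: omp[kernel] / base[kernel] for kernel in base}


for machine in RATES:
    print(machine)
    for kernel, ratio in speedups(machine).items():
        print(f"  {kernel:6s} {ratio:5.2f}x")
```

    A ratio near 2 would mean both CPUs were adding memory bandwidth.
    Here every ratio lies between roughly 0.85 and 1.10.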

    I may be able to run the MPI version of STREAM soon on the cluster
    here.

    Hope you enjoy.

    James Cook
    Steacie Institute for Molecular Sciences
    National Research Council Canada


