RE: stream source

From: ian wells
Date: Fri May 18 2001 - 17:39:40 CDT

    Hi John
      Sounds like a good change. Your story matches others I have heard about
    living in SV, which motivates me to stay in Colorado, with my 30-minute
    bicycle-path commute. This is certainly an interesting time to be in the
    processor performance business!

    Here is my stream submission for the hp workstation j6700. I would like to
    have this submitted as soon as possible. The compile lines and compiler
    versions used are below.
    This is my first time submitting stream results, so please let me know what
    other information your readers would find helpful. I can also be reached on the
    weekend at or xxx-xxx-xxxx, if need be. Ian

    #! /bin/ksh
    rm a.out
    echo compile stream
    cc +O2 t2.c -c
    f90 +Odataprefetch -v +noppu +O3 +FPD -Wl,-a,archive_shared -Wl,+pd,64M \
        f t2.o

    #! /bin/ksh
    echo compile stream
    rm c.out
    cc +O2 +Odataprefetch +FPD -Wl,-a,archive_shared -Wl,+pd,64M stream_d.c \
        -o c.out -lm

    what `which f90`
            HP-UX f90 20001102 (085606) B3907DB/B3909DB PHSS_22539 B.11.01.26
            HP F90 v2.4.10
             $ PATCH/11.00:PHCO_95167 Oct 1 1998 13:46:32 $
    what `which cc`
             LINT A.11.01.21505.GP CXREF A.11.01.21505.GP
            HP92453-01 A.11.01.21505.GP HP C Compiler
             $ CUPI80_IC7 Jan 7 1999 11:20:34 $

    hp workstation j6700

    tcd208:/tmp/ian >./a.out
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
     Array size = 5000000
     Offset = 8
     The total memory requirement is 114 MB
     You are running each test 30 times
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     Your clock granularity/precision appears to be 2 microsecond_s
    Function    Rate (MB/s)   Avg time   Min time   Max time
    Copy:          876.3282     0.0914     0.0913     0.0918
    Scale:         893.8445     0.0897     0.0895     0.0918
    Add:           979.6238     0.1226     0.1225     0.1240
    Triad:         994.8185     0.1209     0.1206     0.1228
     Solution Validates!
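
    The figures in the run above can be cross-checked by hand. The sketch
    below is not part of the original submission; it assumes the usual STREAM
    accounting, in which Copy and Scale touch two arrays per iteration, Add
    and Triad touch three, rates are reported in units of 10^6 bytes per
    second, and the memory total is reported in units of 2^20 bytes. The
    function and table names are made up for illustration. Because the
    printed minimum times are rounded to four decimals, the recomputed rates
    only agree with the reported ones to within a fraction of a percent.

```python
# Cross-check of the STREAM output above (illustrative sketch, not from the
# original message). Byte-counting conventions are assumed, see the lead-in.

N = 5_000_000        # "Array size" reported by the run
BYTES_PER_WORD = 8   # "8 bytes per DOUBLE PRECISION word"

# Arrays read or written per loop iteration (assumed STREAM convention).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# kernel -> (reported rate in MB/s, reported minimum time in seconds)
REPORTED = {
    "Copy": (876.3282, 0.0913),
    "Scale": (893.8445, 0.0895),
    "Add": (979.6238, 0.1225),
    "Triad": (994.8185, 0.1206),
}


def rate_mb_per_s(kernel, min_time, n=N):
    """Bytes moved by one pass of the kernel, divided by its best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n
    return bytes_moved / min_time / 1.0e6


def memory_mb(n=N):
    """Total size of the three arrays, in units of 2**20 bytes."""
    return 3 * BYTES_PER_WORD * n / 2**20


print(f"memory: {memory_mb():.1f} MB")
for kernel, (reported, min_time) in REPORTED.items():
    recomputed = rate_mb_per_s(kernel, min_time)
    print(f"{kernel:6s} reported {reported:9.4f}  recomputed {recomputed:9.4f}")
```

    The recomputed memory total rounds to the 114 MB printed by the run, which
    suggests the assumptions above match what the benchmark reports.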

    tcd208:/tmp/ian >./c.out
    This system uses 8 bytes per DOUBLE PRECISION word.
    Array size = 5000000, Offset = 8
    Total memory required = 114.4 MB.
    Each test is run 30 times, but only
    the *best* time for each is used.
    Your clock granularity/precision appears to be 3 microseconds.
    Each test below will take on the order of 59159 microseconds.
       (= 19719 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    Function    Rate (MB/s)   RMS time   Min time   Max time
    Copy:          893.5851     0.0896     0.0895     0.0903
    Scale:         890.4620     0.0899     0.0898     0.0902
    Add:           994.1759     0.1208     0.1207     0.1213
    Triad:         981.0253     0.1225     0.1223     0.1246
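
    The timer guideline printed by the C version can be checked the same way.
    This is a small sketch, not part of the original message: it assumes the
    tick count is the estimated test time divided by the measured clock
    granularity and truncated to an integer, and that 20 ticks is the minimum
    the program asks for. The function name is hypothetical.

```python
# Timer-resolution check from the C run above (illustrative sketch).

MIN_TICKS = 20  # threshold quoted in the program's own message


def clock_ticks(test_time_us, granularity_us):
    """Number of whole timer ticks that fit in one test (truncated)."""
    return test_time_us // granularity_us


ticks = clock_ticks(59159, 3)  # values printed by ./c.out
print(ticks, "ticks per test; enough resolution:", ticks >= MIN_TICKS)
```

    With 59159 microseconds per test and a 3 microsecond granularity this
    gives 19719 ticks, the same figure the run printed and far above the 20
    tick minimum, so the array size is comfortably large for this timer.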

    Ian Wells

    Application Performance Engineer
    Hewlett-Packard Company

    This archive was generated by hypermail 2b29 : Mon May 21 2001 - 08:38:48 CDT