STREAM results for Sunblade 1000/750....

From: John Stone (johns@ks.uiuc.edu)
Date: Sun Nov 19 2000 - 23:44:51 CST


    Hi,
      I ran STREAM on a 1-CPU 750MHz Sunblade 1000 tonight, and thought
    you might be interested in the numbers. I compiled it using
    the Sun Forte 6 maintenance update 1 compilers, with the UltraSPARC-III
    architecture compiler flags, to allow for whatever tricks the compilers
    would do. I've included results for both the C and Fortran versions of
    STREAM. They differ: C is a lot slower (roughly 430-490 MB/s versus
    770-930 MB/s for Fortran), despite very aggressive optimization flags.
    As I recall, the C and Fortran results were closer with older
    compilers/machines from Sun. Anyway, here are the numbers.

      John Stone
      johns@ks.uiuc.edu

    **
    ** 32-bit Fortran results (-xarch=v8plusb)
    **
    johns@sundemo[21] f77 second_cpu.f stream_d.f -fast -xarch=v8plusb -fsimple=2
    second_cpu.f:
            second:
    stream_d.f:
     MAIN stream:
            realsize:
            confuse:
            checktick:
            checksums:
    Linking:
    johns@sundemo[22] a.out
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 20000000
     Offset = 0
     The total memory requirement is 457 MB
     You are running each test 10 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity/precision appears to be 7 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 827.6625 0.3869 0.3866 0.3872
    Scale: 768.9331 0.4163 0.4162 0.4164
    Add: 827.3550 0.5832 0.5802 0.5840
    Triad: 845.0919 0.5682 0.5680 0.5685
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
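
As a cross-check, the Rate column above can be reproduced from the array size and the "Min time" column. The following is a minimal sketch, assuming the conventional STREAM byte counting (Copy and Scale touch two arrays per element, Add and Triad touch three) and 1 MB = 10^6 bytes; the function name `stream_rate_mb_s` is made up for illustration and is not part of STREAM itself.

```python
# Bytes moved per array element for each STREAM kernel, assuming the
# conventional counting: Copy and Scale touch 2 arrays, Add and Triad 3.
BYTES_PER_WORD = 8
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, n_elements, min_time_s):
    """Return the rate in MB/s (1 MB = 1e6 bytes) from the best time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n_elements
    return bytes_moved / min_time_s / 1.0e6


# Array size and "Min time" values from the 32-bit Fortran run above.
N = 20_000_000
min_times = {"Copy": 0.3866, "Scale": 0.4162, "Add": 0.5802, "Triad": 0.5680}

for kernel, t in min_times.items():
    print(f"{kernel:6s} {stream_rate_mb_s(kernel, N, t):10.2f} MB/s")
```

Because the printed times are rounded to four decimals, the recomputed rates should agree with the reported ones only to within a fraction of a MB/s.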

    **
    ** 64-bit Fortran results (-xarch=v9b)
    **
    johns@sundemo[17] f77 second_cpu.f stream_d.f -fast -xarch=v9b -fsimple=2
    second_cpu.f:
            second:
    stream_d.f:
     MAIN stream:
            realsize:
            confuse:
            checktick:
            checksums:
    Linking:
    johns@sundemo[18] a.out
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 20000000
     Offset = 0
     The total memory requirement is 457 MB
     You are running each test 10 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity/precision appears to be 7 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 809.2464 0.3957 0.3954 0.3959
    Scale: 814.7661 0.3933 0.3928 0.3936
    Add: 931.4449 0.5158 0.5153 0.5160
    Triad: 890.7752 0.5392 0.5389 0.5395
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
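
The "total memory requirement" lines can be checked the same way. This is a small sketch, assuming the figure is simply three arrays of 8-byte words expressed in units of 2^20 bytes; that unit is inferred from the printed numbers rather than stated in the output, and the helper name `stream_memory_mb` is hypothetical.

```python
def stream_memory_mb(n_elements, bytes_per_word=8, n_arrays=3):
    """Total size of the STREAM arrays, in units of 2**20 bytes."""
    return n_arrays * bytes_per_word * n_elements / 2**20


# Array sizes reported by the Fortran runs and by the C runs.
for label, n in (("Fortran", 20_000_000), ("C", 8_000_000)):
    print(f"{label:8s} N = {n:>10d}  ->  {stream_memory_mb(n):.1f} MB")
```

Under that assumption the Fortran array size works out to just under 458 MB, which the Fortran output appears to truncate to 457, and the C array size to 183.1 MB.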

    **
    ** 32-bit C results (-xarch=v8plusb)
    **
    johns@sundemo[33] cc -fast -xO5 -native -xarch=v8plusb -xrestrict -fsimple=2 stream_d.c second_cpu.c
    stream_d.c:
    second_cpu.c:
    johns@sundemo[34] a.out
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 8000000, Offset = 0
    Total memory required = 183.1 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 9999 microseconds.
    Each test below will take on the order of 239999 microseconds.
       (= 24 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 492.3091 0.2680 0.2600 0.2700
    Scale: 474.0749 0.2700 0.2700 0.2700
    Add: 426.6668 0.4550 0.4500 0.4600
    Triad: 426.6678 0.4520 0.4500 0.4600

    **
    ** 64-bit C results (-xarch=v9b)
    **
    johns@sundemo[35] cc -fast -xO5 -native -xarch=v9b -xrestrict -fsimple=2 stream_d.c second_cpu.c
    stream_d.c:
    second_cpu.c:
    johns@sundemo[36] a.out
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 8000000, Offset = 0
    Total memory required = 183.1 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 9999 microseconds.
    Each test below will take on the order of 250000 microseconds.
       (= 25 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 492.3091 0.2670 0.2600 0.2700
    Scale: 474.0749 0.2720 0.2700 0.2800
    Add: 426.6668 0.4570 0.4500 0.4600
    Triad: 426.6668 0.4530 0.4500 0.4600
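
One caveat when comparing the C and Fortran numbers: the C runs report a timer granularity of 9999 microseconds, so each C "Min time" is only good to about one 0.01 s tick, and the C runs also use a smaller array (8,000,000 elements versus 20,000,000). The sketch below estimates how much a one-tick error could move the Copy rate; it assumes the same byte counting as STREAM's Rate column (two 8-byte words per element for Copy, 1 MB = 10^6 bytes), and `rate_bounds_mb_s` is an illustrative helper, not part of the benchmark.

```python
def rate_bounds_mb_s(bytes_moved, min_time_s, tick_s):
    """Return (low, high) rate in MB/s if min_time_s is off by one tick."""
    low = bytes_moved / (min_time_s + tick_s) / 1.0e6
    high = bytes_moved / (min_time_s - tick_s) / 1.0e6
    return low, high


# C runs: 8,000,000 elements, Copy min time 0.26 s, ~0.01 s timer tick.
c_low, c_high = rate_bounds_mb_s(2 * 8 * 8_000_000, 0.26, 0.01)

# 32-bit Fortran run: 20,000,000 elements, Copy min time 0.3866 s,
# 7-microsecond timer tick.
f_low, f_high = rate_bounds_mb_s(2 * 8 * 20_000_000, 0.3866, 7e-6)

print(f"C Copy:       {c_low:.1f} to {c_high:.1f} MB/s")
print(f"Fortran Copy: {f_low:.1f} to {f_high:.1f} MB/s")
```

Even at the optimistic end of the C range, the C Copy rate stays well below the Fortran one, so the coarse timer alone does not seem to account for the gap.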

    -- 
    NIH Resource for Macromolecular Modeling and Bioinformatics
    Beckman Institute for Advanced Science and Technology
    University of Illinois, 405 N. Mathews Ave, Urbana, IL 61801
    Email: johns@ks.uiuc.edu                 Phone: 217-244-3349              
      WWW: http://www.ks.uiuc.edu/~johns/      Fax: 217-244-6078
    




    This archive was generated by hypermail 2b29 : Wed Dec 06 2000 - 08:27:00 CST