New STREAM Results on p690

From: John D. McCalpin (mccalpin@us.ibm.com)
Date: Thu Apr 11 2002 - 20:13:20 CDT

    Hi John,

    These STREAM results have been approved for publication.

    The 16-processor and 32-processor results were obtained on IBM
    eServer pSeries 690 HPC and Turbo systems, respectively, running
    the version of AIX known internally as AIX 5.1D, which starts
    shipping in April 2002. Each system was configured with 128 GB
    of RAM, consisting of eight 16 GB memory features.

    These are "standard" results, with no code modifications.
    Each array was sized to be at least 4x the 512 MB of L3 cache
    in each system, so N=256,000,000.
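
As a rough check on the sizing described above, the following sketch works out the per-array footprint and the total for all arrays. It assumes 8-byte words (as the STREAM output states), three arrays as in the standard STREAM source, and that "512 MB" means decimal megabytes. Under that reading each array is exactly 4x the L3 size; if the cache size is read as 512 MiB the ratio is closer to 3.81x. The variable names are illustrative only.

```python
# Sketch: check the STREAM array sizing quoted in the message.
BYTES_PER_WORD = 8          # "Assuming 8 bytes per DOUBLE PRECISION word"
N = 256_000_000             # array length given in the message
L3_BYTES = 512 * 10**6      # assumption: "512 MB" read as decimal megabytes
NUM_ARRAYS = 3              # assumption: standard STREAM uses three arrays

array_bytes = N * BYTES_PER_WORD                  # footprint of one array
ratio_to_l3 = array_bytes / L3_BYTES              # array size relative to L3
total_mib = NUM_ARRAYS * array_bytes / 2**20      # all arrays, in units of 2**20 bytes

print(f"bytes per array : {array_bytes}")
print(f"array / L3      : {ratio_to_l3:.2f}x")
print(f"total (MiB)     : {total_mib:.0f}")
```

The total computed this way agrees with the "total memory requirement" line in the detailed output if that line counts megabytes as 2**20 bytes and ignores the small Offset padding.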

    These tests were run with memory affinity enabled, but did not
    use large pages. We feel that these small-page results better
    represent the most common customer application environment.
    Both large pages and code modifications can improve performance
    in many (but not all) situations. The most effective code
    modification involves the insertion of DCBZ instructions
    to allocate the target array in the cache without reading it
    from memory. An example of how to get the XLF 7 compiler to
    do this is already on the STREAM web site, in the directory:
          ftp://ftp.cs.virginia.edu/pub/stream/Code/Contrib/POWER4/

    If there is sufficient customer demand, we will consider
    publishing the "tuned" numbers using large pages and/or code
    modifications. However, we believe that the numbers submitted
    here establish the capability of the system quite clearly.

    Summary of Improvements:

    STANDARD 16p   Published      New    Delta
    Copy               17394    20267      14%
    Scale              17066    20265      16%
    Add                19676    24706      20%
    Triad              20051    25058      20%

    STANDARD 32p   Published      New    Delta
    Copy               22421    28611      22%
    Scale              21411    28994      26%
    Add                24830    32222      23%
    Triad              25501    32249      21%
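
The Delta column in the summary tables appears to express the gain as a fraction of the new rate rather than of the published rate; that reading is an inference from the arithmetic, not something the message states. The sketch below computes the gain both ways from the table values (rates in MB/s) so the two conventions can be compared. The function names are illustrative.

```python
# Published and new STREAM rates (MB/s), copied from the summary tables.
results = {
    "16p": {"Copy": (17394, 20267), "Scale": (17066, 20265),
            "Add": (19676, 24706), "Triad": (20051, 25058)},
    "32p": {"Copy": (22421, 28611), "Scale": (21411, 28994),
            "Add": (24830, 32222), "Triad": (25501, 32249)},
}

def delta_vs_new(published, new):
    """Gain as a percentage of the new rate (assumed convention of the Delta column)."""
    return 100.0 * (new - published) / new

def delta_vs_published(published, new):
    """Gain as a percentage of the previously published rate."""
    return 100.0 * (new - published) / published

for config, kernels in results.items():
    for kernel, (published, new) in kernels.items():
        print(f"{config} {kernel:<6} "
              f"vs new: {delta_vs_new(published, new):5.1f}%  "
              f"vs published: {delta_vs_published(published, new):5.1f}%")
```

Measured against the published rates, the improvements come out larger than the Delta column shows, so the table figures are the more conservative of the two.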

    Detailed Results:

    p690 HPC
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 256000000
     Offset = 256
     The total memory requirement is 5859 MB
     You are running each test 10 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     rebind: num_parthds is 16
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function      Rate (MB/s)   Avg time   Min time   Max time
    Copy:          20266.5212      .2023      .2021      .2024
    Scale:         20264.7402      .2023      .2021      .2024
    Add:           24705.5623      .2488      .2487      .2489
    Triad:         25058.2375      .2454      .2452      .2456
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------

    p690 Turbo
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 256000000
     Offset = 512
     The total memory requirement is 5859 MB
     You are running each test 10 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function      Rate (MB/s)   Avg time   Min time   Max time
    Copy:          28610.8942      .1434      .1432      .1438
    Scale:         28993.8226      .1415      .1413      .1416
    Add:           32222.4249      .1909      .1907      .1911
    Triad:         32248.5949      .1908      .1905      .1910
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
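
The reported rates can be reproduced approximately from the minimum times in the listings above. The sketch below assumes the usual STREAM counting convention (Copy and Scale touch two arrays per iteration, Add and Triad touch three) and 10**6 bytes per MB. Because the times are printed to only four decimal places, the recomputed rates match the reported ones only to within a fraction of a percent. The names used are illustrative.

```python
BYTES_PER_WORD = 8
N = 256_000_000

# Arrays read or written per iteration, assuming the standard STREAM counting.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

def stream_rate_mb_s(kernel, min_time_s, n=N):
    """Bandwidth in MB/s (10**6 bytes) from the best time for one kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n
    return bytes_moved / min_time_s / 1.0e6

# (min time in seconds, reported MB/s) copied from the detailed results.
reported = {
    "p690 HPC": {"Copy": (0.2021, 20266.5212), "Scale": (0.2021, 20264.7402),
                 "Add": (0.2487, 24705.5623), "Triad": (0.2452, 25058.2375)},
    "p690 Turbo": {"Copy": (0.1432, 28610.8942), "Scale": (0.1413, 28993.8226),
                   "Add": (0.1907, 32222.4249), "Triad": (0.1905, 32248.5949)},
}

for system, kernels in reported.items():
    for kernel, (min_time, rate) in kernels.items():
        recomputed = stream_rate_mb_s(kernel, min_time)
        print(f"{system:<10} {kernel:<6} reported {rate:10.1f}  "
              f"recomputed {recomputed:10.1f}")
```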

    Sincerely,
    Your evil twin brother....

    ---
    John D. McCalpin, Ph.D.          STSM, eServer Hardware Performance
    IBM - 11400 Burnet Road, MS 045-3N098             Austin, TX  78758
    (512)838-6167 or tie line 678/6167   FAX (512)838-6486  or 678/6486
    

