tuned STREAM on IBM System p5 505 (2CPUs 1.65 GHz)

From: Ly Vu (lyvu@us.ibm.com)
Date: Mon Oct 03 2005 - 13:50:44 CDT

    These are tuned STREAM results on an IBM System p5 505
    with two 1.65 GHz CPUs. This is a POWER5 SMP machine.
    Large pages were used in all cases.

    Function    Rate (MB/s)   RMS time   Min time   Max time
    Copy:          6248.96       .09        .09        .09
    Scale:         6136.57       .09        .09        .09
    Add:           8819.20       .09        .09        .09
    Triad:         9012.07       .09        .09        .09
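
As a rough cross-check of the table above, the sketch below recomputes the time per kernel implied by each reported rate. It assumes the usual STREAM accounting (Copy and Scale touch two arrays per element, Add and Triad touch three) and that 1 MB means 10^6 bytes; the array size and word size are taken from the full output further down. The function names are illustrative and not part of the benchmark.

```python
# Sanity check of the reported STREAM rates (a sketch, not part of the original run).
N = 33_537_024        # "Array size" from the full output
BYTES_PER_WORD = 8    # "Assuming 8 bytes per DOUBLE PRECISION word"

# Arrays touched per element by each kernel (assumed standard STREAM accounting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Reported best rates in MB/s, assuming 1 MB = 10**6 bytes.
REPORTED_MBS = {"Copy": 6248.96, "Scale": 6136.57, "Add": 8819.20, "Triad": 9012.07}


def implied_seconds(kernel):
    """Time for one pass of the kernel implied by its reported rate."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * N
    return bytes_moved / (REPORTED_MBS[kernel] * 1e6)


def footprint_mib():
    """Total size of the three arrays in units of 2**20 bytes, truncated."""
    return 3 * BYTES_PER_WORD * N // 2**20


if __name__ == "__main__":
    for kernel in REPORTED_MBS:
        print(f"{kernel:6s} {implied_seconds(kernel):.4f} s")
    print("footprint:", footprint_mib(), "MB")
```

Under these assumptions every kernel comes out between roughly 0.086 and 0.091 seconds, which is consistent with the .09 shown in the time columns, and the three arrays total 767 MB when counted in units of 2^20 bytes, matching the memory requirement printed in the output.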

    Here is the full output file:
    -----------------------------
     Requesting Large Pages
     Setting up for 2 CPUs per module
     Number of segments per array = 1
     CPU binding list : 0
     Shared Segment Pointer = 504403158265495552
     Shared Segment Pointer = 504403158533931008
     Shared Segment Pointer = 504403158802366464
     Segment Size (B) = 268435456 (MB = 256 )
     Array Size (B) = 268435456 (MB = 256 )
     Array Size (DW) = 33554432
     Num_threads = 2
     Num_threads = 2
     rebind: num_parthds is 2
     Starting Initialization
     Done With Initialization
     a(1) 1.00000000000000000
     b(M) 1.00000000000000000
     c(M) 1.00000000000000000

     Incremental Offset = 1536
     Number of Threads = 2
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 33537024
     Offset = 0
     The total memory requirement is 767 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 87531 microseconds
        (= 87531 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function    Rate (MB/s)   RMS time   Min time   Max time
    Copy:          6248.96       .09        .09        .09
    Scale:         6136.57       .09        .09        .09
    Add:           8819.20       .09        .09        .09
    Triad:         9012.07       .09        .09        .09
     Sum of a is = 50934355200000.0000
     Sum of b is = 10186871040000.0000
     Sum of c is = 13582494720000.0000
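
The three sums above can be reproduced by hand, which is a quick way to confirm the run completed all kernels correctly. The sketch below replays the four kernels on a single element for 5 repetitions and scales by the array size. Two things are assumptions here: the scalar is taken to be 3.0, as in the standard STREAM kernels, and a is taken to be 2.0 when the timed loop starts. The output shows a(1) = 1.0 after initialization, so the value 2.0 presumes one doubling before the timed loop (as the reference C version does in its timer check); it is inferred from the sums, not stated in the output.

```python
# Replay the STREAM kernels on one element and scale by N (a sketch under
# the assumptions stated above, not the tuned code that produced the run).
N = 33_537_024   # "Array size" from the output
NTIMES = 5       # "You are running each test 5 times"
SCALAR = 3.0     # assumption: standard STREAM scalar


def expected_sums(a0=2.0, b0=1.0, c0=1.0):
    """Return the expected (sum of a, sum of b, sum of c).

    a0 = 2.0 is an assumption inferred from the printed sums.
    b0 and c0 do not affect the result: both are overwritten
    in the first repetition.
    """
    a, b, c = a0, b0, c0
    for _ in range(NTIMES):
        c = a               # Copy
        b = SCALAR * c      # Scale
        c = a + b           # Add
        a = b + SCALAR * c  # Triad
    return a * N, b * N, c * N


if __name__ == "__main__":
    for name, total in zip("abc", expected_sums()):
        print(f"Sum of {name} is = {total:.4f}")
```

Each repetition multiplies a by 15, so after 5 repetitions the per-element values are a = 1518750, b = 303750 and c = 405000. Multiplied by the array size, these match the three sums printed above.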

    ______________________________________________
    Ly Vu
    IBM Corp. - Austin, Texas.
    RS/6000 Performance Analysis.
    Phone : (512) 838-8228
    Email : lyvu@us.ibm.com


