tuned STREAM on IBM eServer p5 520 Express (1500 MHz, 2 cpu)

From: Frank Johnston (fjohn@us.ibm.com)
Date: Mon Oct 04 2004 - 16:18:43 CDT


    These are tuned STREAM results on an IBM eServer p5 520 Express
    with two 1500 MHz CPUs (36MB L3 cache). This is a POWER5 SMP machine.
    Large pages were used in all cases.
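
    As a rough sanity check on the problem size, the sketch below
    recomputes the "total memory requirement" line of the output and
    compares one array against the 36MB L3 cache. It assumes that "MB"
    in that line means 2**20 bytes and that the value is truncated;
    this reproduces the reported 767 MB but is not confirmed by the
    post. The function names are illustrative only.

```python
BYTES_PER_WORD = 8   # "Assuming 8 bytes per DOUBLE PRECISION word"
NUM_ARRAYS = 3       # the a, b and c arrays
L3_CACHE_MB = 36     # from the machine description in this post

def total_memory_mb(array_size: int) -> int:
    """Whole MB (assumed 2**20 bytes, truncated) for all three arrays."""
    return NUM_ARRAYS * BYTES_PER_WORD * array_size // 2**20

def array_to_cache_ratio(array_size: int) -> float:
    """Size of a single array relative to the L3 cache."""
    return BYTES_PER_WORD * array_size / 2**20 / L3_CACHE_MB

# The four array sizes that appear in the output file.
for n in (33541120, 33539072, 33537024, 33534976):
    print(n, total_memory_mb(n), round(array_to_cache_ratio(n), 1))
```

    Each array comes out at roughly seven times the L3 cache, so the
    kernels should be streaming from main memory rather than from cache.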

    Function    Rate (MB/s)   RMS time   Min time   Max time
    Copy:         3721.04       .15        .14        .16
    Scale:        3632.69       .15        .15        .15
    Add:          5252.87       .15        .15        .15
    Triad:        5359.51       .15        .15        .15
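
    The rates above can be related to the timings with a little
    arithmetic. The sketch below assumes the usual STREAM accounting
    (Copy and Scale touch two arrays, Add and Triad touch three, and
    1 MB = 10**6 bytes), which is not spelled out in this post. The
    array size is taken from the run in the output file whose figures
    match this table (Array size = 33539072, Incremental Offset = 2560).

```python
ARRAY_SIZE = 33539072   # "Array size" of the run matching the summary table
BYTES_PER_WORD = 8      # "Assuming 8 bytes per DOUBLE PRECISION word"

# Arrays read or written per kernel (assumed standard STREAM accounting).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Rates (MB/s) as reported in the summary table.
REPORTED_RATE = {"Copy": 3721.04, "Scale": 3632.69,
                 "Add": 5252.87, "Triad": 5359.51}

def bytes_moved(kernel: str, n: int = ARRAY_SIZE) -> int:
    """Bytes transferred by one pass of the given kernel."""
    return ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n

def implied_time(kernel: str) -> float:
    """Seconds per pass implied by the reported rate (1 MB = 10**6 bytes)."""
    return bytes_moved(kernel) / (REPORTED_RATE[kernel] * 1.0e6)

for kernel in REPORTED_RATE:
    print(f"{kernel:6s} {implied_time(kernel):.4f} s")
```

    Rounded to two decimals, the implied times agree with the "Min time"
    column, which is what one would expect if the rate is computed from
    the best time of each test.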

    Here is the full output file:
    ---------------------------------------------------
     Requesting Large Pages
     Setting up for 2 CPUs per module
     Number of segments per array = 1
     CPU binding list : 0
     Shared Segment Pointer = 504403158265495552
     Shared Segment Pointer = 504403158533931008
     Shared Segment Pointer = 504403158802366464
     Segment Size (B) = 268435456 (MB = 256 )
     Array Size (B) = 268435456 (MB = 256 )
     Array Size (DW) = 33554432
     Num_threads = 2
     Num_threads = 2
     rebind: num_parthds is 2
     Starting Initialization
     Done With Initialization
     a(1) 1.00000000000000000
     b(M) 1.00000000000000000
     c(M) 1.00000000000000000
     Incremental Offset = 512
     Number of Threads = 2
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 33541120
     Offset = 0
     The total memory requirement is 767 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 147644 microseconds
        (= 147644 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 3721.34 .15 .14 .16
    Scale: 3635.18 .15 .15 .15
    Add: 5238.18 .15 .15 .15
    Triad: 5351.20 .15 .15 .15
     Sum of a is = 50940501581250.0000
     Sum of b is = 10188100316250.0000
     Sum of c is = 13584133755000.0000
     Incremental Offset = 1536
     Number of Threads = 2
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 33541120
     Offset = 0
     The total memory requirement is 767 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 147532 microseconds
        (= 147532 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 3721.58 .15 .14 .16
    Scale: 3638.01 .15 .15 .15
    Add: 5224.91 .15 .15 .15
    Triad: 5343.64 .15 .15 .15
     Sum of a is = 50940501581250.0000
     Sum of b is = 10188100316250.0000
     Sum of c is = 13584133755000.0000
     Incremental Offset = 2560
     Number of Threads = 2
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 33541120
     Offset = 0
     The total memory requirement is 767 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 147470 microseconds
        (= 147470 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 3721.17 .15 .14 .16
    Scale: 3633.30 .15 .15 .15
    Add: 5252.42 .15 .15 .15
    Triad: 5360.53 .15 .15 .15
     Sum of a is = 50940501581250.0000
     Sum of b is = 10188100316250.0000
     Sum of c is = 13584133755000.0000
     Incremental Offset = 512
     Number of Threads = 2
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 33539072
     Offset = 0
     The total memory requirement is 767 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 147671 microseconds
        (= 147671 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 3721.45 .15 .14 .16
    Scale: 3633.24 .15 .15 .15
    Add: 5256.05 .15 .15 .15
    Triad: 5346.22 .15 .15 .15
     Sum of a is = 50937391181250.0000
     Sum of b is = 10187478236250.0000
     Sum of c is = 13583304315000.0000
     Incremental Offset = 1536
     Number of Threads = 2
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 33539072
     Offset = 0
     The total memory requirement is 767 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 147570 microseconds
        (= 147570 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 3720.40 .15 .14 .16
    Scale: 3634.51 .15 .15 .15
    Add: 5248.29 .15 .15 .15
    Triad: 5350.85 .15 .15 .15
     Sum of a is = 50937391181250.0000
     Sum of b is = 10187478236250.0000
     Sum of c is = 13583304315000.0000
     Incremental Offset = 2560
     Number of Threads = 2
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 33539072
     Offset = 0
     The total memory requirement is 767 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 147591 microseconds
        (= 147591 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 3721.04 .15 .14 .16
    Scale: 3632.69 .15 .15 .15
    Add: 5252.87 .15 .15 .15
    Triad: 5359.51 .15 .15 .15
     Sum of a is = 50937391181250.0000
     Sum of b is = 10187478236250.0000
     Sum of c is = 13583304315000.0000
     Incremental Offset = 512
     Number of Threads = 2
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 33537024
     Offset = 0
     The total memory requirement is 767 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 147622 microseconds
        (= 147622 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 3722.54 .15 .14 .16
    Scale: 3637.51 .15 .15 .15
    Add: 5227.95 .15 .15 .16
    Triad: 5353.06 .15 .15 .15
     Sum of a is = 50934280781250.0000
     Sum of b is = 10186856156250.0000
     Sum of c is = 13582474875000.0000
     Incremental Offset = 1536
     Number of Threads = 2
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 33537024
     Offset = 0
     The total memory requirement is 767 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 147660 microseconds
        (= 147660 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 3718.99 .15 .14 .16
    Scale: 3634.71 .15 .15 .15
    Add: 5230.82 .15 .15 .15
    Triad: 5336.16 .15 .15 .15
     Sum of a is = 50934280781250.0000
     Sum of b is = 10186856156250.0000
     Sum of c is = 13582474875000.0000
     Incremental Offset = 2560
     Number of Threads = 2
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 33537024
     Offset = 0
     The total memory requirement is 767 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 147555 microseconds
        (= 147555 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 3721.74 .15 .14 .16
    Scale: 3633.74 .15 .15 .15
    Add: 5258.08 .15 .15 .15
    Triad: 5356.05 .15 .15 .15
     Sum of a is = 50934280781250.0000
     Sum of b is = 10186856156250.0000
     Sum of c is = 13582474875000.0000
     Incremental Offset = 512
     Number of Threads = 2
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 33534976
     Offset = 0
     The total memory requirement is 767 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 147649 microseconds
        (= 147649 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 3722.11 .15 .14 .16
    Scale: 3633.78 .15 .15 .15
    Add: 5262.85 .15 .15 .15
    Triad: 5355.36 .15 .15 .15
     Sum of a is = 50931170381250.0000
     Sum of b is = 10186234076250.0000
     Sum of c is = 13581645435000.0000
     Incremental Offset = 1536
     Number of Threads = 2
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 33534976
     Offset = 0
     The total memory requirement is 767 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 147545 microseconds
        (= 147545 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 3720.23 .15 .14 .16
    Scale: 3634.44 .15 .15 .15
    Add: 5227.36 .15 .15 .15
    Triad: 5341.53 .15 .15 .15
     Sum of a is = 50931170381250.0000
     Sum of b is = 10186234076250.0000
     Sum of c is = 13581645435000.0000
     Incremental Offset = 2560
     Number of Threads = 2
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 33534976
     Offset = 0
     The total memory requirement is 767 MB
     You are running each test 5 times
     The *best* time for each test is used
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     The tests below will each take a time on the order
     of 147579 microseconds
        (= 147579 clock ticks)
     Increase the size of the arrays if this shows that
     you are not getting at least 20 clock ticks per test.
     ----------------------------------------------------
     WARNING -- The above is only a rough guideline.
     For best results, please be sure you know the
     precision of your system timer.
     ----------------------------------------------------
    Function Rate (MB/s) RMS time Min time Max time
    Copy: 3721.60 .15 .14 .16
    Scale: 3633.22 .15 .15 .15
    Add: 5251.38 .15 .15 .15
    Triad: 5341.64 .15 .15 .15
     Sum of a is = 50931170381250.0000
     Sum of b is = 10186234076250.0000
     Sum of c is = 13581645435000.0000
    GETSHRSEG: requesting large pages
    GETSHRSEG ENTRY: shmgetflag -2147481216
    bindprocessor successful: thread_self() 295117 cpu_id 0
    GETSHRSEG: requesting large pages
    GETSHRSEG ENTRY: shmgetflag -2147481216
    bindprocessor successful: thread_self() 295117 cpu_id 0
    GETSHRSEG: requesting large pages
    GETSHRSEG ENTRY: shmgetflag -2147481216
    bindprocessor successful: thread_self() 295117 cpu_id 0
    bindprocessor successful: thread_self() 295117 cpu_id 0
    bindprocessor successful: thread_self() 475217 cpu_id 1
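
    The "Sum of a/b/c" lines can be cross-checked against the kernel
    arithmetic. The sketch below assumes the reference STREAM setup
    (a=1, b=2, c=0, with a doubled once before timing, and a scalar of
    3), since the tuned source is not included in this post. It replays
    the four kernels for the 5 repetitions reported in the output and
    divides each reported sum by the resulting per-element value.

```python
SCALAR = 3   # scalar of the reference STREAM code (assumption)
NTIMES = 5   # "You are running each test 5 times"

def per_element_values(ntimes: int = NTIMES):
    """Values of a, b, c in every element after ntimes passes."""
    a, b, c = 2, 2, 0          # assumed: a=1 doubled once, b=2, c=0
    for _ in range(ntimes):
        c = a                  # Copy
        b = SCALAR * c         # Scale
        c = a + b              # Add
        a = b + SCALAR * c     # Triad
    return a, b, c

# Reported array size -> (sum of a, sum of b, sum of c) from the output.
REPORTED_SUMS = {
    33541120: (50940501581250, 10188100316250, 13584133755000),
    33539072: (50937391181250, 10187478236250, 13583304315000),
    33537024: (50934280781250, 10186856156250, 13582474875000),
    33534976: (50931170381250, 10186234076250, 13581645435000),
}

def elements_summed(total: int, value: int) -> int:
    """Number of elements implied by a checksum; must divide exactly."""
    count, remainder = divmod(total, value)
    assert remainder == 0, "sum is not a whole multiple of the value"
    return count

values = per_element_values()
for n, sums in REPORTED_SUMS.items():
    counts = {elements_summed(s, v) for s, v in zip(sums, values)}
    print(n, counts)
```

    Under these assumptions every sum divides exactly, and each one
    corresponds to 49 fewer elements than the reported array size. The
    post does not say why; it may reflect how the tuned code bounds its
    checksum loop.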


