standard STREAM on IBM eServer p5 595 (1900 MHz, 64 cpu)

From: Frank Johnston (fjohn@us.ibm.com)
Date: Tue Nov 02 2004 - 09:58:07 CST

    These are standard STREAM results on an IBM eServer p5 595
    with sixty-four 1.9 GHz CPUs (36 MB L3 cache). This is a POWER5 SMP machine.
    Large pages were used in all cases.

    Function    Rate (MB/s)   Avg time   Min time   Max time
    Copy:         157559.59        .03        .03        .03
    Scale:        152770.74        .03        .03        .03
    Add:          168973.99        .04        .04        .04
    Triad:        173564.19        .04        .04        .04
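
For readers who want to sanity-check the table: STREAM reports a rate as bytes moved divided by the best time, counting two arrays for Copy and Scale and three for Add and Triad. The sketch below inverts that to recover the unrounded time behind each ".03" or ".04". It assumes the conventional STREAM byte counts and MB = 10^6 bytes; the array size 267912192 is taken from the run in the output file whose rates match this table. The function and variable names are illustrative and are not part of the benchmark source.

```python
# Sketch: recover the unrounded best time implied by each reported rate.
# Assumes conventional STREAM accounting: 8-byte words, MB = 1e6 bytes.
BYTES_PER_WORD = 8
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def implied_seconds(kernel, rate_mb_s, n):
    """Time in seconds implied by a reported rate for n-element arrays."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * n
    return bytes_moved / (rate_mb_s * 1.0e6)


N = 267912192  # "Array size" printed by the run that matches the table
reported = {
    "Copy": 157559.59,
    "Scale": 152770.74,
    "Add": 168973.99,
    "Triad": 173564.19,
}

implied = {k: implied_seconds(k, r, N) for k, r in reported.items()}
for kernel, seconds in implied.items():
    print(f"{kernel:6s} {seconds:.4f} s")
```

Each implied time rounds to the two-decimal value printed in the table, so the rates and times are mutually consistent under these assumptions.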

    Here is the full output file:
    --------------------------------------------------
     Requesting Large Pages
     Setting up for 8 CPUs per module
     Number of segments per array = 8
     CPU binding list : 0 8 16 24 32 40 48 56
     Shared Segment Pointer = 504403158265495552
     Shared Segment Pointer = 504403160412979200
     Shared Segment Pointer = 504403162560462848
     Segment Size (B) = 268435456 (MB = 256 )
     Array Size (B) = 2147483648 (MB = 2048 )
     Array Size (DW) = 268435456
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     Num_threads = 64
     rebind: num_parthds is 64
    GETSHRSEG: requesting large pages
    GETSHRSEG ENTRY: shmgetflag -2147481216
    bindprocessor successful: thread_self() 2437303 cpu_id 0
    bindprocessor successful: thread_self() 2437303 cpu_id 8
    bindprocessor successful: thread_self() 2437303 cpu_id 16
    bindprocessor successful: thread_self() 2437303 cpu_id 24
    bindprocessor successful: thread_self() 2437303 cpu_id 32
    bindprocessor successful: thread_self() 2437303 cpu_id 40
    bindprocessor successful: thread_self() 2437303 cpu_id 48
    bindprocessor successful: thread_self() 2437303 cpu_id 56
    GETSHRSEG: requesting large pages
    GETSHRSEG ENTRY: shmgetflag -2147481216
    bindprocessor successful: thread_self() 2437303 cpu_id 0
    bindprocessor successful: thread_self() 2437303 cpu_id 8
    bindprocessor successful: thread_self() 2437303 cpu_id 16
    bindprocessor successful: thread_self() 2437303 cpu_id 24
    bindprocessor successful: thread_self() 2437303 cpu_id 32
    bindprocessor successful: thread_self() 2437303 cpu_id 40
    bindprocessor successful: thread_self() 2437303 cpu_id 48
    bindprocessor successful: thread_self() 2437303 cpu_id 56
    GETSHRSEG: requesting large pages
    GETSHRSEG ENTRY: shmgetflag -2147481216
    bindprocessor successful: thread_self() 2437303 cpu_id 0
    bindprocessor successful: thread_self() 2437303 cpu_id 8
    bindprocessor successful: thread_self() 2437303 cpu_id 16
    bindprocessor successful: thread_self() 2437303 cpu_id 24
    bindprocessor successful: thread_self() 2437303 cpu_id 32
    bindprocessor successful: thread_self() 2437303 cpu_id 40
    bindprocessor successful: thread_self() 2437303 cpu_id 48
    bindprocessor successful: thread_self() 2437303 cpu_id 56
    bindprocessor successful: thread_self() 2453683 cpu_id 1
    bindprocessor successful: thread_self() 2552031 cpu_id 25
    bindprocessor successful: thread_self() 2621441 cpu_id 42
    bindprocessor successful: thread_self() 2625539 cpu_id 43
    bindprocessor successful: thread_self() 2588913 cpu_id 34
    bindprocessor successful: thread_self() 2474169 cpu_id 6
    bindprocessor successful: thread_self() 2478267 cpu_id 7
    bindprocessor successful: thread_self() 2543835 cpu_id 23
    bindprocessor successful: thread_self() 2539737 cpu_id 22
    bindprocessor successful: thread_self() 2494659 cpu_id 11
    bindprocessor successful: thread_self() 2490561 cpu_id 10
    bindprocessor successful: thread_self() 2617599 cpu_id 41
    bindprocessor successful: thread_self() 2666519 cpu_id 53
    bindprocessor successful: thread_self() 2605305 cpu_id 38
    bindprocessor successful: thread_self() 2646029 cpu_id 48
    bindprocessor successful: thread_self() 2650127 cpu_id 49
    bindprocessor successful: thread_self() 2531541 cpu_id 20
    bindprocessor successful: thread_self() 2564325 cpu_id 28
    bindprocessor successful: thread_self() 2506953 cpu_id 14
    bindprocessor successful: thread_self() 2629637 cpu_id 44
    bindprocessor successful: thread_self() 2633735 cpu_id 45
    bindprocessor successful: thread_self() 2482365 cpu_id 8
    bindprocessor successful: thread_self() 2691107 cpu_id 59
    bindprocessor successful: thread_self() 2597109 cpu_id 36
    bindprocessor successful: thread_self() 2593011 cpu_id 35
    bindprocessor successful: thread_self() 2511051 cpu_id 15
    bindprocessor successful: thread_self() 2470071 cpu_id 5
    bindprocessor successful: thread_self() 2461875 cpu_id 3
    bindprocessor successful: thread_self() 2355365 cpu_id 2
    bindprocessor successful: thread_self() 2601207 cpu_id 37
    bindprocessor successful: thread_self() 2674715 cpu_id 55
    bindprocessor successful: thread_self() 2670617 cpu_id 54
    bindprocessor successful: thread_self() 2658323 cpu_id 51
    bindprocessor successful: thread_self() 2654225 cpu_id 50
    bindprocessor successful: thread_self() 2707499 cpu_id 63
    bindprocessor successful: thread_self() 2703401 cpu_id 62
    bindprocessor successful: thread_self() 2547933 cpu_id 24
    bindprocessor successful: thread_self() 2560227 cpu_id 27
    bindprocessor successful: thread_self() 2556129 cpu_id 26
    bindprocessor successful: thread_self() 2486463 cpu_id 9
    bindprocessor successful: thread_self() 2662421 cpu_id 52
    bindprocessor successful: thread_self() 2613501 cpu_id 40
    bindprocessor successful: thread_self() 2568423 cpu_id 29
     Starting Initialization
     Done With Initialization
     a(1) 1.00000000000000000
     b(M) 1.00000000000000000
     c(M) 1.00000000000000000
     Incremental Offset = 512
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 267914240
     The total memory requirement is 6132 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 157643.69 .03 .03 .03
    Scale: 152756.33 .03 .03 .03
    Add: 169093.94 .04 .04 .04
    Triad: 173423.77 .04 .04 .04
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 1536
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 267914240
     The total memory requirement is 6132 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 157490.41 .03 .03 .03
    Scale: 152957.76 .03 .03 .03
    Add: 168821.90 .04 .04 .04
    Triad: 173614.68 .04 .04 .04
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 2560
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 267914240
     The total memory requirement is 6132 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 157279.62 .03 .03 .03
    Scale: 152946.05 .03 .03 .03
    Add: 169304.12 .04 .04 .04
    Triad: 173239.96 .04 .04 .04
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 512
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 267912192
     The total memory requirement is 6132 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 157863.95 .03 .03 .03
    Scale: 153422.56 .03 .03 .03
    Add: 169881.92 .04 .04 .04
    Triad: 173065.21 .04 .04 .04
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 1536
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 267912192
     The total memory requirement is 6132 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 157475.41 .03 .03 .03
    Scale: 152927.97 .03 .03 .03
    Add: 168642.21 .04 .04 .04
    Triad: 173036.34 .04 .04 .04
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 2560
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 267912192
     The total memory requirement is 6132 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 157559.59 .03 .03 .03
    Scale: 152770.74 .03 .03 .03
    Add: 168973.99 .04 .04 .04
    Triad: 173564.19 .04 .04 .04
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 512
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 267910144
     The total memory requirement is 6131 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 158240.66 .03 .03 .03
    Scale: 153067.41 .03 .03 .03
    Add: 168911.31 .04 .04 .04
    Triad: 173457.93 .04 .04 .04
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 1536
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 267910144
     The total memory requirement is 6131 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 157879.38 .03 .03 .03
    Scale: 152660.61 .03 .03 .03
    Add: 168630.37 .04 .04 .04
    Triad: 172790.00 .04 .04 .04
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 2560
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 267910144
     The total memory requirement is 6131 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 157017.62 .03 .03 .03
    Scale: 152586.76 .03 .03 .03
    Add: 168704.21 .04 .04 .04
    Triad: 173050.56 .04 .04 .04
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 512
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 267908096
     The total memory requirement is 6131 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 157561.33 .03 .03 .03
    Scale: 153213.64 .03 .03 .03
    Add: 168684.98 .04 .04 .04
    Triad: 173024.81 .04 .04 .04
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 1536
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 267908096
     The total memory requirement is 6131 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 157237.50 .03 .03 .03
    Scale: 152722.99 .03 .03 .03
    Add: 168850.80 .04 .04 .04
    Triad: 173624.11 .04 .04 .04
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 2560
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 267908096
     The total memory requirement is 6131 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 156589.74 .03 .03 .03
    Scale: 152212.26 .03 .03 .03
    Add: 168943.88 .04 .04 .04
    Triad: 173346.22 .04 .04 .04
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
    bindprocessor successful: thread_self() 2687009 cpu_id 58
    bindprocessor successful: thread_self() 2527443 cpu_id 19
    bindprocessor successful: thread_self() 2523345 cpu_id 18
    bindprocessor successful: thread_self() 2535639 cpu_id 21
    bindprocessor successful: thread_self() 2437303 cpu_id 0
    bindprocessor successful: thread_self() 2609403 cpu_id 39
    bindprocessor successful: thread_self() 2465973 cpu_id 4
    bindprocessor successful: thread_self() 2498757 cpu_id 12
    bindprocessor successful: thread_self() 2502855 cpu_id 13
    bindprocessor successful: thread_self() 2584815 cpu_id 33
    bindprocessor successful: thread_self() 2580717 cpu_id 32
    bindprocessor successful: thread_self() 2695205 cpu_id 60
    bindprocessor successful: thread_self() 2699303 cpu_id 61
    bindprocessor successful: thread_self() 2515149 cpu_id 16
    bindprocessor successful: thread_self() 2519247 cpu_id 17
    bindprocessor successful: thread_self() 2682911 cpu_id 57
    bindprocessor successful: thread_self() 2678813 cpu_id 56
    bindprocessor successful: thread_self() 2572521 cpu_id 30
    bindprocessor successful: thread_self() 2576619 cpu_id 31
    bindprocessor successful: thread_self() 2641931 cpu_id 47
    bindprocessor successful: thread_self() 2637833 cpu_id 46
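
The size figures printed in the output can be cross-checked with a little arithmetic. The sketch below assumes that "total memory requirement" means three arrays of 8-byte words, expressed in units of 2^20 bytes and truncated. That is an inference from the printed numbers, not something the output states, and the names used are illustrative.

```python
# Sketch: cross-check the size figures printed in the output.
# Assumption (inferred, not stated in the log): three arrays of 8-byte
# words, reported in units of 2**20 bytes, truncated to an integer.
SEGMENT_BYTES = 268435456    # "Segment Size (B)"
SEGMENTS_PER_ARRAY = 8       # "Number of segments per array"
ARRAY_BYTES = 2147483648     # "Array Size (B)"
ARRAY_WORDS = 268435456      # "Array Size (DW)"


def total_memory_mb(n, arrays=3, bytes_per_word=8):
    """Memory for `arrays` n-element arrays, in truncated 2**20-byte units."""
    return (arrays * bytes_per_word * n) // 2**20


# The eight segments together make up one array of 8-byte words.
segments_ok = SEGMENTS_PER_ARRAY * SEGMENT_BYTES == ARRAY_BYTES
words_ok = ARRAY_BYTES // 8 == ARRAY_WORDS
print("segments add up:", segments_ok, "| word count matches:", words_ok)

# The four distinct "Array size" values that appear in the runs.
for n in (267914240, 267912192, 267910144, 267908096):
    print(n, total_memory_mb(n), "MB")
```

Under these assumptions the two larger array sizes give 6132 MB and the two smaller give 6131 MB, matching the figures printed with each run.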


