standard STREAM on IBM eServer p5 550 Express (1500 MHz, 4 cpu)

From: Frank Johnston (fjohn@us.ibm.com)
Date: Mon Oct 04 2004 - 16:14:55 CDT

    These are standard STREAM results on an IBM eServer p5 550 Express
    with four 1500 MHz CPUs (36 MB L3 cache). This is a POWER5 SMP machine.
    Large pages were used in all cases.

    Function      Rate (MB/s)   Avg time   Min time   Max time
    Copy:           6279.8475      .1709      .1709      .1710
    Scale:          6187.7380      .1735      .1734      .1735
    Add:            8225.0360      .1958      .1957      .1959
    Triad:          8414.6062      .1915      .1913      .1916
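
As a rough cross-check of the figures above, the rates can be recomputed from the array size and the minimum times. The sketch below is not part of the original post. It assumes STREAM's usual counting convention (Copy and Scale move two 8-byte words per iteration, Add and Triad move three), rates in units of 10^6 bytes/s, and the memory footprint in units of 2^20 bytes. The function names are hypothetical, and the array size is taken from the run in the full output whose figures match this summary (Array size = 67075072). Because the printed times are rounded to four decimals, the recomputed rates should agree with the reported ones only approximately.

```python
# Words moved per loop iteration under STREAM's usual counting convention
# (an assumption here): Copy and Scale touch two words, Add and Triad three.
WORDS_PER_ITERATION = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
BYTES_PER_WORD = 8  # the log says "Assuming 8 bytes per DOUBLE PRECISION word"


def stream_rate_mb_s(kernel, array_size, min_time_s):
    """Return the rate in MB/s (1 MB = 10**6 bytes) for one kernel."""
    bytes_moved = WORDS_PER_ITERATION[kernel] * BYTES_PER_WORD * array_size
    return bytes_moved / min_time_s / 1e6


def total_memory_mb(array_size):
    """Return the size of the three arrays in whole MB (1 MB = 2**20 bytes)."""
    return 3 * BYTES_PER_WORD * array_size // 2**20


if __name__ == "__main__":
    array_size = 67075072  # from the run whose figures match the summary
    reported = {  # kernel: (reported rate in MB/s, min time in seconds)
        "Copy": (6279.8475, 0.1709),
        "Scale": (6187.7380, 0.1734),
        "Add": (8225.0360, 0.1957),
        "Triad": (8414.6062, 0.1913),
    }
    for kernel, (rate, min_time) in reported.items():
        estimate = stream_rate_mb_s(kernel, array_size, min_time)
        print(f"{kernel:6s} reported {rate:10.4f}  recomputed {estimate:10.4f}")
    print("total memory (MB):", total_memory_mb(array_size))
```

Using the minimum time rather than the average mirrors the log's statement that the best time for each test is used.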

    Here is the full output file:
    ---------------------------------------------------

     Requesting Large Pages
     Setting up for 2 CPUs per module
     Number of segments per array = 2
     CPU binding list : 0 2
     Shared Segment Pointer = 504403158265495552
     Shared Segment Pointer = 504403158802366464
     Shared Segment Pointer = 504403159339237376
     Segment Size (B) = 268435456 (MB = 256 )
     Array Size (B) = 536870912 (MB = 512 )
     Array Size (DW) = 67108864
     Num_threads = 4
     Num_threads = 4
     Num_threads = 4
     Num_threads = 4
     rebind: num_parthds is 4
     Starting Initialization
     Done With Initialization
     a(1) 1.00000000000000000
     b(M) 1.00000000000000000
     c(M) 1.00000000000000000
     Incremental Offset = 512
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 67079168
     The total memory requirement is 1535 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6280.5902 .1709 .1709 .1710
    Scale: 6188.5582 .1735 .1734 .1735
    Add: 8231.0229 .1956 .1956 .1957
    Triad: 8388.4727 .1920 .1919 .1921
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 1536
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 67079168
     The total memory requirement is 1535 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6279.5652 .1710 .1709 .1711
    Scale: 6182.1416 .1737 .1736 .1737
    Add: 8242.3662 .1954 .1953 .1956
    Triad: 8356.1780 .1928 .1927 .1930
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 2560
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 67079168
     The total memory requirement is 1535 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6276.6233 .1710 .1710 .1711
    Scale: 6184.8766 .1736 .1735 .1737
    Add: 8249.3344 .1953 .1952 .1954
    Triad: 8371.8011 .1923 .1923 .1924
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 512
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 67077120
     The total memory requirement is 1535 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6278.9180 .1709 .1709 .1709
    Scale: 6184.4583 .1736 .1735 .1736
    Add: 8230.9923 .1957 .1956 .1958
    Triad: 8409.5182 .1918 .1914 .1923
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 1536
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 67077120
     The total memory requirement is 1535 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6277.8759 .1710 .1710 .1711
    Scale: 6183.7957 .1736 .1736 .1737
    Add: 8253.0551 .1952 .1951 .1954
    Triad: 8369.7399 .1924 .1923 .1924
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 2560
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 67077120
     The total memory requirement is 1535 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6276.0991 .1710 .1710 .1711
    Scale: 6183.1756 .1736 .1736 .1736
    Add: 8251.3506 .1952 .1951 .1954
    Triad: 8364.2967 .1925 .1925 .1926
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 512
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 67075072
     The total memory requirement is 1535 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6279.8475 .1709 .1709 .1710
    Scale: 6187.7380 .1735 .1734 .1735
    Add: 8225.0360 .1958 .1957 .1959
    Triad: 8414.6062 .1915 .1913 .1916
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 1536
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 67075072
     The total memory requirement is 1535 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6283.0469 .1711 .1708 .1712
    Scale: 6183.4285 .1736 .1736 .1736
    Add: 8242.7483 .1955 .1953 .1957
    Triad: 8367.6589 .1927 .1924 .1928
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 2560
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 67075072
     The total memory requirement is 1535 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6277.7105 .1711 .1710 .1712
    Scale: 6183.2501 .1736 .1736 .1737
    Add: 8249.2841 .1953 .1951 .1955
    Triad: 8375.0384 .1925 .1922 .1926
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 512
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 67073024
     The total memory requirement is 1535 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6277.5626 .1710 .1710 .1710
    Scale: 6185.5424 .1735 .1735 .1735
    Add: 8225.8370 .1957 .1957 .1958
    Triad: 8407.9575 .1920 .1915 .1923
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 1536
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 67073024
     The total memory requirement is 1535 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6275.2784 .1711 .1710 .1711
    Scale: 6184.2676 .1736 .1735 .1736
    Add: 8245.6675 .1953 .1952 .1953
    Triad: 8374.3152 .1923 .1922 .1924
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
     Incremental Offset = 2560
    ----------------------------------------------
     Double precision appears to have 16 digits of accuracy
     Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
     Array size = 67073024
     The total memory requirement is 1535 MB
     You are running each test 5 times
     --
     The *best* time for each test is used
     *EXCLUDING* the first and last iterations
     ----------------------------------------------------
     Your clock granularity appears to be less than one microsecond
     Your clock granularity/precision appears to be 1 microseconds
     ----------------------------------------------------
    Function Rate (MB/s) Avg time Min time Max time
    Copy: 6276.7135 .1710 .1710 .1711
    Scale: 6179.9713 .1737 .1737 .1737
    Add: 8246.3926 .1953 .1952 .1954
    Triad: 8367.2271 .1925 .1924 .1926
     ----------------------------------------------------
     Solution Validates!
     ----------------------------------------------------
    GETSHRSEG: requesting large pages
    GETSHRSEG ENTRY: shmgetflag -2147481216
    bindprocessor successful: thread_self() 794813 cpu_id 0
    bindprocessor successful: thread_self() 794813 cpu_id 2
    GETSHRSEG: requesting large pages
    GETSHRSEG ENTRY: shmgetflag -2147481216
    bindprocessor successful: thread_self() 794813 cpu_id 0
    bindprocessor successful: thread_self() 794813 cpu_id 2
    GETSHRSEG: requesting large pages
    GETSHRSEG ENTRY: shmgetflag -2147481216
    bindprocessor successful: thread_self() 794813 cpu_id 0
    bindprocessor successful: thread_self() 794813 cpu_id 2
    bindprocessor successful: thread_self() 786593 cpu_id 2
    bindprocessor successful: thread_self() 802987 cpu_id 3
    bindprocessor successful: thread_self() 794813 cpu_id 0
    bindprocessor successful: thread_self() 798889 cpu_id 1


