Fw: Tuned STREAM on IBM eServer p5 550 (1500 MHz, 4cpu)

From: John D Mccalpin (mccalpin@us.ibm.com)
Date: Mon Mar 21 2005 - 07:45:46 CST


    ---
    John D. McCalpin, Ph.D.
    IBM - 11400 Burnet Road, MS 045-3N098
    Austin, TX  78758
    (512) 838-6167 or IBM tie line 678/6167
    

    ----- Forwarded by John D Mccalpin/Austin/IBM on 03/21/2005 07:45 AM -----

    Ly Vu/Austin/IBM
    03/11/2005 01:44 PM

    To:      mccalpin@us.ibm.com
    cc:      jacobt@us.ibm.com, robichau@us.ibm.com
    Subject: Tuned STREAM on IBM eServer p5 550 (1500 MHz, 4cpu)

    These are tuned STREAM results on an IBM eServer p5 550 Express with four 1500 MHz CPUs. This is a POWER5 SMP machine. Large pages were used in all cases.

    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          6264.39        .17        .17        .19
    Scale:         6137.41        .17        .17        .17
    Add:           8775.14        .18        .18        .18
    Triad:         8997.68        .18        .18        .18

    Here is the full output file:
    -----------------------------
    Requesting Large Pages
    Setting up for 2 CPUs per module
    Number of segments per array = 2
    CPU binding list : 0 2
    Shared Segment Pointer = 504403158265495552
    Shared Segment Pointer = 504403158802366464
    Shared Segment Pointer = 504403159339237376
    Segment Size (B) = 268435456 (MB = 256 )
    Array Size (B) = 536870912 (MB = 512 )
    Array Size (DW) = 67108864
    Num_threads = 4
    Num_threads = 4
    Num_threads = 4
    Num_threads = 4
    rebind: num_parthds is 4
    Starting Initialization
    Done With Initialization
    a(1) 1.00000000000000000
    b(M) 1.00000000000000000
    c(M) 1.00000000000000000
    Incremental Offset = 2560
    Number of Threads = 4
    ----------------------------------------------
    Double precision appears to have 16 digits of accuracy
    Assuming 8 bytes per DOUBLE PRECISION word
    ----------------------------------------------
    Array size = 67077120
    Offset = 0
    The total memory requirement is 1535 MB
    You are running each test 5 times
    The *best* time for each test is used
    ----------------------------------------------------
    Your clock granularity appears to be less than one microsecond
    Your clock granularity/precision appears to be 1 microseconds
    The tests below will each take a time on the order of 174749 microseconds
    (= 174749 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    ----------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    ----------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          6264.39        .17        .17        .19
    Scale:         6137.41        .17        .17        .17
    Add:           8775.14        .18        .18        .18
    Triad:         8997.68        .18        .18        .18
    Sum of a is = 101873152743750.000
    Sum of b is = 20374630548750.0000
    Sum of c is = 27166174065000.0000
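As a rough cross-check of the numbers above, the reported rates can be turned back into implied best times and compared with the "Min time" column. This is only a sketch: it assumes the usual STREAM counting convention (Copy and Scale touch two arrays per element, Add and Triad touch three, with MB meaning 10^6 bytes), which the message itself does not spell out. The names `implied_min_time` and `total_memory_mb` are made up for this illustration and are not part of STREAM.

```python
# Values copied from the output file above.
N = 67077120   # "Array size" (elements per array)
WORD = 8       # bytes per DOUBLE PRECISION word

# Assumption: standard STREAM byte counting (arrays read + written per element).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Reported best rates in MB/s (assumed to mean 10**6 bytes per second).
REPORTED_MBS = {"Copy": 6264.39, "Scale": 6137.41, "Add": 8775.14, "Triad": 8997.68}


def implied_min_time(kernel):
    """Seconds needed to move the kernel's bytes at the reported rate."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * WORD * N
    return bytes_moved / (REPORTED_MBS[kernel] * 1e6)


def total_memory_mb():
    """Footprint of the three arrays in units of 2**20 bytes, truncated."""
    return 3 * WORD * N // 2**20


if __name__ == "__main__":
    for kernel in REPORTED_MBS:
        print(f"{kernel:6s} implied min time = {implied_min_time(kernel):.4f} s")
    print("total memory =", total_memory_mb(), "MB")
```

Under these assumptions, the implied times round to the two-decimal "Min time" values in the table, and the three-array footprint agrees with the "total memory requirement" line.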

    ______________________________________________
    Ly Vu
    IBM Corp. - Austin, Texas.
    RS/6000 Performance Analysis.
    Phone : (512) 838-8228
    Email : lyvu@us.ibm.com



    This archive was generated by hypermail 2.1.4 : Mon Jun 13 2005 - 08:57:56 CDT