STREAM Results for IBM S85

From: William R. Sullivan (wrs@wham.com)
Date: Tue Aug 07 2001 - 12:03:57 CDT

    John,
    I modified the C version of the STREAM benchmark to use pthreads so that
    I could run the same code on Solaris and AIX, and I would like to
    contribute the code back. I didn't have a Fortran compiler and wasn't
    aware that the -qsmp option would let me thread the kernels with
    compiler directives. However, my threading of the kernels follows the
    behavior of the parallel directives exactly, and the results I got from
    this version on systems already reported on the site were similar to
    the published ones.

    I am attaching the run log for an IBM p680 with 96 GB of memory and 24
    CPUs. I ran it as a stepped sequence: one CPU, then two, then up to 24
    in steps of two. I am also attaching the code and Makefile for your
    consideration; if you would like to update the code base with this
    version, I have no objections. It would probably have to be named
    differently, since I didn't bother to keep the non-threaded
    functionality in this code. My compiler, IBM C 3.6.4, didn't support
    the pragmas, or at least I couldn't find any mention of them, so I
    resorted to pthreads.

    There is one small bug in the code that I have since fixed: the
    total-memory report should use 3.0 as the multiplier instead of 3, so
    that the calculation is done in floating point rather than integer
    arithmetic. I forgot to make that change before I ran this set of data,
    so the log stops reporting the memory size correctly above 2.1 GB. I
    used 8000000 elements per CPU for this test because the S85 has a 16 MB
    L2 cache.

    Regards,
    William R. Sullivan
    CTO WHAM Engineering & Software

    512-345-9925 xt 110


    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 8000000, Offset = 0
    Total memory required = 183.1 MB.
    Each test is run 20 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 3 microseconds.
    Each test below will take on the order of 262854 microseconds.
       (= 87618 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:           312.9753     0.4211     0.4090     0.4222
    Scale:          304.6545     0.4218     0.4201     0.4228
    Add:            315.4009     0.6088     0.6087     0.6089
    Triad:          313.6486     0.6122     0.6121     0.6125
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 16000000, Offset = 0
    Total memory required = 366.2 MB.
    Each test is run 20 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 3 microseconds.
    Each test below will take on the order of 284336 microseconds.
       (= 94778 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:           603.7067     0.4289     0.4240     0.4380
    Scale:          603.2884     0.4293     0.4243     0.4394
    Add:            625.0509     0.6177     0.6143     0.6252
    Triad:          625.2045     0.6172     0.6142     0.6242
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 32000000, Offset = 0
    Total memory required = 732.4 MB.
    Each test is run 20 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 3 microseconds.
    Each test below will take on the order of 278615 microseconds.
       (= 92871 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          1096.5431     0.4704     0.4669     0.4755
    Scale:         1162.0388     0.4442     0.4406     0.4491
    Add:           1163.5641     0.6656     0.6600     0.6725
    Triad:         1165.4427     0.6640     0.6590     0.6702
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 48000000, Offset = 0
    Total memory required = 1098.6 MB.
    Each test is run 20 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 3 microseconds.
    Each test below will take on the order of 295103 microseconds.
       (= 98367 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          1649.1800     0.4704     0.4657     0.4939
    Scale:         1633.8547     0.4741     0.4701     0.4890
    Add:           1722.6529     0.6747     0.6687     0.6973
    Triad:         1733.4543     0.6707     0.6646     0.6937
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 64000000, Offset = 0
    Total memory required = 1464.8 MB.
    Each test is run 20 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 3 microseconds.
    Each test below will take on the order of 296449 microseconds.
       (= 98816 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          2205.9456     0.4717     0.4642     0.4814
    Scale:         2037.3124     0.5108     0.5026     0.5147
    Add:           2101.2111     0.7397     0.7310     0.7436
    Triad:         2112.6295     0.7356     0.7271     0.7401
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 80000000, Offset = 0
    Total memory required = 1831.1 MB.
    Each test is run 20 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 3 microseconds.
    Each test below will take on the order of 296009 microseconds.
       (= 98669 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          2675.5400     0.4791     0.4784     0.4865
    Scale:         2645.1091     0.4842     0.4839     0.4845
    Add:           2776.5365     0.6917     0.6915     0.6920
    Triad:         2794.9188     0.6873     0.6870     0.6875
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 96000000, Offset = 0
    Total memory required = -1898.7 MB.
    Each test is run 20 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 3 microseconds.
    Each test below will take on the order of 315271 microseconds.
       (= 105090 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          2815.2951     0.5467     0.5456     0.5588
    Scale:         2910.5629     0.5282     0.5277     0.5286
    Add:           2988.3461     0.7717     0.7710     0.7723
    Triad:         3004.2482     0.7677     0.7669     0.7684
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 112000000, Offset = 0
    Total memory required = -1532.5 MB.
    Each test is run 20 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 3 microseconds.
    Each test below will take on the order of 323757 microseconds.
       (= 107919 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          3484.6595     0.5256     0.5143     0.5481
    Scale:         3439.0371     0.5313     0.5211     0.5419
    Add:           3648.4460     0.7493     0.7368     0.7615
    Triad:         3669.3740     0.7451     0.7326     0.7574
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 128000000, Offset = 0
    Total memory required = -1166.3 MB.
    Each test is run 20 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 3 microseconds.
    Each test below will take on the order of 327443 microseconds.
       (= 109147 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          3688.6017     0.5700     0.5552     0.5878
    Scale:         3773.1053     0.5574     0.5428     0.5609
    Add:           3933.9626     0.7982     0.7809     0.8017
    Triad:         3954.1159     0.7942     0.7769     0.7978
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 144000000, Offset = 0
    Total memory required = -800.1 MB.
    Each test is run 20 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 3 microseconds.
    Each test below will take on the order of 342617 microseconds.
       (= 114205 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          4185.0502     0.5583     0.5505     0.5882
    Scale:         4121.6975     0.5655     0.5590     0.5743
    Add:           4448.7128     0.7845     0.7769     0.7974
    Triad:         4482.8741     0.7785     0.7709     0.7918
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 160000000, Offset = 0
    Total memory required = -433.9 MB.
    Each test is run 20 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 3 microseconds.
    Each test below will take on the order of 366235 microseconds.
       (= 122078 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          4229.1607     0.6070     0.6053     0.6298
    Scale:         4297.3644     0.5968     0.5957     0.5975
    Add:           4581.6340     0.8399     0.8381     0.8409
    Triad:         4613.3493     0.8328     0.8324     0.8332
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 176000000, Offset = 0
    Total memory required = -67.7 MB.
    Each test is run 20 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 3 microseconds.
    Each test below will take on the order of 383044 microseconds.
       (= 127681 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          4596.9430     0.6150     0.6126     0.6453
    Scale:         4515.8818     0.6243     0.6236     0.6252
    Add:           4951.7604     0.8544     0.8530     0.8562
    Triad:         5004.1825     0.8456     0.8441     0.8467
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 192000000, Offset = 0
    Total memory required = 298.5 MB.
    Each test is run 20 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 3 microseconds.
    Each test below will take on the order of 799821 microseconds.
       (= 266607 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function     Rate (MB/s)   RMS time   Min time   Max time
    Copy:          4726.2996     0.6584     0.6500     0.6878
    Scale:         4556.9921     0.6797     0.6741     0.6800
    Add:           5008.7225     0.9277     0.9200     0.9300
    Triad:         5036.2141     0.9194     0.9150     0.9200




