SGI Power Challenge, additional results

From: Hong Huang (honhuang@csd.uwm.edu)
Date: Thu Oct 26 1995 - 19:37:51 CDT


These are additional timings, posted because the deviation is so large in the
multi-CPU case. I want to show just how inconsistent these performance numbers are!

SGI Power Challenge, 90 MHz, 16 CPUs
f77 -O3 -mp -pfa
cpu = 8 ( setenv MP_SET_NUMTHREADS 8 )
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of 290634 microseconds
    (= 29063 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      519.4386     0.0641     0.0616     0.0673
Scaling   :      593.5371     0.0610     0.0539     0.0671
Summing   :      618.9075     0.0896     0.0776     0.0955
SAXPYing  :      562.2910     0.0886     0.0854     0.0921
 Sum of a is : 2.3066015624965002E+18
 Sum of b is : 4.6132031249849914E+17
 Sum of c is : 6.1509375000000000E+17
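
For anyone checking the table by hand: the rates above appear to follow directly from the minimum times. The sketch below assumes the usual STREAM accounting (two arrays touched by Assignment and Scaling, three by Summing and SAXPYing), 8-byte words as the log states, and 1 MB = 10^6 bytes. It is not the benchmark's own code, and the function name is made up for illustration. Since the printed minimum times are rounded to four decimals, the recomputed rates agree with the reported ones only to within roughly 0.1%.

```python
# Recompute STREAM-style rates from the printed minimum times (8-CPU run).
# Assumptions (not taken from the benchmark source): 1 MB = 1e6 bytes and
# the usual STREAM array counts per kernel (2 for copy/scale, 3 for add/triad).

N = 2_000_000        # "Array size = 2000000"
BYTES_PER_WORD = 8   # "Assuming 8 bytes per DOUBLE PRECISION word"

# kernel -> (arrays touched, reported rate in MB/s, reported min time in s)
RUN_8CPU = {
    "Assignment": (2, 519.4386, 0.0616),
    "Scaling":    (2, 593.5371, 0.0539),
    "Summing":    (3, 618.9075, 0.0776),
    "SAXPYing":   (3, 562.2910, 0.0854),
}

def rate_mb_per_s(arrays, min_time):
    """Bytes moved by one pass over the arrays, divided by the best time."""
    return arrays * BYTES_PER_WORD * N / min_time / 1e6

for name, (arrays, reported, t_min) in RUN_8CPU.items():
    recomputed = rate_mb_per_s(arrays, t_min)
    rel_diff = abs(recomputed - reported) / reported
    print(f"{name:<10}  reported {reported:9.4f}  "
          f"recomputed {recomputed:9.4f}  diff {rel_diff:.3%}")

# Three arrays of N words; truncating the MiB figure gives the "45 MB" in the log.
total_mib = 3 * BYTES_PER_WORD * N / 2**20
print(f"total memory: {total_mib:.2f} MiB")
```

The point of the check is that the Rate column is driven entirely by the single best of the 10 repetitions, so one unusually fast pass is enough to move the reported number.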

f77 -O3 -mp -pfa
cpu = 10
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 11 microseconds
 The tests below will each take a time on the order
 of 208931 microseconds
    (= 18994 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      711.8076     0.0505     0.0450     0.0567
Scaling   :      779.8958     0.0509     0.0410     0.0595
Summing   :      757.8027     0.0752     0.0633     0.0844
SAXPYing  :      721.8051     0.0732     0.0665     0.0804
 Sum of a is : 2.3066015624966252E+18
 Sum of b is : 4.6132031249912390E+17
 Sum of c is : 6.1509375000000000E+17

f77 -O3 -mp -pfa
cpu = 10
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of 182706 microseconds
    (= 18271 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      707.1348     0.0528     0.0453     0.0575
Scaling   :      716.2041     0.0538     0.0447     0.0595
Summing   :      736.2411     0.0738     0.0652     0.0880
SAXPYing  :      760.3973     0.0723     0.0631     0.0818
 Sum of a is : 2.3066015624966252E+18
 Sum of b is : 4.6132031249912390E+17
 Sum of c is : 6.1509375000000000E+17

f77 -O3 -mp -pfa
cpu = 10
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 11 microseconds
 The tests below will each take a time on the order
 of 41486 microseconds
    (= 3771 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      798.3828     0.0477     0.0401     0.0541
Scaling   :      720.1546     0.0537     0.0444     0.0617
Summing   :      784.6084     0.0798     0.0612     0.0868
SAXPYing  :      739.1794     0.0743     0.0649     0.0831
 Sum of a is : 2.3066015624966252E+18
 Sum of b is : 4.6132031249912390E+17
 Sum of c is : 6.1509375000000000E+17

f77 -O3 -mp -pfa
cpu = 12
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of -328379 microseconds
    (= -32838 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      781.0213     0.0541     0.0410     0.0769
Scaling   :      830.9512     0.0505     0.0385     0.0588
Summing   :      767.2858     0.0745     0.0626     0.0899
SAXPYing  :      681.4789     0.0777     0.0704     0.0907
 Sum of a is : 2.3066015624967496E+18
 Sum of b is : 4.6132031249974874E+17
 Sum of c is : 6.1509375000000013E+17

f77 -O3 -mp -pfa
cpu = 12
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 11 microseconds
 The tests below will each take a time on the order
 of 312666 microseconds
    (= 28424 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      880.4973     0.0465     0.0363     0.0610
Scaling   :      898.4746     0.0566     0.0356     0.0705
Summing   :      750.6332     0.0796     0.0639     0.1010
SAXPYing  :      705.9855     0.0765     0.0680     0.0848
 Sum of a is : 2.3066015624967496E+18
 Sum of b is : 4.6132031249974874E+17
 Sum of c is : 6.1509375000000013E+17

f77 -O3 -mp -pfa
cpu = 12
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 11 microseconds
 The tests below will each take a time on the order
 of 149912 microseconds
    (= 13628 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      721.6162     0.0552     0.0443     0.0755
Scaling   :      757.1458     0.0559     0.0423     0.0657
Summing   :      712.1009     0.0775     0.0674     0.0987
SAXPYing  :      700.4223     0.0767     0.0685     0.0860
 Sum of a is : 2.3066015624967496E+18
 Sum of b is : 4.6132031249974880E+17
 Sum of c is : 6.1509375000000013E+17

f77 -O3 -mp -pfa
cpu = 14
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of 55775 microseconds
    (= 5578 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      734.6506     0.0517     0.0436     0.0637
Scaling   :      805.3966     0.0530     0.0397     0.0751
Summing   :     1410.8924     0.0743     0.0340     0.1033
SAXPYing  :      800.4413     0.0689     0.0600     0.0785
 Sum of a is : 2.3066015624968750E+18
 Sum of b is : 4.6132031250000006E+17
 Sum of c is : 6.1509375000000000E+17

f77 -O3 -mp -pfa
cpu = 14
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of 99496 microseconds
    (= 9950 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      908.8108     0.0575     0.0352     0.0864
Scaling   :     1040.0404     0.0512     0.0308     0.0586
Summing   :     1124.8600     0.0669     0.0427     0.0986
SAXPYing  :     1294.0099     0.0662     0.0371     0.0810
 Sum of a is : 2.3066015624968750E+18
 Sum of b is : 4.6132031250000006E+17
 Sum of c is : 6.1509375000000000E+17

f77 -O3 -mp -pfa
cpu = 14
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of 33234 microseconds
    (= 3323 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      971.8141     0.0492     0.0329     0.0597
Scaling   :      880.5753     0.0486     0.0363     0.0599
Summing   :      928.8767     0.0684     0.0517     0.0792
SAXPYing  :      891.1332     0.0679     0.0539     0.0953
 Sum of a is : 2.3066015624968750E+18
 Sum of b is : 4.6132031250000006E+17
 Sum of c is : 6.1509375000000000E+17

f77 -O3 -mp -pfa
cpu = 16
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of 68401 microseconds
    (= 6840 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:     1211.4771     0.0670     0.0264     0.0997
Scaling   :      867.3702     0.0619     0.0369     0.1115
Summing   :      768.7126     0.0931     0.0624     0.1500
SAXPYing  :     1012.3808     0.0769     0.0474     0.1227
 Sum of a is : 2.3066015624969999E+18
 Sum of b is : 4.6132031249999981E+17
 Sum of c is : 6.1509375000000000E+17

f77 -O3 -mp -pfa
cpu = 16
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of 190965 microseconds
    (= 19096 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      651.3479     0.0670     0.0491     0.0844
Scaling   :     1211.7560     0.0595     0.0264     0.0977
Summing   :      763.9550     0.0870     0.0628     0.1083
SAXPYing  :      729.3068     0.0828     0.0658     0.1031
 Sum of a is : 2.3066015624969999E+18
 Sum of b is : 4.6132031249999981E+17
 Sum of c is : 6.1509375000000000E+17

f77 -O3 -mp -pfa
cpu = 16
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of -106931 microseconds
    (= -10693 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      789.5787     0.0618     0.0405     0.0830
Scaling   :     1012.7882     0.0624     0.0316     0.0858
Summing   :     1062.5104     0.0984     0.0452     0.1249
SAXPYing  :      902.9880     0.0961     0.0532     0.1373
 Sum of a is : 2.3066015624969999E+18
 Sum of b is : 4.6132031249999981E+17
 Sum of c is : 6.1509375000000000E+17
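
To put a number on the inconsistency, the reported rates can be grouped by CPU count and compared across the repeated runs. The sketch below only rearranges the Rate (MB/s) figures printed above; the names `RATES`, `KERNELS` and `spread` are invented for illustration, and the 8-CPU case is left out because it was run only once. As one example taken straight from the logs, the 16-CPU Assignment rate ranges from 651.3479 to 1211.4771 MB/s between runs.

```python
# Spread of the reported STREAM rates (MB/s) across repeated runs,
# grouped by CPU count. All values are copied from the logs above.

KERNELS = ("Assignment", "Scaling", "Summing", "SAXPYing")

# cpu count -> one tuple of rates per run, in KERNELS order
RATES = {
    10: [(711.8076, 779.8958, 757.8027, 721.8051),
         (707.1348, 716.2041, 736.2411, 760.3973),
         (798.3828, 720.1546, 784.6084, 739.1794)],
    12: [(781.0213, 830.9512, 767.2858, 681.4789),
         (880.4973, 898.4746, 750.6332, 705.9855),
         (721.6162, 757.1458, 712.1009, 700.4223)],
    14: [(734.6506, 805.3966, 1410.8924, 800.4413),
         (908.8108, 1040.0404, 1124.8600, 1294.0099),
         (971.8141, 880.5753, 928.8767, 891.1332)],
    16: [(1211.4771, 867.3702, 768.7126, 1012.3808),
         (651.3479, 1211.7560, 763.9550, 729.3068),
         (789.5787, 1012.7882, 1062.5104, 902.9880)],
}

def spread(runs):
    """Return {kernel: (lowest, highest, highest / lowest)} over the runs."""
    result = {}
    for i, kernel in enumerate(KERNELS):
        values = [run[i] for run in runs]
        lowest, highest = min(values), max(values)
        result[kernel] = (lowest, highest, highest / lowest)
    return result

for cpus, runs in sorted(RATES.items()):
    for kernel, (lowest, highest, ratio) in spread(runs).items():
        print(f"{cpus:2d} CPUs  {kernel:<10}  "
              f"{lowest:9.4f} .. {highest:9.4f}  x{ratio:.2f}")
```

A highest/lowest ratio near 1 would mean the runs agree; with only three runs per CPU count this is a rough indicator rather than a proper statistic.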


