SGI Power Challenge

From: Hong Huang (honhuang@csd.uwm.edu)
Date: Thu Oct 26 1995 - 19:37:10 CDT


SGI Power Challenge: 90 MHz, 16 CPUs.
f77 -O3 (1 CPU used)

 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 11 microseconds
 The tests below will each take a time on the order
 of 133154 microseconds
    (= 12105 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Assignment:    142.1136     0.2256     0.2252     0.2277
Scaling   :    141.8861     0.2262     0.2255     0.2284
Summing   :    144.3148     0.3353     0.3326     0.3375
SAXPYing  :    142.2117     0.3383     0.3375     0.3411
 Sum of a is : 2.3066015625600835E+18
 Sum of b is : 4.6132031250731405E+17
 Sum of c is : 6.1509375001225114E+17
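
For reference, the Rate column above can be reproduced from the Min time column. The sketch below assumes the usual STREAM accounting, which the output itself does not state: Assignment and Scaling are counted as touching two arrays per iteration, Summing and SAXPYing three, and 1 MB is taken as 10^6 bytes. Under those assumptions the recomputed rates agree with the printed ones to within the rounding of the four-digit times, and three arrays of 2000000 eight-byte words come to about 45 MB (counted in units of 2^20 bytes), matching the reported memory requirement.

```python
# Recompute the 1-CPU "Rate (MB/s)" column from "Min time".
# ASSUMPTION (not stated in the benchmark output): bytes moved are counted as
# 2 arrays for Assignment/Scaling and 3 arrays for Summing/SAXPYing,
# with 1 MB = 10**6 bytes.
N = 2_000_000          # "Array size" from the output
BYTES_PER_WORD = 8     # "8 bytes per DOUBLE PRECISION word"

ARRAYS_TOUCHED = {"Assignment": 2, "Scaling": 2, "Summing": 3, "SAXPYing": 3}

# Min time and reported rate, copied from the 1-CPU run.
MIN_TIME = {"Assignment": 0.2252, "Scaling": 0.2255,
            "Summing": 0.3326, "SAXPYing": 0.3375}
REPORTED = {"Assignment": 142.1136, "Scaling": 141.8861,
            "Summing": 144.3148, "SAXPYing": 142.2117}


def rate_mb_per_s(name):
    """Bytes moved divided by the best (minimum) time, in 10**6 bytes/s."""
    bytes_moved = ARRAYS_TOUCHED[name] * BYTES_PER_WORD * N
    return bytes_moved / MIN_TIME[name] / 1e6


def total_memory_mib():
    """Three arrays (a, b, c) of N words, in units of 2**20 bytes."""
    return 3 * N * BYTES_PER_WORD / 2**20


if __name__ == "__main__":
    for name in MIN_TIME:
        print(f"{name:<11} recomputed {rate_mb_per_s(name):9.4f}"
              f"   reported {REPORTED[name]:9.4f}")
    print(f"total memory: {total_memory_mib():.1f} MB")
```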

f77 -O3 -mp -pfa (-mp = multiprocessing; -pfa = Power Fortran Accelerator, KAP-based)
2 CPUs used (setenv MP_SET_NUMTHREADS 2)
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of 81864 microseconds
    (= 8186 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Assignment:    254.7126     0.1358     0.1256     0.1656
Scaling   :    257.7796     0.1367     0.1241     0.1498
Summing   :    266.4672     0.1935     0.1801     0.2198
SAXPYing  :    251.5658     0.1971     0.1908     0.2149
 Sum of a is : 2.3066015624961674E+18
 Sum of b is : 4.6132031250262810E+17
 Sum of c is : 6.1509375000850240E+17

f77 -O3 -mp -pfa (-mp = multiprocessing; -pfa = Power Fortran Accelerator)
4 CPUs used (setenv MP_SET_NUMTHREADS 4)
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of 349062 microseconds
    (= 34906 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Assignment:    447.9157     0.0775     0.0714     0.0938
Scaling   :    418.2137     0.0854     0.0765     0.1094
Summing   :    437.3290     0.1155     0.1098     0.1255
SAXPYing  :    497.8021     0.1193     0.0964     0.1413
 Sum of a is : 2.3066015624962504E+18
 Sum of b is : 4.6132031249724960E+17
 Sum of c is : 6.1509375000100480E+17

f77 -O3 -mp -pfa (-mp = multiprocessing; -pfa = Power Fortran Accelerator)
6 CPUs used (setenv MP_SET_NUMTHREADS 6)
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of 36968 microseconds
    (= 3697 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Assignment:    541.3075     0.0674     0.0591     0.0851
Scaling   :    589.1238     0.0715     0.0543     0.0981
Summing   :    569.7460     0.0945     0.0842     0.1094
SAXPYing  :    555.6197     0.0935     0.0864     0.1092
 Sum of a is : 2.3066015624963753E+18
 Sum of b is : 4.6132031249787443E+17
 Sum of c is : 6.1509375000000000E+17

f77 -O3 -mp -pfa (-mp = multiprocessing; -pfa = Power Fortran Accelerator)
8 CPUs used (setenv MP_SET_NUMTHREADS 8)
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of 58374 microseconds
    (= 5837 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Assignment:    604.1149     0.0624     0.0530     0.0773
Scaling   :    685.9603     0.0671     0.0466     0.0934
Summing   :    665.9354     0.0936     0.0721     0.1387
SAXPYing  :    757.9925     0.0999     0.0633     0.1969
 Sum of a is : 2.3066015624965002E+18
 Sum of b is : 4.6132031249849914E+17
 Sum of c is : 6.1509375000000000E+17

f77 -O3 -mp -pfa (-mp = multiprocessing; -pfa = Power Fortran Accelerator)
10 CPUs used (setenv MP_SET_NUMTHREADS 10)
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 11 microseconds
 The tests below will each take a time on the order
 of 16235 microseconds
    (= 1476 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Assignment:    770.2861     0.0581     0.0415     0.0744
Scaling   :    816.1812     0.0731     0.0392     0.1606
Summing   :    777.5164     0.0926     0.0617     0.1482
SAXPYing  :    695.2797     0.0938     0.0690     0.1393
 Sum of a is : 2.3066015624966252E+18
 Sum of b is : 4.6132031249912390E+17
 Sum of c is : 6.1509375000000000E+17

f77 -O3 -mp -pfa (-mp = multiprocessing; -pfa = Power Fortran Accelerator)
12 CPUs used (setenv MP_SET_NUMTHREADS 12)
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of -33469 microseconds
    (= -3347 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Assignment:    981.9564     0.0492     0.0326     0.0845
Scaling   :    819.6928     0.0475     0.0390     0.0634
Summing   :   1031.7932     0.0762     0.0465     0.1387
SAXPYing  :   1154.3195     0.0684     0.0416     0.0811
 Sum of a is : 2.3066015624967501E+18
 Sum of b is : 4.6132031249974874E+17
 Sum of c is : 6.1509375000000013E+17

f77 -O3 -mp -pfa (-mp = multiprocessing; -pfa = Power Fortran Accelerator)
14 CPUs used (setenv MP_SET_NUMTHREADS 14)
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of -325490 microseconds
    (= -32549 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Assignment:    739.0641     0.0974     0.0433     0.1999
Scaling   :   1414.5603     0.0539     0.0226     0.0997
Summing   :   1246.1645     0.1358     0.0385     0.2128
SAXPYing  :   1501.7872     0.1001     0.0320     0.1484
 Sum of a is : 2.3066015624968750E+18
 Sum of b is : 4.6132031250000006E+17
 Sum of c is : 6.1509375000000000E+17

f77 -O3 -mp -pfa (-mp = multiprocessing; -pfa = Power Fortran Accelerator)
16 CPUs used (setenv MP_SET_NUMTHREADS 16)
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 11 microseconds
 The tests below will each take a time on the order
 of 39583 microseconds
    (= 3598 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Assignment:    934.1954     0.0837     0.0343     0.1442
Scaling   :   1616.0005     0.1117     0.0198     0.2154
Summing   :   1179.5075     0.1082     0.0407     0.1997
SAXPYing  :    525.1646     0.1457     0.0914     0.2158
 Sum of a is : 2.3066015624969999E+18
 Sum of b is : 4.6132031249999981E+17
 Sum of c is : 6.1509375000000000E+17
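
To make the scaling easier to read, the reported rates from the runs so far (1 through 16 CPUs, first 16-CPU run) can be turned into speedup and parallel-efficiency figures. This is only a rough sketch: the 1-CPU baseline was built with plain `f77 -O3`, without `-mp -pfa`, so the comparison is not strictly like for like, and the helper names `speedup` and `efficiency` are illustrative, not part of the benchmark.

```python
# Speedup and parallel efficiency relative to the 1-CPU run, using the
# "Rate (MB/s)" values exactly as reported in the runs above.
# CAVEAT: the 1-CPU baseline was compiled without -mp -pfa.
KERNELS = ("Assignment", "Scaling", "Summing", "SAXPYing")

RATES = {  # CPUs -> reported rate per kernel, in KERNELS order
    1:  (142.1136, 141.8861, 144.3148, 142.2117),
    2:  (254.7126, 257.7796, 266.4672, 251.5658),
    4:  (447.9157, 418.2137, 437.3290, 497.8021),
    6:  (541.3075, 589.1238, 569.7460, 555.6197),
    8:  (604.1149, 685.9603, 665.9354, 757.9925),
    10: (770.2861, 816.1812, 777.5164, 695.2797),
    12: (981.9564, 819.6928, 1031.7932, 1154.3195),
    14: (739.0641, 1414.5603, 1246.1645, 1501.7872),
    16: (934.1954, 1616.0005, 1179.5075, 525.1646),  # first 16-CPU run
}


def speedup(cpus, kernel):
    """Reported rate on `cpus` CPUs divided by the 1-CPU rate."""
    i = KERNELS.index(kernel)
    return RATES[cpus][i] / RATES[1][i]


def efficiency(cpus, kernel):
    """Speedup divided by the number of CPUs (1.0 would be ideal)."""
    return speedup(cpus, kernel) / cpus


if __name__ == "__main__":
    print("CPUs  " + "  ".join(f"{k:>10}" for k in KERNELS))
    for cpus in sorted(RATES):
        cells = "  ".join(f"{speedup(cpus, k):10.2f}" for k in KERNELS)
        print(f"{cpus:>4}  {cells}")
```

On these numbers the 16-CPU SAXPYing rate falls well below the 14-CPU one, while Scaling keeps climbing.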

----------------------------------------------
----------------------------------------------
The results look strange for 16 CPUs, so I ran the test twice; they are still strange.
----------------------------------------------
----------------------------------------------

f77 -O3 -mp -pfa (-mp = multiprocessing; -pfa = Power Fortran Accelerator)
16 CPUs used (setenv MP_SET_NUMTHREADS 16)
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10 microseconds
 The tests below will each take a time on the order
 of 410100 microseconds
    (= 41010 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function    Rate (MB/s)   RMS time   Min time   Max time
Assignment:    883.8080     0.1039     0.0362     0.2000
Scaling   :    949.9786     0.0763     0.0337     0.1300
Summing   :   1046.5266     0.0928     0.0459     0.1597
SAXPYing  :   1048.3303     0.1207     0.0458     0.1765
 Sum of a is : 2.3066015624969999E+18
 Sum of b is : 4.6132031249999981E+17
 Sum of c is : 6.1509375000000000E+17
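
One possible way to put a number on "strange" is the spread between the slowest and fastest of the 10 repetitions. Because the rate is derived from the best time only, a large Max/Min ratio means the reported rate rests on one unusually fast repetition. The sketch below uses the Min and Max times from the 1-CPU run and the two 16-CPU runs; the `worst_spread` helper is illustrative, and reading the ratio as a noise indicator is an interpretation, not something the benchmark reports.

```python
# Max/Min time ratio per kernel, as a rough indicator of run-to-run noise.
# (min_time, max_time) pairs are copied from the tables in this post, in the
# order Assignment, Scaling, Summing, SAXPYing.
ONE_CPU = [(0.2252, 0.2277), (0.2255, 0.2284),
           (0.3326, 0.3375), (0.3375, 0.3411)]
SIXTEEN_FIRST = [(0.0343, 0.1442), (0.0198, 0.2154),
                 (0.0407, 0.1997), (0.0914, 0.2158)]
SIXTEEN_SECOND = [(0.0362, 0.2000), (0.0337, 0.1300),
                  (0.0459, 0.1597), (0.0458, 0.1765)]


def spreads(run):
    """Max time divided by Min time for each kernel in a run."""
    return [t_max / t_min for t_min, t_max in run]


def worst_spread(run):
    """Largest Max/Min ratio across the four kernels."""
    return max(spreads(run))


if __name__ == "__main__":
    for label, run in (("1 CPU", ONE_CPU),
                       ("16 CPUs, first run", SIXTEEN_FIRST),
                       ("16 CPUs, second run", SIXTEEN_SECOND)):
        ratios = ", ".join(f"{r:.2f}" for r in spreads(run))
        print(f"{label:<20} Max/Min per kernel: {ratios}")
```

For the 1-CPU run every ratio stays within about 2% of 1, whereas in both 16-CPU runs the slowest repetition takes several times longer than the fastest.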


