(no subject)

From: honhuang@einstein.phys.uwm.edu
Date: Sun Apr 21 1996 - 13:27:48 CDT

   This is a data point for an IBM RS/6000 model 591 running AIX,
compiled with xlf 3 using: xlf -O3 -qarch=pwr2 -qtune=pwr2.
   I only got 16 clock ticks with a memory usage of 228 MB. Although
the machine has 512 MB of memory, whenever the memory usage exceeds
256 MB the program will not run, and I get a message saying there is
not enough memory. I checked the limit, and it was set properly.
Until I figure out how to use the full 512 MB, 16 clock ticks (or
maybe 17) is the best I can get.
   Why do you use the best (shortest) time? Why not use the average
time over more iterations? Using a larger array means averaging over
space, while using more iterations means averaging over time; are
they equivalent? I think the average time is closer to the way the
machine is actually used than the best time.
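   To make the two statistics in question concrete, here is a minimal
sketch in Python. It is not the STREAM source; the names time_kernel,
summarize and copy_kernel are made up for illustration, and the copy
kernel is only a stand-in for the real benchmark loops.

```python
import time


def time_kernel(kernel, repeats=10):
    """Run kernel() `repeats` times; return the wall-clock times in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        kernel()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (best, average) for a list of timing samples."""
    best = min(samples)                      # the shortest run
    average = sum(samples) / len(samples)    # mean over all runs
    return best, average


def copy_kernel(n=1_000_000):
    """Hypothetical stand-in for a copy loop c(i) = a(i)."""
    a = [1.0] * n
    return a[:]


if __name__ == "__main__":
    best, average = summarize(time_kernel(copy_kernel))
    print(f"best {best:.4f} s, average {average:.4f} s")
```

   The best time can never exceed the average, so a rate computed
from the best time is always at least as high as one computed from
the average; how far apart the two are depends on how much the
individual runs vary.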
   The system load is nearly 0, excluding STREAM itself.
   Best regards.

 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
 Array size = 10000000
 Offset = 0
 The total memory requirement is 228 MB
 You are running each test 10 times
 The *best* time for each test is used
 Your clock granularity/precision appears to be 10000 microseconds
 The tests below will each take a time on the order
 of 160000 microseconds
    (= 16 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 WARNING: The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      695.6522      .2434      .2300      .2500
Scaling   :      695.6522      .2434      .2300      .2500
Summing   :      750.0000      .3345      .3200      .3400
SAXPYing  :      750.0000      .3334      .3200      .3400
 Sum of a is : 0.115330078110398751E+20
 Sum of b is : 0.230660156211832781E+19
 Sum of c is : 0.307546874927796787E+19
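
The figures in the output above can be reproduced with a little
arithmetic. The sketch below assumes that Assignment and Scaling each
touch two arrays per element, that Summing and SAXPYing touch three,
that the rate uses 1 MB = 10^6 bytes, and that the memory figure uses
1 MB = 2^20 bytes. These assumptions are not stated in the output
itself, but they match the printed numbers. The function names are
made up for illustration.

```python
BYTES_PER_WORD = 8          # "Assuming 8 bytes per DOUBLE PRECISION word"
ARRAY_SIZE = 10_000_000     # "Array size = 10000000"
TICK_SECONDS = 0.01         # clock granularity of 10000 microseconds


def rate_mb_s(arrays_touched, n, min_time):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) from the best time."""
    return arrays_touched * BYTES_PER_WORD * n / min_time / 1e6


def memory_mib(num_arrays, n):
    """Total array storage in units of 2**20 bytes."""
    return num_arrays * BYTES_PER_WORD * n / 2**20


if __name__ == "__main__":
    print(f"memory     : {memory_mib(3, ARRAY_SIZE):.2f}")
    print(f"Assignment : {rate_mb_s(2, ARRAY_SIZE, 0.23):.4f}")
    print(f"Summing    : {rate_mb_s(3, ARRAY_SIZE, 0.32):.4f}")
    print(f"ticks      : {round(0.23 / TICK_SECONDS)} and {round(0.32 / TICK_SECONDS)}")
```

Under these assumptions the measured best times of .23 s and .32 s
correspond to 23 and 32 clock ticks, somewhat more than the 16 ticks
the program estimated before running the tests.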
