Some stream numbers

From: Ralf Loesche (loesche@math.tu-dresden.de)
Date: Fri Apr 18 1997 - 05:34:26 CDT


Hi,

I took my chances and tried your stream code on my PC: Intel P133, 512 KB
PB cache, 64 MB EDO RAM, running Linux RedHat 4.0, kernel 2.0.18.
Compilers used: GCC 2.7.2.f.1, G77 0.5.18

Here are the compiler command lines and the results for both the C and the
F77 versions of stream (best results out of 10 runs):

C:
gcc -O2 -fomit-frame-pointer -finline-functions -funroll-all-loops -fstrength-reduce -fexpensive-optimizations -malign-jumps=2 -malign-loops=2 -malign-functions=2 -o stream.LINUX.GCC.full_opt second_cpu.c stream_d.c -lm

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 220000 microseconds.
   (= 22 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:            88.8889     0.3610     0.3600     0.3700
Scale:          118.5185     0.2770     0.2700     0.2800
Add:            126.3158     0.3810     0.3800     0.3900
Triad:          117.0732     0.4170     0.4100     0.4200
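
For readers who want to check the table, the rates can be recomputed from the minimum times. This is only a sketch: it assumes the usual STREAM byte-counting convention (Copy and Scale move two arrays, Add and Triad move three) and 1 MB = 10^6 bytes. The names ARRAYS_MOVED and stream_rate_mb_s are made up for this illustration and are not taken from stream_d.c.

```python
# Sketch: recompute the STREAM rates from the reported minimum times.
N = 2_000_000        # array size, from the output above
BYTES_PER_WORD = 8   # bytes per DOUBLE PRECISION word, from the output above

# Assumption: arrays read plus arrays written per kernel (usual STREAM counting).
ARRAYS_MOVED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Minimum times in seconds, copied from the C results.
MIN_TIME = {"Copy": 0.36, "Scale": 0.27, "Add": 0.38, "Triad": 0.41}


def stream_rate_mb_s(kernel, min_time):
    """Return the rate in MB/s (1 MB = 10**6 bytes) for one kernel."""
    bytes_moved = ARRAYS_MOVED[kernel] * BYTES_PER_WORD * N
    return 1.0e-6 * bytes_moved / min_time


rates = {k: stream_rate_mb_s(k, t) for k, t in MIN_TIME.items()}

# Three arrays of N doubles, expressed in units of 2**20 bytes.
total_mib = 3 * N * BYTES_PER_WORD / 2**20

for kernel, rate in rates.items():
    print(f"{kernel + ':':<8}{rate:10.4f}")
print(f"Total memory: {total_mib:.1f} MB")
```

Under these assumptions the four rates and the 45.8 MB total match the output above, so the reported rates follow directly from the Min time column.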

F77:
g77 -O1 -malign-loops=2 -malign-jumps=2 -malign-functions=2 -finline-functions -fstrength-reduce -funroll-all-loops -o stream.LINUX.g77.opt etime.o second_cpu.f stream_d.f

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 10000 microseconds
 The tests below will each take a time on the order
 of 230000 microseconds
    (= 23 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:            86.4865     0.3606     0.3700     0.4000
Scale:          118.5187     0.2625     0.2700     0.2900
Add:            123.0771     0.3732     0.3900     0.4000
Triad:          117.0732     0.3921     0.4100     0.4200
 Sum of a is = 2.30660156E+18
 Sum of b is = 4.61320312E+17
 Sum of c is = 6.1509375E+17
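
A note on reading the two tables side by side: with a timer granularity of about 10 ms, every minimum time is a multiple of one clock tick, so small rate differences may be timer quantization rather than compiler effects. The sketch below assumes the true time lies within one tick of the reported minimum and uses the same byte counting as STREAM (two arrays of 8-byte words for Copy). The names TICK and rate_bounds are hypothetical.

```python
# Sketch: how much one clock tick changes a reported STREAM rate.
TICK = 0.01  # seconds; approximate timer granularity reported above


def rate_bounds(bytes_moved, min_time, tick=TICK):
    """Return (low, high) in MB/s if the true time is within one tick of min_time."""
    low = 1.0e-6 * bytes_moved / (min_time + tick)
    high = 1.0e-6 * bytes_moved / (min_time - tick)
    return low, high


copy_bytes = 2 * 8 * 2_000_000           # Copy reads one array and writes one
c_low, c_high = rate_bounds(copy_bytes, 0.36)   # C run: Min time 0.36 s
f_rate = 1.0e-6 * copy_bytes / 0.37             # F77 run: Min time 0.37 s

print(f"C Copy range: {c_low:.4f} to {c_high:.4f} MB/s")
print(f"F77 Copy:     {f_rate:.4f} MB/s")
```

In this sketch the F77 Copy rate sits at the lower edge of the one-tick range around the C result, so the Copy difference between the two runs corresponds to a single clock tick.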

Apart from Copy, these numbers are a bit better than what's in
your results table.

Ciao, Ralf Loesche


