After reading Paul Hsieh's note (Dec 17, 1999), I decided to compile
stream_d.c with Intel's C/C++ 4.5. I use Intel's compiler for my codes, and
find v4.5 offers real improvements for memory-intensive apps, with
consistent performance for runs that last days. I used Paul Hsieh's
??-p55clock.c, since the clock() function in the win32*.c timer has very
In short, there was an overall, significant improvement for everything but
triad, relative to the Lahey F90.
In case it wasn't obvious: I didn't tweak the source code in any way; it
was the default stream_d.c.
             copy     scale    add      triad
Avg_1st_7:   533.08   502.9    641.2    582.0
Avg_all10:   531.2    501.4    636.5    577.9
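
For anyone wanting to reproduce the two summary rows: they are plain
arithmetic means of the per-run rates, one over the first seven runs and
one over all ten. A minimal sketch of that bookkeeping in C follows; the
function name mean_first_n is made up for illustration and is not part of
stream_d.c.

```c
#include <stddef.h>

/* Arithmetic mean of the first n entries of rates[].
   Hypothetical helper for illustration, not part of stream_d.c.
   Returns 0.0 when n is 0 so the caller never divides by zero. */
double mean_first_n(const double *rates, size_t n)
{
    double sum = 0.0;
    size_t i;

    if (n == 0)
        return 0.0;
    for (i = 0; i < n; i++)
        sum += rates[i];
    return sum / (double)n;
}
```

With the ten per-run rates for one kernel in an array r, mean_first_n(r, 7)
would give that kernel's Avg_1st_7 entry and mean_first_n(r, 10) its
Avg_all10 entry.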
System: Athlon 800 in Asus K7V motherboard, 768 MB PC133 SDRAM with 4:3
RAM:FSB and CAS..RAS 3-2-2.
WDC 7200 RPM 20 GB EIDE drive.
OS: win98; I did 10 runs, since I've noticed significant performance
variations. I killed all but the basic background processes; however, I
think the very act of redirecting results to a file, and concatenating
the files, may be responsible for the little "dip" shown in the attached
pdf.
Attached files:
(1) stream.exe, win32 executable
(2) Athlon800_stream_c.txt, raw batch file output of ten runs
(3) Athlon800_stream_c.xls, Excel binary with results summary and plot
(4) plot_athlon800_stream_c.pdf, plot extracted from Excel file
Compilation switches were:
/G6 /ML /W2 /GX /O2 /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /FA /Fa"Release/"
/Fp".\Release\stream.pch" /YX /Fo".\Release/" /Fd".\Release/" /FD -Qrestrict
...most of those are just directory junk; NOTE there were no restrict
keywords in the source, so that switch should have no effect.
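
To make that last point concrete: the switch only matters when pointers
are actually declared with the keyword. Below is a hypothetical sketch of
what a restrict-qualified triad could look like. It is not taken from
stream_d.c; the function name and signature are my own, and the spelling
is the C99 one, which I am assuming the compiler accepts with the switch
on.

```c
#include <stddef.h>

/* Triad kernel: a[j] = b[j] + scalar * c[j].
   Hypothetical restrict-qualified variant, not from stream_d.c.
   restrict is the programmer's promise that a, b and c do not overlap,
   so the compiler need not assume a store to a[j] can change b[] or c[]. */
void triad_restrict(double *restrict a,
                    const double *restrict b,
                    const double *restrict c,
                    double scalar, size_t n)
{
    size_t j;

    for (j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}
```

Without such declarations in the source, the compiler has nothing to apply
the switch to, which is why I would expect no difference in these numbers.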