a couple orders of magnitude apart from the p690
results you recently got but here's my set of results
from a Rev. B Powerbook G4 with the 7440 running at
667 MHz and the 133 MHz system bus. The system has
512 MB of RAM and is running OS X 10.1.1.

I used the precompiled binaries from the contrib
directory and see the same results as other PPC
reporters: the 601-optimized code is faster by a factor
of 2 on the G4 than all the other binaries in the
package (604, base, unrolled).

The precompiled binary will launch in the Classic (OS 9)
environment. I'm not sure about the impact of this on
the runtime and would expect little since most of what is
going on inside STREAM is independent of OS. I'll try to
compile my own one of these days - more as a test of
current Apple compiler technology - and see if there is
a difference.

This system uses 8 bytes per DOUBLE PRECISION word.
Array size = 400000, Offset = 0
Total memory required = 9.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
Your clock granularity/precision appears to be 24 microseconds.
Each test below will take on the order of 24513 microseconds.
   (= 1021 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           515.4224     0.0172     0.0124     0.0358
Scale:          456.6210     0.0162     0.0140     0.0195
Add:            447.5524     0.0264     0.0214     0.0405
Triad:          425.0985     0.0241     0.0226     0.0267
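
For anyone who wants to sanity-check the figures above, here is a small Python sketch that recomputes them from the array size and the printed minimum times. It is not part of STREAM. It assumes the usual STREAM byte counting (Copy and Scale touch two arrays per iteration, Add and Triad touch three), that rates use 1 MB = 10^6 bytes, and that the "Total memory required" line divides by 2^20, which is what reproduces the printed 9.2. The function and variable names are made up for illustration. Because the printed minimum times are rounded to four decimals, the recomputed rates only agree with the reported ones to within about 1 MB/s.

```python
# Sanity check of the STREAM figures quoted above (illustrative only).
# Assumptions: rates use 1 MB = 10**6 bytes and the best (minimum) time;
# the "Total memory required" line uses 2**20 bytes per MB.

N = 400_000   # array size from the output
WORD = 8      # bytes per DOUBLE PRECISION word

# Arrays touched per iteration: Copy and Scale read one and write one,
# Add and Triad read two and write one.
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Minimum times in seconds as printed, i.e. rounded to four decimals.
MIN_TIME = {"Copy": 0.0124, "Scale": 0.0140, "Add": 0.0214, "Triad": 0.0226}


def total_memory_mib(n=N, word=WORD, arrays=3):
    """Footprint of the three work arrays, in units of 2**20 bytes."""
    return arrays * n * word / 2**20


def rate_mb_per_s(kernel):
    """Bytes moved per iteration divided by the best time, in 10**6 bytes/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * WORD * N
    return bytes_moved / MIN_TIME[kernel] / 1e6


def clock_ticks(test_us=24513, granularity_us=24):
    """Whole timer ticks per test, from the two figures in the output."""
    return test_us // granularity_us


if __name__ == "__main__":
    print(f"memory: {total_memory_mib():.1f}")
    print(f"ticks:  {clock_ticks()}")
    for kernel in ARRAYS_TOUCHED:
        print(f"{kernel:6s} {rate_mb_per_s(kernel):8.1f} MB/s")
```

With roughly 1021 ticks per test, the run is well above the 20-tick minimum the program asks for, so the 400000-element arrays look large enough for this timer.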

Axel Spohr
