STREAM results for Apple G4/400 system ii (w/AltiVec)

From: Anton Rang (
Date: Wed Mar 22 2000 - 15:01:04 CST


Following up to my previous post, here are some results for an Apple
Power Mac G4/400 using the AltiVec stream instructions to prefetch data.

Compiler:
        Metrowerks CodeWarrior Pro 5.3, MW C/C++ PPC 4.0

Compiler flags:
        Unsigned chars, auto-inline

        PowerPC struct alignment, Strings Read-Only, Static data in TOC,
        use FMADD & FMSUB, No Traceback Tables,
        schedule for AltiVec, optimize for speed, peephole optimizer on,
        global optimization level 4
        Dead strip static initialization code
        stack size 64k, heap size 11000k

If you're interested in the source mods, let me know. I may play with them
some more yet as I learn how to use Apple's instrumentation tools. I'm
curious as to why the Add test didn't benefit nearly as much from the
streaming instructions as the others did.
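
For readers unfamiliar with the AltiVec data-stream instructions, here is a
minimal sketch of how a Triad loop might use them to prefetch. It is not the
source modification described above, which is not reproduced in this post.
It assumes a GCC-style toolchain that provides <altivec.h> and defines
__ALTIVEC__; the names triad_prefetch, DST_CTRL and PREFETCH_CHUNK, and the
512-byte prefetch window, are hypothetical choices made for illustration.
Without AltiVec it compiles to a plain Triad loop.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Control word for the data-stream-touch instructions:
   block size in 16-byte vectors (5 bits), block count (8 bits),
   and a signed 16-bit stride in bytes between blocks. */
#define DST_CTRL(size, count, stride) \
    (((size) << 24) | ((count) << 16) | ((stride) & 0xFFFF))

/* Hypothetical window: 64 doubles = 512 bytes = 16 blocks of 32 bytes. */
#define PREFETCH_CHUNK 64

/* STREAM Triad, a[j] = b[j] + scalar * c[j], processed in chunks.
   Before working on a chunk, hint the hardware to start fetching the
   next chunk of both source arrays on stream channels 0 and 1. */
void triad_prefetch(double *a, const double *b, const double *c,
                    double scalar, size_t n)
{
    size_t i, j, end;

    assert(n == 0 || (a && b && c));

    for (i = 0; i < n; i += PREFETCH_CHUNK) {
#ifdef __ALTIVEC__
        if (i + PREFETCH_CHUNK < n) {
            vec_dst((const unsigned char *)(b + i + PREFETCH_CHUNK),
                    DST_CTRL(2, 16, 32), 0);
            vec_dst((const unsigned char *)(c + i + PREFETCH_CHUNK),
                    DST_CTRL(2, 16, 32), 1);
        }
#endif
        end = (i + PREFETCH_CHUNK < n) ? i + PREFETCH_CHUNK : n;
        for (j = i; j < end; j++)
            a[j] = b[j] + scalar * c[j];
    }
#ifdef __ALTIVEC__
    vec_dssall();   /* stop any data streams still active */
#endif
}
```

The stream instructions are only hints, so the loop computes the same values
with or without them; whether a given window size helps has to be measured.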


-- Anton

This system uses 8 bytes per DOUBLE PRECISION word.
Array size = 400000, Offset = 0
Total memory required = 9.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
Your clock granularity/precision appears to be 10 microseconds.
Each test below will take on the order of 21671 microseconds.
   (= 2167 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          477.0779       0.0136       0.0134       0.0138
Scale:         465.2854       0.0139       0.0138       0.0143
Add:           445.5377       0.0217       0.0215       0.0219
Triad:         442.0500       0.0220       0.0217       0.0226
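
As a sanity check, the rates in the table can be reproduced from the array
size and the minimum times. The sketch below assumes the usual STREAM
accounting: Copy and Scale move two words per element, Add and Triad move
three, rates use 1 MB = 10^6 bytes, and the "Total memory required" line
uses 1 MB = 2^20 bytes. The function and constant names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Words moved per array element by each kernel (reads plus writes). */
enum { COPY_WORDS = 2, SCALE_WORDS = 2, ADD_WORDS = 3, TRIAD_WORDS = 3 };

/* Bandwidth in MB/s (10^6 bytes per second) from the best time. */
double stream_rate_mbs(size_t n, size_t word_bytes,
                       int words_per_elem, double min_time_s)
{
    double bytes;

    assert(min_time_s > 0.0);
    bytes = (double)n * (double)word_bytes * (double)words_per_elem;
    return 1.0e-6 * bytes / min_time_s;
}

/* Memory for the three arrays, in units of 2^20 bytes. */
double stream_total_mb(size_t n, size_t word_bytes)
{
    return 3.0 * (double)n * (double)word_bytes / 1048576.0;
}
```

For example, stream_rate_mbs(400000, 8, COPY_WORDS, 0.0134) gives roughly
477.6 MB/s. The small difference from the reported 477.0779 is consistent
with the minimum time being printed to only four decimal places.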

This is an unauthorized communication.  "The statements and opinions
expressed herein are my own and do not necessarily represent those of
