G4 Stream results (partially experimental)

From: Craig Armour (Craig.Armour@ausbit.com.au)
Date: Tue Sep 17 2002 - 22:50:04 CDT

    Hi,

    I have some STREAM results for a dual-processor G4 450 MHz system
    (standard Apple). The uname output of my box is as follows:

    Linux formaldehyde 2.4.19 #11 SMP Sun Sep 8 09:14:19 EST 2002 ppc unknown

    Stats:
    2 x G4 @ 450 MHz
    128 MB RAM

    The results I have are better than the ones currently listed, but not
    necessarily mind-blowing. I've also attached the source of the pthread
    code used to obtain the multiprocessor (MP) numbers.

    My code isn't the best, but I'm not sure that explains the poor
    performance compared to the other stats on your page. Perhaps the
    unified L2 cache doesn't go well with multithreading and you get a bit
    more cache thrashing than preferred *shrug*. Also, gcc probably isn't
    the best compiler, but I'd be interested in comparing the code the
    other guys got from their compilers with the code gcc produces.

    Cheers
    Craig
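The attached pthread source does not appear in this message as archived. Purely as an illustration (this is not Craig's code), a STREAM-style Triad kernel could be split across threads roughly as below, with each thread handling one contiguous slice of the arrays. The names `triad_slice`, `triad_worker`, `triad_parallel`, and `MAX_THREADS` are hypothetical and chosen only for this sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16  /* hypothetical upper bound for this sketch */

/* Work description for one thread: a contiguous slice [begin, end). */
struct triad_slice {
    double *a;
    const double *b;
    const double *c;
    double scalar;
    size_t begin;
    size_t end;
};

/* Thread entry point: a[j] = b[j] + scalar * c[j] over this thread's slice. */
static void *triad_worker(void *arg)
{
    const struct triad_slice *s = arg;
    for (size_t j = s->begin; j < s->end; j++)
        s->a[j] = s->b[j] + s->scalar * s->c[j];
    return NULL;
}

/* Run the Triad kernel over n elements using nthreads threads.
 * Returns 0 on success, or the error code from pthread_create. */
int triad_parallel(double *a, const double *b, const double *c,
                   double scalar, size_t n, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    struct triad_slice slice[MAX_THREADS];
    int started = 0;
    int rc = 0;

    assert(nthreads >= 1 && nthreads <= MAX_THREADS);

    for (int t = 0; t < nthreads; t++) {
        slice[t] = (struct triad_slice){
            .a = a, .b = b, .c = c, .scalar = scalar,
            .begin = n * (size_t)t / (size_t)nthreads,
            .end = n * (size_t)(t + 1) / (size_t)nthreads
        };
        rc = pthread_create(&tid[t], NULL, triad_worker, &slice[t]);
        if (rc != 0)
            break;  /* stop launching; join whatever already started */
        started++;
    }
    /* Wait for every thread that was actually created. */
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);
    return rc;
}
```

On a two-processor box this would be called as `triad_parallel(a, b, c, 3.0, N, 2)` and built with `-pthread`. A real benchmark would more likely create the threads once and reuse them across the timed repetitions, since thread creation inside the timed region would count against the measured bandwidth.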



    craig@formaldehyde:~/src$ ./stream
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 400000, Offset = 0
    Total memory required = 9.2 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 11773 microseconds.
       (= 11773 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   RMS time     Min time     Max time
    Copy:         298.4938       0.0217       0.0214       0.0226
    Scale:        297.9653       0.0218       0.0215       0.0219
    Add:          321.1885       0.0299       0.0299       0.0301
    Triad:        327.1542       0.0297       0.0293       0.0298
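As a cross-check on the table above, the Rate column can be recomputed from the Min time column. STREAM counts two arrays of traffic for Copy and Scale and three for Add and Triad, and reports MB/s with 1 MB = 10^6 bytes. The helper names `kernel_bytes` and `rate_mb_per_s` below are made up for this sketch and are not taken from the benchmark source.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes moved by one pass of a kernel that touches `arrays_touched`
 * arrays of `n` elements, each `word_bytes` wide (8 for a double here). */
double kernel_bytes(int arrays_touched, size_t n, size_t word_bytes)
{
    assert(arrays_touched > 0);
    return (double)arrays_touched * (double)n * (double)word_bytes;
}

/* Rate in MB/s (1 MB = 10^6 bytes) from the best (minimum) time in seconds. */
double rate_mb_per_s(double bytes, double min_time_s)
{
    assert(min_time_s > 0.0);
    return 1.0e-6 * bytes / min_time_s;
}
```

With the array size of 400000 used here, Copy moves 2 x 8 x 400000 = 6.4e6 bytes, so `rate_mb_per_s(kernel_bytes(2, 400000, 8), 0.0214)` gives roughly 299 MB/s. Add moves 9.6e6 bytes, which gives roughly 321 MB/s at 0.0299 s. Both agree with the reported rates to within the rounding of the printed Min times.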

    craig@formaldehyde:~/src$ /export/local/bin/gcc -O3 -funroll-loops -fprefetch-loop-arrays -mcpu=604 -lm -o stream second_wall.c stream_d.c

    craig@formaldehyde:~/src$ /export/local/bin/gcc -O3 -funroll-loops -fprefetch-loop-arrays -mcpu=604 -pthread -lm -o stream_mp second_wall.c stream.c

    craig@formaldehyde:~/src$ /usr/bin/time ./stream_mp
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 400000, Offset = 0
    Total memory required = 9.2 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 2 microseconds.
    Each test below will take on the order of 11355 microseconds.
       (= 5677 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   RMS time     Min time     Max time
    Copy:         338.8759       0.0194       0.0189       0.0212
    Scale:        325.3191       0.0201       0.0197       0.0212
    Add:          341.7224       0.0284       0.0281       0.0286
    Triad:        340.4137       0.0283       0.0282       0.0285

    1.89user 0.09system 0:01.05elapsed 188%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (179major+2438minor)pagefaults 0swaps
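To put the two runs side by side, the gain from the second processor and the CPU utilisation reported by time can be worked out from the numbers above. This is only a small arithmetic sketch; the function names `speedup` and `cpu_percent` are invented for it.

```c
#include <assert.h>

/* Ratio of the two-thread rate to the single-thread rate for one kernel. */
double speedup(double rate_mp, double rate_single)
{
    assert(rate_single > 0.0);
    return rate_mp / rate_single;
}

/* CPU utilisation in percent: (user + system) / elapsed wall-clock time. */
double cpu_percent(double user_s, double sys_s, double elapsed_s)
{
    assert(elapsed_s > 0.0);
    return 100.0 * (user_s + sys_s) / elapsed_s;
}
```

Plugging in the rates from both tables gives ratios of about 1.14 for Copy, 1.09 for Scale, 1.06 for Add, and 1.04 for Triad. `cpu_percent(1.89, 0.09, 1.05)` comes to about 188.6, in line with the 188%CPU printed by time. So both processors were busy for nearly the whole run, yet the measured bandwidth rose by only 4 to 14 percent.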
       


