STREAM question (again :)

From: Ian Mapleson (mapleson@gamers.org)
Date: Thu Mar 19 1998 - 07:03:44 CST


Hi there! :)

Yet another question about STREAM...

I asked you in the far distant past:

  "Is STREAM CPU-dependent? For example, I'm familiar with the existing
   STREAM result for dual-CPU Octane (515); would/will this number go up
   significantly if one were using two 250MHz R10000s or 300MHz R12000s?"

to which you replied:

  "STREAM performance depends on three factors. The most obvious is
   the peak carrying capacity of the system memory interface. The ability
   of a system to use that bandwidth depends on the relation between the
   main memory latency and the ability of the cpu to tolerate that latency.
   Latency tolerance is usually quantified by:

                              (# outstanding misses) * (cache line size)
          latency tolerance = ------------------------------------------
                                   burst bandwidth on the interface"
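
As a rough illustration of the formula quoted above, here is a minimal Python sketch. The function names and every numeric input (4 outstanding misses, 128-byte lines, 800 MB/s burst bandwidth) are hypothetical placeholders chosen for the arithmetic, not measured figures for any SGI system. Note that CPU clock speed does not appear as an input.

```python
# Sketch of the quoted latency-tolerance formula.
# ASSUMPTION: all numeric inputs used below are illustrative placeholders.

def latency_tolerance_ns(outstanding_misses, line_bytes, burst_mb_per_s):
    """Latency (in ns) that the CPU can cover with the data it has in flight."""
    bytes_in_flight = outstanding_misses * line_bytes
    burst_bytes_per_ns = burst_mb_per_s * 1e6 / 1e9  # MB/s -> bytes per ns
    return bytes_in_flight / burst_bytes_per_ns


def can_saturate(memory_latency_ns, tolerance_ns):
    """True if the memory latency is no larger than the tolerated latency."""
    return memory_latency_ns <= tolerance_ns


if __name__ == "__main__":
    tol = latency_tolerance_ns(outstanding_misses=4, line_bytes=128,
                               burst_mb_per_s=800.0)
    print(f"latency tolerance: {tol:.0f} ns")
    # Hypothetical memory latencies, for comparison only.
    for latency in (500.0, 900.0):
        print(latency, "ns ->", "can" if can_saturate(latency, tol) else "cannot",
              "keep the interface busy")
```

Read this way, a memory latency above the tolerated value means the interface sits idle part of the time, so the sustained rate falls below the peak.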

From this I inferred that faster CPUs of the same chip design wouldn't
affect the results that much. However, I observed these numbers on
the STREAM results table:

System              CPUs    COPY   SCALE     ADD   TRIAD

SGI_Octane_195         1   309.0   309.2   346.5   351.8
SGI_Octane_175         1   277.4   277.9   311.2   316.6

SGI_Indy_R4400_175     1    72.7    68.1    71.6    70.6
SGI_Indy_R4400_100     1    45.7    41.0    44.4    50.0

If my inference was accurate, what caused the jump in the numbers in
these cases? I ask because a customer wants me to test a 180MHz R5000SC
O2, i.e. would I see higher numbers with an R5K/200SC?
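
To put numbers on that jump, the following Python sketch compares the clock ratio with the bandwidth ratio for each pair of systems. The figures are copied from the results table above; the dictionary layout and function name are just illustrative choices, and the clock speeds are read off the system names.

```python
# STREAM results (MB/s) copied from the table above, keyed by system name.
# The MHz values are taken from the system names themselves.
KERNELS = ["COPY", "SCALE", "ADD", "TRIAD"]
RESULTS = {
    "SGI_Octane_195":     (195, [309.0, 309.2, 346.5, 351.8]),
    "SGI_Octane_175":     (175, [277.4, 277.9, 311.2, 316.6]),
    "SGI_Indy_R4400_175": (175, [72.7, 68.1, 71.6, 70.6]),
    "SGI_Indy_R4400_100": (100, [45.7, 41.0, 44.4, 50.0]),
}


def speedups(fast, slow):
    """Return (clock ratio, {kernel: bandwidth ratio}) for two table rows."""
    fast_mhz, fast_rates = RESULTS[fast]
    slow_mhz, slow_rates = RESULTS[slow]
    ratios = {k: f / s for k, f, s in zip(KERNELS, fast_rates, slow_rates)}
    return fast_mhz / slow_mhz, ratios


if __name__ == "__main__":
    for fast, slow in [("SGI_Octane_195", "SGI_Octane_175"),
                       ("SGI_Indy_R4400_175", "SGI_Indy_R4400_100")]:
        clock, ratios = speedups(fast, slow)
        print(f"{fast} vs {slow}: clock x{clock:.3f}")
        for kernel, ratio in ratios.items():
            print(f"  {kernel:<5} x{ratio:.3f}")
```

For the Octane pair the bandwidth ratios track the clock ratio closely (about 1.11 in every kernel), while for the Indy pair they fall short of the 1.75 clock ratio (roughly 1.41 to 1.66).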

Btw, I just ran the default STREAM on my office Indy (R4400SC/200MHz,
IRIX 6.2, MIPS Pro 7.1) and obtained:

  Function      Rate (MB/s)   RMS time   Min time   Max time
  Copy:           61.5657      0.2614     0.2599     0.2684
  Scale:          60.1434      0.2665     0.2660     0.2680
  Add:            65.4992      0.3671     0.3664     0.3679
  Triad:          74.3992      0.3238     0.3226     0.3258

Is that pretty typical? NB: over several runs, the numbers never varied
by more than 0.1MB/sec. When I then compiled with:

  cc -O -n32 stream_d.c second_wall.c -o stream_d -lm

the results changed somewhat to:

  Function      Rate (MB/s)   RMS time   Min time   Max time
  Copy:           69.1109      0.2495     0.2315     0.3199
  Scale:          67.9333      0.2508     0.2355     0.3022
  Add:            72.3061      0.3343     0.3319     0.3466
  Triad:          70.5965      0.3430     0.3400     0.3589

which is rather odd since the compilation with -n32 gave this warning:

  ld32: WARNING 84: /usr/lib32/mips3/libm.so is
                    not used for resolving any symbol.

Naturally, I saw the same effect with -mips3.

I obtained the best results, after some fun experimentation, with:

  cc -O3 -n32 -IPA -LNO:opt=1 -TARG:platform=ip22_4k \
    stream_d.c second_wall.c -o stream_d -lm

That gave:

  Function      Rate (MB/s)   RMS time   Min time   Max time
  Copy:           75.0595      0.2166     0.2132     0.2325
  Scale:          71.4723      0.2244     0.2239     0.2252
  Add:            74.7831      0.3218     0.3209     0.3257
  Triad:          76.0398      0.3164     0.3156     0.3171

Want to add them to the results table? (full output given below).
Note that I didn't modify the source files before execution - are
they ok as is for a system like mine with 1MB L2? Btw, the numbers
jump by 1% if I use CC instead of cc! :D
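
For a side-by-side view of the three builds, here is a small Python sketch that computes the percentage change in each kernel relative to the first run. The rates are copied from the three result tables above; the build labels and the helper name are only for illustration.

```python
# Rates (MB/s) copied from the three result tables above, in the order
# Copy, Scale, Add, Triad. The labels are informal names for each build.
KERNELS = ["Copy", "Scale", "Add", "Triad"]
BUILDS = {
    "default":     [61.5657, 60.1434, 65.4992, 74.3992],
    "-O -n32":     [69.1109, 67.9333, 72.3061, 70.5965],
    "-O3 -n32...": [75.0595, 71.4723, 74.7831, 76.0398],
}


def percent_change(base, new):
    """Percentage change from base to new (positive means faster)."""
    return (new - base) / base * 100.0


if __name__ == "__main__":
    baseline = BUILDS["default"]
    for label, rates in BUILDS.items():
        if label == "default":
            continue
        changes = [percent_change(b, r) for b, r in zip(baseline, rates)]
        summary = ", ".join(f"{k} {c:+.1f}%" for k, c in zip(KERNELS, changes))
        print(f"{label}: {summary}")
```

Computed this way, the -O3 build improves Copy by about 22% over the first run but Triad by only about 2%, and the plain -O -n32 build is actually about 5% slower on Triad.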

It's a pity nobody has ever submitted Indy R5K results. I'd like to
see those.

I'm hoping to run the test on an O2 soon (R5KSC/180). Just need a
SCSI cable now to port over the stuff by DAT (awaiting delivery of a
cable from SGI). I think I mentioned before that SGI US is loaning me
an O2 for a few months, so I'll be able to test an O2 with R5K/200
too. They *may* swap it for an R10K/195 model halfway through the
loan if one turns up, in which case I'll test that too.

Now to test that Indigo2 I bought. I think I need to change the
source file somewhere because the I2 has 2MB L2, yes? If so, what
should I change?
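
One way to frame the cache-size question is to compare the size of each STREAM array with the L2 cache. The Python sketch below does that under a loudly stated assumption: the factor of 4 (each array at least four times the cache size) is a rule of thumb assumed here for illustration and is not taken from this message. The 8-byte word size and the array size of 1000000 come from the program output further down; the function names are hypothetical.

```python
BYTES_PER_WORD = 8   # from the output: 8 bytes per DOUBLE PRECISION word
NUM_ARRAYS = 3       # STREAM operates on three arrays
CACHE_FACTOR = 4     # ASSUMPTION: illustrative guideline, not from this message
MB = 2 ** 20


def total_memory_mb(n):
    """Memory needed for all three arrays, in units of 2**20 bytes."""
    return NUM_ARRAYS * BYTES_PER_WORD * n / MB


def min_elements(cache_bytes, factor=CACHE_FACTOR):
    """Smallest array length whose size is at least factor * cache size."""
    return -(-factor * cache_bytes // BYTES_PER_WORD)  # ceiling division


def large_enough(n, cache_bytes, factor=CACHE_FACTOR):
    """True if an n-element array meets the assumed guideline."""
    return n >= min_elements(cache_bytes, factor)


if __name__ == "__main__":
    n = 1_000_000
    print(f"total memory: {total_memory_mb(n):.1f} MB")
    for cache_mb in (1, 2):
        print(f"{cache_mb}MB L2: need >= {min_elements(cache_mb * MB)} elements,",
              "ok" if large_enough(n, cache_mb * MB) else "too small")
```

Under that assumed factor, an array size of 1000000 clears the bar for a 1MB cache (524288 elements) but falls just short for a 2MB cache (1048576 elements).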

Cheers! :)

Ian.

SGI Network Admin, University of Central Lancashire, Preston, England, PR1 2HE.
mapleson@gamers.org | Tel: (+44 -0) 1772 893297, Fax: (+44 -0) 1772 892913
"There is no magic, only stuff." - Nakor, "The King's Buccaneer" (R.E. Feist)

Doom Help Service (DHS): http://doomgate.gamers.org/dhs/
SGI/N64/Future Technology: http://www.geocities.com/ResearchTriangle/2321/
BSc Dissertation: http://doomgate.gamers.org/dhs/diss/
CyberSurvival: http://www.007eleven.com/

********************************************************************************

MILAMBER 31# cc -O3 -n32 -IPA -LNO:opt=1 -TARG:platform=ip22_4k stream_d.c second_wall.c -o stream_d -lm
stream_d.c:
second_wall.c:
ld32: WARNING 84: /usr/lib32/mips3/libm.so is not used for resolving any symbol.
MILAMBER 30# ./stream_d
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 25 microseconds.
Each test below will take on the order of 141404 microseconds.
   (= 5656 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:           75.0595      0.2166     0.2132     0.2325
Scale:          71.4723      0.2244     0.2239     0.2252
Add:            74.7831      0.3218     0.3209     0.3257
Triad:          76.0398      0.3164     0.3156     0.3171
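
As a cross-check on the output above, the reported rates can be approximately reproduced from the minimum times. The Python sketch below assumes the usual STREAM accounting (Copy and Scale touch two arrays per iteration, Add and Triad touch three, and 1 MB means 10^6 bytes); the array size and word size are taken from the output, while the names are illustrative.

```python
BYTES_PER_WORD = 8        # from the output above
ARRAY_SIZE = 1_000_000    # from the output above
# ASSUMPTION: number of arrays read or written per loop iteration.
WORDS_PER_ITER = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Minimum times (seconds) copied from the output above.
MIN_TIME = {"Copy": 0.2132, "Scale": 0.2239, "Add": 0.3209, "Triad": 0.3156}


def rate_mb_per_s(kernel, min_time_s, n=ARRAY_SIZE):
    """Bytes moved by one pass of the kernel, divided by its best time."""
    bytes_moved = WORDS_PER_ITER[kernel] * BYTES_PER_WORD * n
    return 1.0e-6 * bytes_moved / min_time_s


if __name__ == "__main__":
    for kernel, t in MIN_TIME.items():
        print(f"{kernel + ':':<8}{rate_mb_per_s(kernel, t):10.4f} MB/s")
```

The values agree with the printed rates to within a few hundredths of a MB/s; the small differences arise because the times shown are rounded to four decimal places.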



This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:07 CDT