Re: stream benchmark

From: Robert Bell
Date: Tue Feb 21 1995 - 16:34:47 CST

On Tue, 21 Feb 1995, John D. McCalpin wrote:

> Date: Tue, 21 Feb 95 09:20:30 -0500
> From: John D. McCalpin
> To: Robert Bell
> Subject: Re: stream benchmark

> I think that you are using the old version of the stream code.
> I made a bunch of changes quite similar to what you suggest last
> year sometime, and the current version of the code does not
> optimize away on any machine that I know of.
> I should have made an announcement about this change, but
> I have not been very good about keeping a revision history on the code.
> Thanks for the interest -- I will have a look at what you have
> done and see if it makes more sense than what I did....
> >C
> >C With these original assignments, all values in any array were
> >C the same, and no array operations are needed.
> >C a(j) = 1.0
> >C b(j) = 2.0
> >C c(j) = 0.0
> >C
> >C With the following, optimisation is more difficult.
> > a(j) = real (j - N/2) / real (N)
> The original assignments were there just to make sure that all
> of the pages had been touched. I was not particularly worried about
> how long it would take, but I needed to make sure that all of the
> entries had valid FP numbers in them and that all of the pages had
> been accessed at least once. Lots of machines are slower on the
> first access because of virtual memory or TLB misses.
> I also updated the version number and modification date of the
> stream_d.f program to match that of the stream_wall program, since
> they both contain the revised, optimizer-defeating rearrangement
> of the algorithm.
> john
        Yes I did have an old version. I ran your current version
of stream_d.f under f90 on our Cray, and the results are good.
        The old version I had did not use the results of the fourth
loop in the first loop.
        Sorry for wasting your time.

        There is still a problem - because all the array elements are
the same, you can easily calculate that each time through the
DO 60 k = 1,ntimes loop, the values of a(j) are just multiplied by 15.
So the sum of the a(j) is 15**ntimes * N = 1.1533...E+18.
A really smart compiler could work out that all the array elements are
equal, and skip the loops entirely. If the array elements are
unequal, the job is much harder for the compiler.
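
A minimal sketch of that arithmetic, in free-form Fortran. The kernel
forms (copy, scale, add, triad with a scalar of 3) are assumed from the
usual STREAM layout and are consistent with the factor of 15 above;
N = 2000000 and ntimes = 10 are also assumptions, chosen because they
reproduce the 1.1533...E+18 figure. The program name and variable names
are illustrative only.

```fortran
program uniform_check
  implicit none
  ! Assumed sizes (not stated in this thread).
  integer, parameter :: n = 2000000, ntimes = 10
  integer, parameter :: dp = kind(1.0d0)
  real(dp), allocatable :: a(:), b(:), c(:)
  real(dp) :: total, predicted
  integer :: j, k

  allocate (a(n), b(n), c(n))

  ! Uniform initialisation: every element of each array is the same.
  do j = 1, n
     a(j) = 1.0_dp
     b(j) = 2.0_dp
     c(j) = 0.0_dp
  end do

  do k = 1, ntimes
     do j = 1, n
        c(j) = a(j)                  ! copy:  c = a
     end do
     do j = 1, n
        b(j) = 3.0_dp * c(j)         ! scale: b = 3a
     end do
     do j = 1, n
        c(j) = a(j) + b(j)           ! add:   c = 4a
     end do
     do j = 1, n
        a(j) = b(j) + 3.0_dp * c(j)  ! triad: a = 3a + 12a = 15a
     end do
  end do

  ! Compare the summed result with the closed form.
  total = sum(a)
  predicted = 15.0_dp**ntimes * real(n, dp)
  print *, 'sum of a(j)    =', total
  print *, '15**ntimes * N =', predicted
end program uniform_check
```

The two printed values should agree to within rounding, which is the
point: the final answer can be predicted without touching memory at all.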
        Something like
            c(j) = j
            b(j) = c(j) - N/2
            a(j) = b(j) / real (N)
would be good for the initialisation - it dirties all the pages, and
provides unequal elements for the arrays.
With the main loop structure depending only on the initial values of a(j),
a good compiler could optimise away the assignments to c(j) and b(j),
leaving the values of a(j) sitting in the cache, but optimisation
doesn't matter on this loop.
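
As a rough sketch, the suggested initialisation could be dropped into a
small free-form test program like the one below. N = 2000000 and the
double precision kind are assumptions, and real(n, dp) is used in place
of real (N) so that the division is carried out in double precision.

```fortran
program init_sketch
  implicit none
  integer, parameter :: n = 2000000          ! assumed array size
  integer, parameter :: dp = kind(1.0d0)
  real(dp), allocatable :: a(:), b(:), c(:)
  integer :: j

  allocate (a(n), b(n), c(n))

  ! Every element of all three arrays is written once, so every page
  ! is touched, and no two elements of any one array are equal.
  do j = 1, n
     c(j) = j
     b(j) = c(j) - n/2
     a(j) = b(j) / real(n, dp)
  end do

  ! a(j) runs from just above -0.5 up to 0.5.
  print *, 'a(1), a(n/2), a(n) =', a(1), a(n/2), a(n)
end program init_sketch
```

Because the kernels only scale and add, each a(j) is still multiplied by
the same factor on every pass, but the compiler can no longer collapse an
array to a single value.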

Rob. Bell

/ |  CSIRO Supercomputing Facility Manager  \
| CSIRO Division of Information Technology, Supercomputing Support Group |
| 723 Swanston Street | tel: {+61 3,03,} 282 2620 or {+61,0} 18 108 333  |
\ Carlton VIC 3053 Australia   |  fax: {+61 3,03,} 282 2600              /
From 1995 May 8: | tel: {+61 3,03,} 9282 2620 | fax: {+61 3,03,} 9282 2600/
