From: M.J. Rutter
Date: Thu Mar 12 1998 - 09:31:42 CST


     Many thanks for releasing "stream" to the world - it has done much to
make vendors aware of a very important issue in large-scale scientific
computing. I enclose some results from a slightly unusual system, a
Hitachi SR2201 (single node). It is a RISC-based MPP with some
pseudovectorisation features which enhance the memory bandwidth and hide
latency. If you want any more details about the system, do ask, or see

     Michael Rutter

 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 Your clock granularity/precision appears to be 2 microseconds
 The tests below will each take a time on the order
 of 40948 microseconds
    (= 20474 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           775.6824     0.0413     0.0413     0.0413
Scale:          736.0721     0.0435     0.0435     0.0436
Add:            829.4023     0.0580     0.0579     0.0584
Triad:          826.9161     0.0581     0.0580     0.0581
 Sum of a is = 0.230660156249616742E+019
 Sum of b is = 461320312502628096.
 Sum of c is = 615093750008502400.
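
For anyone wishing to check the figures above, the arithmetic can be sketched as follows. This is only an illustration, not part of the benchmark: it assumes that Copy and Scale touch two arrays per iteration while Add and Triad touch three, that a megabyte in the rate column means 10^6 bytes, and that the memory figure is divided by 1024*1024 and truncated. The function and variable names are made up for this sketch, and the minimum times are taken from the table as printed (four decimal places), so the recomputed rates agree with the reported ones only to within rounding.

```python
# Rough cross-check of the reported STREAM figures.
# Assumptions (not stated in the output itself): 8-byte words,
# Copy/Scale touch 2 arrays, Add/Triad touch 3, and 1 MB = 1.0e6 bytes.
WORD_BYTES = 8
N = 2_000_000
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Minimum times in seconds and rates in MB/s, copied from the report.
MIN_TIME = {"Copy": 0.0413, "Scale": 0.0435, "Add": 0.0579, "Triad": 0.0580}
REPORTED = {"Copy": 775.6824, "Scale": 736.0721, "Add": 829.4023, "Triad": 826.9161}


def rate_mb_per_s(kernel, seconds):
    """Bytes moved by one pass of the kernel, divided by elapsed time."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * WORD_BYTES * N
    return bytes_moved / seconds / 1.0e6


def total_memory_mib():
    """Three arrays of N words each, expressed in units of 1024*1024 bytes."""
    return 3 * WORD_BYTES * N / (1024 * 1024)


def clock_ticks(test_microseconds, granularity_microseconds):
    """Number of timer ticks that fit into one test."""
    return test_microseconds / granularity_microseconds


rates = {k: rate_mb_per_s(k, t) for k, t in MIN_TIME.items()}

if __name__ == "__main__":
    for kernel, rate in rates.items():
        print(f"{kernel:6s} recomputed {rate:9.2f}  reported {REPORTED[kernel]:9.4f}")
    print("memory (truncated):", int(total_memory_mib()), "MB")
    print("ticks per test:", clock_ticks(40948, 2))
```

The small differences between recomputed and reported rates come from using the rounded times; the program itself works from the unrounded minimum.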

[Compiled with: xf90 -W0,'OPT(O(SS))' stream_d.f second_cpu.f
 second_cpu.f was completely rewritten, stream_d.f was untouched,
 revision 4.1, June 4, 1996]
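
Since the timer routine was replaced, the granularity line in the output ("appears to be 2 microseconds") is worth verifying on any system where this is repeated. One way to estimate it is sketched below in Python, purely as an illustration: it is not the code from stream_d.f or second_cpu.f, the function name is hypothetical, and time.perf_counter merely stands in for whatever timer the system provides. The idea is to collect a series of distinct timer readings and take the smallest gap between neighbours.

```python
import time


def clock_granularity(samples=20):
    """Estimate timer granularity in seconds.

    Spins until `samples` strictly increasing readings have been
    collected, then returns the smallest gap between consecutive ones.
    """
    readings = []
    last = time.perf_counter()
    while len(readings) < samples:
        now = time.perf_counter()
        if now > last:  # keep only readings where the clock has advanced
            readings.append(now)
            last = now
    gaps = [later - earlier for earlier, later in zip(readings, readings[1:])]
    return min(gaps)


if __name__ == "__main__":
    g = clock_granularity()
    print(f"estimated granularity: {g * 1e6:.3f} microseconds")
```

The value obtained is an upper bound on the true resolution, since it also includes the cost of calling the timer; each test should still last many multiples of it, as the warning in the output advises.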

