Stream for SR2201

From: M.J. Rutter (mjr19@cus.cam.ac.uk)
Date: Thu Mar 12 1998 - 09:31:42 CST


John,

     Many thanks for releasing "stream" to the world - it has done much to
make vendors aware of a very important issue in large-scale scientific
computing. I enclose some results from a slightly unusual system, a
Hitachi SR2201 (single node). It is a RISC-based MPP with some
pseudovectorisation features which enhance the memory bandwidth and hide
latency. If you want any more details about the system, do ask, or see
http://www.hpcf.cam.ac.uk/tech.html.

     Michael Rutter

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 Array size = 2000000
 Offset = 0
 The total memory requirement is 45 MB
 You are running each test 10 times
 The *best* time for each test is used
 ----------------------------------------------------
 Your clock granularity/precision appears to be 2 microseconds
 The tests below will each take a time on the order
 of 40948 microseconds
    (= 20474 clock ticks)
 Increase the size of the arrays if this shows that
 you are not getting at least 20 clock ticks per test.
 ----------------------------------------------------
 WARNING -- The above is only a rough guideline.
 For best results, please be sure you know the
 precision of your system timer.
 ----------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           775.6824     0.0413     0.0413     0.0413
Scale:          736.0721     0.0435     0.0435     0.0436
Add:            829.4023     0.0580     0.0579     0.0584
Triad:          826.9161     0.0581     0.0580     0.0581
 Sum of a is = 0.230660156249616742E+019
 Sum of b is = 461320312502628096.
 Sum of c is = 615093750008502400.

[Compiled with: xf90 -W0,'OPT(O(SS))' stream_d.f second_cpu.f
 second_cpu.f was completely rewritten, stream_d.f was untouched,
 revision 4.1, June 4, 1996]
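
As a cross-check on the figures above, the reported rates can be recomputed from the array size and the best (minimum) times. The sketch below is not part of stream_d.f. It assumes the usual STREAM counting convention (Copy and Scale touch two arrays, Add and Triad touch three) and assumes that rates use 1 MB = 10^6 bytes while the memory requirement uses 2^20 bytes per MB; both assumptions are inferred from the printed numbers rather than taken from the source. All function and variable names are illustrative.

```python
# Sketch: recompute the STREAM figures from the printed report.
# Names and unit conventions here are assumptions, not taken from stream_d.f.

N = 2_000_000        # "Array size" from the report
BYTES_PER_WORD = 8   # "8 bytes per DOUBLE PRECISION word"

# Arrays read plus arrays written by each kernel (assumed STREAM convention).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" column in seconds, as printed (only four decimals survive).
MIN_TIME = {"Copy": 0.0413, "Scale": 0.0435, "Add": 0.0579, "Triad": 0.0580}

# "Rate (MB/s)" column, as printed.
REPORTED_RATE = {"Copy": 775.6824, "Scale": 736.0721,
                 "Add": 829.4023, "Triad": 826.9161}


def rate_mb_per_s(kernel: str, seconds: float) -> float:
    """Bytes moved by one pass of the kernel, divided by time, in 10^6 bytes/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * N
    return bytes_moved / seconds / 1e6


def memory_requirement_mb() -> int:
    """Three arrays of N words, in units of 2^20 bytes, truncated to an integer."""
    return (3 * BYTES_PER_WORD * N) // 2**20


def clock_ticks(duration_us: int, granularity_us: int) -> int:
    """Number of timer ticks that fit in one test of the given duration."""
    return duration_us // granularity_us


if __name__ == "__main__":
    for kernel, t in MIN_TIME.items():
        print(f"{kernel:6s} recomputed {rate_mb_per_s(kernel, t):9.2f} MB/s, "
              f"reported {REPORTED_RATE[kernel]:9.4f} MB/s")
    print("memory requirement:", memory_requirement_mb(), "MB")
    print("ticks per test:", clock_ticks(40948, 2))
```

Because the times are printed to only four decimal places, the recomputed rates agree with the reported ones only to within a fraction of a percent. The memory figure and the tick count (40948 microseconds at 2 microseconds per tick) reproduce the report exactly under these assumptions.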


