Meiko i860 machine memory bandwidth test

From: Boris Cownie (uunet.UU.NET!marge!boris)
Date: Fri Feb 07 1992 - 09:49:28 CST


John,

We make a machine with multiple Intel i860 nodes, each with up to 32 Mbytes
of memory. These talk to each other via a reconfigurable network of transputers.
Two transputers (8 links) share memory with each i860 and support a
through-routing message-passing system that does not impact the i860 memory
bandwidth. I think we probably have one of the fastest i860 memory systems
there is.
I suspect that your test program is also a measure of how smart the
compiler is at optimising the vector loops. The RS6000 has some smart stuff
in there and might well just whack in a single instruction for the copy. If the
SPARC compiles this as a loop, it is not really a fair test of the memory system.
Anyway, the results I get do correspond to the real-world times on other
benchmarks. If the road is downhill and the wind is behind you, each i860
runs at about the speed of the RS6000/530. But look at the super times we get
with the VAST vectoriser!
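
For reference, here is a rough sketch of how the MB/s and MFLOPS columns below relate to each other. The test program itself is not reproduced in this message, so the loop bodies in the comments and the 8-byte word size are assumptions. They were chosen because they reproduce the reported MB/s to MFLOPS ratios (16, 24 and 12 bytes per flop). The names KERNELS, bytes_per_iteration and mflops_from_rate are made up for this illustration.

```python
# Assumption: arrays are double precision, i.e. 8 bytes per element.
WORD = 8

# Assumed loop shapes: (arrays read, arrays written, flops per iteration).
KERNELS = {
    "Assignment": (1, 1, 0),  # a(i) = b(i)
    "Scaling":    (1, 1, 1),  # a(i) = q * b(i)
    "Summing":    (2, 1, 1),  # a(i) = b(i) + c(i)
    "SAXPYing":   (2, 1, 2),  # a(i) = b(i) + q * c(i)
}


def bytes_per_iteration(name):
    """Bytes moved to and from memory by one loop iteration."""
    reads, writes, _ = KERNELS[name]
    return WORD * (reads + writes)


def mflops_from_rate(name, mb_per_s):
    """MFLOPS implied by a memory rate in MB/s for the given kernel."""
    _, _, flops = KERNELS[name]
    return mb_per_s * flops / bytes_per_iteration(name)


if __name__ == "__main__":
    # Rates taken from the unoptimised pgf77 run below.
    reported = {
        "Assignment": 41.79592,
        "Scaling": 29.68116,
        "Summing": 47.62790,
        "SAXPYing": 28.71028,
    }
    for name, rate in reported.items():
        print(name, bytes_per_iteration(name), mflops_from_rate(name, rate))
```

If these assumptions hold, the MFLOPS column carries no information beyond the MB/s column: it is the same measurement scaled by the flops per byte of each loop.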

 Meiko MK096 dual i860 board.

 One i860 node (16 Mbytes of memory)

 Portland Group Compiler

 pgf77 (no optimisation)

 Timing calibration ; t = 38.28125 clicks
     
 Assignment: Rate = 41.79592 MB/s MFLOPS = 0.0000000E+00
 Scaling: Rate = 29.68116 MB/s MFLOPS = 1.855072
 Summing: Rate = 47.62790 MB/s MFLOPS = 1.984496
 SAXPYing: Rate = 28.71028 MB/s MFLOPS = 2.392524
 
 pgf77 -O4
 
 Timing calibration ; t = 11.32813 clicks
     
 Assignment: Rate = 141.2414 MB/s MFLOPS = 0.0000000E+00
 Scaling: Rate = 51.84810 MB/s MFLOPS = 3.240506
 Summing: Rate = 120.4706 MB/s MFLOPS = 5.019608
 SAXPYing: Rate = 39.13375 MB/s MFLOPS = 3.261146
 
 pgf77 -O4 -Mvect
 
 Timing calibration ; t = 11.32813 clicks
     
 Assignment: Rate = 141.2414 MB/s MFLOPS = 0.0000000E+00
 Scaling: Rate = 48.18823 MB/s MFLOPS = 3.011765
 Summing: Rate = 149.8537 MB/s MFLOPS = 6.243903
 SAXPYing: Rate = 37.69325 MB/s MFLOPS = 3.141104

 Green Hills compiler plus Pacific Sierra VAST vectoriser
 
 f77apx -vast -OLMA (vectorise, optimise, unroll loops, no arithmetic checks)

 Timing calibration ; t = 5.024004 clicks
     
 Assignment: Rate = 318.4711 MB/s MFLOPS = 0.0000000E+00
 Scaling: Rate = 99.52228 MB/s MFLOPS = 6.220142
 Summing: Rate = 238.8533 MB/s MFLOPS = 9.952222
 SAXPYing: Rate = 103.8493 MB/s MFLOPS = 8.654112
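
 To put the four runs side by side, here is a small sketch that computes the speedup of each compiler configuration over the unoptimised pgf77 run. The rates are copied from the tables above. The names RATES and speedup are made up for this illustration.

```python
# MB/s rates copied from the four runs reported above.
RATES = {
    "pgf77": {
        "Assignment": 41.79592, "Scaling": 29.68116,
        "Summing": 47.62790, "SAXPYing": 28.71028,
    },
    "pgf77 -O4": {
        "Assignment": 141.2414, "Scaling": 51.84810,
        "Summing": 120.4706, "SAXPYing": 39.13375,
    },
    "pgf77 -O4 -Mvect": {
        "Assignment": 141.2414, "Scaling": 48.18823,
        "Summing": 149.8537, "SAXPYing": 37.69325,
    },
    "f77apx -vast -OLMA": {
        "Assignment": 318.4711, "Scaling": 99.52228,
        "Summing": 238.8533, "SAXPYing": 103.8493,
    },
}


def speedup(kernel, config, baseline="pgf77"):
    """Ratio of the MB/s rate under `config` to the rate under `baseline`."""
    return RATES[config][kernel] / RATES[baseline][kernel]


if __name__ == "__main__":
    for config in RATES:
        ratios = ", ".join(
            f"{kernel} x{speedup(kernel, config):.2f}"
            for kernel in RATES[config]
        )
        print(f"{config}: {ratios}")
```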

 Wow, this one really makes it scream!

 Cheers Boris


