I also wrote a DCOPY in assembly using cached rather than pipelined loads.
I copied about 4.5 M double-precision words per second, or
2 * 4.5 * 8 = 72 Mbytes per second (each word is read once and written
once, at 8 bytes per word). I suspect that compiled code would get less,
but I'm not sure how much less.
I think compiled code gets worse for more complex loops.
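The 72 Mbytes/s figure counts every word twice: one 8-byte load from the source and one 8-byte store to the destination. Below is a minimal sketch of that arithmetic, together with a plain reference version of the BLAS DCOPY operation (y := x, with strides incx and incy). It assumes standard reference-BLAS stride semantics, where a negative increment starts from the far end of the vector. The function names `dcopy` and `copy_bandwidth` are hypothetical. This is not the cached-load assembly routine described above.

```python
def dcopy(n, x, incx, y, incy):
    """Reference-BLAS-style DCOPY: copy n elements of x into y.

    With a negative increment, traversal starts at the far end of the
    vector, as in reference BLAS (start index (1 - n) * inc).
    """
    if n <= 0:
        return y
    ix = 0 if incx >= 0 else (1 - n) * incx
    iy = 0 if incy >= 0 else (1 - n) * incy
    for _ in range(n):
        y[iy] = x[ix]
        ix += incx
        iy += incy
    return y


def copy_bandwidth(words_per_sec, bytes_per_word=8):
    """Bytes moved per second by a copy: each word is loaded once and stored once."""
    return 2 * words_per_sec * bytes_per_word
```

For example, `copy_bandwidth(4.5e6) / 1e6` gives 72.0, matching the figure quoted above.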
This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:01 CDT