benchmark question

From: Bob Greiner
Date: Fri Jan 17 1992 - 11:42:04 CST (John D. McCalpin)

I have a question on your bandwidth-limited Mflop benchmarks.

Your benchmark measures bandwidth to global memory, without allowing
cache-based processors to get benefits from reusing data in local
memory. I understand that this is an accurate reflection of many large
dense matrix applications.

Yet you apparently allow the CM-2 to access local memory.

Why not measure the saxpy in the a(i) = a(i) + scalar*b(i) format?
This form of saxpy is frequently used, and it would remove the penalty
that cached machines pay for pre-reading a(i) before writing it.
Alternatively, why not insist that at least one of a, b, or c be on a
different processor in the CM-2? This would seem to be the case when
using saxpy for matrix multiply or Gaussian elimination.
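
To make the comparison concrete, here is a rough sketch of the two loop
forms and a simple per-iteration traffic count. It is an illustration
only, not the benchmark's code: the three-operand form
a(i) = b(i) + scalar*c(i) is assumed from the mention of a, b, and c,
and the function names, the 8-byte word size, and the write-allocate
model (a cache that reads a line before storing into it) are all
assumptions made for the example.

```python
# Hypothetical sketch: the names and the traffic model are assumptions,
# not taken from the benchmark being discussed.

WORD_BYTES = 8  # assumed 64-bit floating-point operands


def triad(a, b, c, scalar):
    """Assumed three-operand form: a(i) = b(i) + scalar*c(i)."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def saxpy_in_place(a, b, scalar):
    """Two-operand form from the question: a(i) = a(i) + scalar*b(i)."""
    for i in range(len(a)):
        a[i] = a[i] + scalar * b[i]


def bytes_per_iteration(kernel, write_allocate):
    """Memory traffic per loop iteration under a simple streaming model.

    Both kernels need two loads and one store per iteration. With a
    write-allocate cache, the three-operand form also pre-reads a(i)
    before storing to it, and that read does no useful work. In the
    in-place form the read of a(i) is one of the two required loads,
    so there is no extra traffic.
    """
    if kernel == "triad":
        words = 3 + (1 if write_allocate else 0)
    elif kernel == "saxpy_in_place":
        words = 3
    else:
        raise ValueError("unknown kernel: " + kernel)
    return words * WORD_BYTES
```

Under this model both forms do two flops per iteration, but only the
three-operand form moves a fourth, uncredited word on a write-allocate
machine.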

Thank you for your attention and response.

- Bob Greiner, Motorola Computer Group,
  Personal opinion only
