benchmark question

From: Bob Greiner (bobg@phx.mcd.mot.com)
Date: Fri Jan 17 1992 - 11:42:04 CST


mccalpin@perelandra.cms.udel.edu (John D. McCalpin) -

I have a question on your bandwidth-limited Mflop benchmarks.

Your benchmark measures bandwidth to global memory, without allowing
cache-based processors to get benefits from reusing data in local
memory. I understand that this is an accurate reflection of many large
dense matrix applications.

Yet you apparently allow the CM-2 to access local memory.

Why not measure the saxpy in the a(i) = a(i) + scalar*b(i) format?
This configuration of saxpy is frequently used. It would remove the
penalty that cached machines pay for pre-reading a(i). Alternatively,
why not insist that at least one of a, b, or c be on a different
processor in the CM-2? This would seem to be the case when using saxpy
for matrix multiply or Gaussian elimination.
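
To make the two loop forms concrete, here is a minimal sketch in Python. It assumes the benchmark kernel has the three-array form a(i) = b(i) + scalar*c(i), which the mention of a, b, and c suggests but which is not stated here. The function names and the word-counting model are hypothetical illustrations, not part of the benchmark.

```python
def triad(a, b, c, scalar):
    """Assumed three-array form: a(i) = b(i) + scalar*c(i); a is only written."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def saxpy_in_place(a, b, scalar):
    """Two-array form: a(i) = a(i) + scalar*b(i); a is read, then written."""
    for i in range(len(a)):
        a[i] = a[i] + scalar * b[i]


def words_moved_per_element(form, write_allocate):
    """Memory words moved per element under a simplified, assumed traffic model.

    Both forms load two operands and store one result. On a cache that
    allocates on write (an assumption about the machine), the three-array
    form also reads a(i) before overwriting it, and that read is wasted.
    """
    if form == "triad":
        loads, stores = 2, 1            # load b(i), c(i); store a(i)
        extra = 1 if write_allocate else 0  # pre-read of a(i)
    elif form == "saxpy":
        loads, stores = 2, 1            # load a(i), b(i); store a(i)
        extra = 0                       # a(i) is already a needed operand
    else:
        raise ValueError("form must be 'triad' or 'saxpy'")
    return loads + stores + extra
```

Under this model the three-array form moves four words per element on a write-allocate cache while doing the same two flops, and the in-place form moves three, which is the penalty the question refers to.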

Thank you for your attention and response.

- Bob Greiner, Motorola Computer Group, bobg@phx.mcd.mot.com
  Personal opinion only


