Re: vector code

From: Daan Sandee (sandee@Think.COM)
Date: Tue Nov 05 1991 - 17:24:57 CST

Next message: Donald McLachlan: "RE: memory Bandwidth"
Previous message: Andrew Zachary: "Re: Memory Bandwidth Table"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

>L2$_stream_pe_code_3:
> dflodv [aP3+0]2++ aV0
> # C = A + B
> dfaddv [aP2+0]2++ aV0 aV1
> dfstrv aV1 [aP4+0]2++
> jnz aC2 L2$_stream_pe_code_3 ! this is an iteration counter

Okay, I've calmed down, and now I *do* recognize the CM-2.
Every instruction is a macro which expands into actual CMIS code.
So, for the loop above:
load into the register file, using the memory address at P3 strided by 2
(slices) and register file address V0;
load (ditto, P2) and add to the register contents, using RF addresses V0 -> V1;
store (ditto, P4) from the register file at RF address V1.
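
To make that reading concrete, here is a rough scalar model of the first
loop in Python. It is only a sketch of my reading of the macros, not CMIS:
the name stream_add and the flat "mem" list are made up for illustration,
the stride is treated as a plain index step, and "count" stands in for the
iteration counter aC2.

```python
def stream_add(mem, p3, p2, p4, count, stride=2):
    """Scalar model of the quoted loop, one element per iteration.

    mem is a flat list standing in for memory; p3, p2 and p4 are
    indices into it (hypothetical stand-ins for aP3, aP2, aP4).
    """
    for _ in range(count):      # jnz aC2: repeat until the counter runs out
        v0 = mem[p3]            # dflodv: load from [P3] into register V0
        v1 = mem[p2] + v0       # dfaddv: load from [P2], add V0, result in V1
        mem[p4] = v1            # dfstrv: store V1 to [P4]
        p3 += stride            # the "2++" post-increment on each pointer
        p2 += stride
        p4 += stride
    return mem
```

Each pass touches memory three times (two loads and one store) and does
one add.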

> # C = A + 3.0D0*B
> dflodc $3.000000000000000000d+00 aS28
>L2$_stream_pe_code_4:
> dflodv [aP2+0]2++ aV0
> # C = A + 3.0D0*B
> dfmuladdv aS28 [aP3+0]2++ aV1 aV0 aV1
> dfstrv aV1 [aP4+0]2++
> jnz aC2 L2$_stream_pe_code_4

This appears to be the same thing, except that the Weitek chip does a
multiply-with-constant (in one hardware instruction; I don't know the op codes).

So what has this to do with memory bandwidth? Nothing. In both cases
it moves 3 data words (two loads, one store) per 6 cycles; in the second
case it does two flops instead of one, per 6 cycles (and per output word).
I'd have to go to the chip manual to see if there is anything to be got out
of the timing specs.
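
The arithmetic behind that can be written out as a small Python sketch.
The per-loop counts are read off the two quoted listings, counting the two
loads and one store in each loop as the three data words; the 6 cycles per
iteration is the figure given above, not something taken from the chip
manual. The names KERNELS and per_cycle are made up for illustration.

```python
# Per-iteration counts read off the two quoted loops.
KERNELS = {
    "C = A + B":     {"loads": 2, "stores": 1, "flops": 1},
    "C = A + 3.0*B": {"loads": 2, "stores": 1, "flops": 2},
}

CYCLES_PER_ITERATION = 6  # figure quoted in the post, taken as given


def per_cycle(kernel):
    """Return (memory words per cycle, flops per cycle) for one loop."""
    words = kernel["loads"] + kernel["stores"]
    return words / CYCLES_PER_ITERATION, kernel["flops"] / CYCLES_PER_ITERATION


for name, kernel in KERNELS.items():
    words, flops = per_cycle(kernel)
    print(f"{name}: {words:.3f} words/cycle, {flops:.3f} flops/cycle")
```

The memory traffic per cycle comes out the same for both loops; only the
flop rate doubles in the second.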


This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:02 CDT