Date: Mon Apr 07 1997 - 08:46:45 CDT

You are incorrect on a couple of points. First of all, the Power2
architecture is binary compatible with the PowerPC architecture. Many
users of the Power2 will turn on the compiler option that generates some
Power2-specific instructions, but this is true for many families of
microprocessors. The bottom line is that a standard compile will
generate an executable that will run on both PowerPC and Power2. Power2
and PowerPC have the same instruction set and run the same compilers,
OS, applications, etc.

Secondly, IBM is continuing to produce new versions of the Power2
architecture so your statement about "not being able to design a machine
that came even close" is incorrect. The older 590 was based on a 67 MHz
clock while the latest Power2 designs ( P2SC ) run at 120 and 135 MHz.
These new processors were just introduced in the fall and have
been shipping in quantity for several months. This is a "brainiac"
design. Regarding price, they are comparable to the high-end DEC, SGI,
and HP workstations ( in other words, if your budget is $3000, then you
should buy a Pentium Pro or PowerPC 604e ). When comparing workstations
for high-end computing, you should reference the Power2 RS6000s, not the
PowerPC RS6000s.

The stream numbers from the RS6000 595 ( 135 MHz ) are as follows:

Function      Rate (MB/s)   RMS time   Min time   Max time
Assignment:      620.9357      .0490      .0515      .0518
Scaling   :      650.4704      .0468      .0492      .0495
Summing   :      660.6959      .0690      .0727      .0729
SAXPYing  :      689.5455      .0661      .0696      .0698
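
For anyone who wants to check how these rates relate to the times, here
is a rough sketch in Python. It assumes 8-byte double-precision elements,
the usual Stream accounting of 2 words moved per iteration for Assignment
and Scaling and 3 for Summing and SAXPYing, and that the rate is computed
from the minimum time with 1 MB = 10**6 bytes. The names below are mine,
not part of Stream. Under those assumptions the figures imply arrays of
roughly 2 million elements each.

```python
# Assumption: 8-byte (double-precision) array elements.
BYTES_PER_WORD = 8

# Assumption: words moved per loop iteration for each kernel.
WORDS_PER_ITER = {
    "Assignment": 2,  # a[i] = b[i]
    "Scaling": 2,     # a[i] = q * b[i]
    "Summing": 3,     # a[i] = b[i] + c[i]
    "SAXPYing": 3,    # a[i] = b[i] + q * c[i]
}

# (rate in MB/s, min time in seconds) as reported for the 595.
REPORTED = {
    "Assignment": (620.9357, 0.0515),
    "Scaling": (650.4704, 0.0492),
    "Summing": (660.6959, 0.0727),
    "SAXPYing": (689.5455, 0.0696),
}

def implied_elements(kernel):
    """Array length implied by a reported rate and its minimum time."""
    rate_mb_s, min_time = REPORTED[kernel]
    bytes_moved = rate_mb_s * 1e6 * min_time
    return bytes_moved / (WORDS_PER_ITER[kernel] * BYTES_PER_WORD)

for kernel in REPORTED:
    millions = implied_elements(kernel) / 1e6
    print(f"{kernel:<10} about {millions:.2f} million elements")
```

All four kernels land on about the same array length, which is a quick
sanity check that the table is internally consistent.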

Please note that the 595 has the same memory bus as the 590 but the
processor has 2x the clock speed. Stream is a measure of memory bus
performance. The 590 and 595 have a 67 MHz memory bus that is 256 bits
wide. The processor supports 2 "quad-word" load instructions per clock
period. The 591 has a faster memory bus ( 77 MHz ) and is therefore
faster on Stream. The typical real-application performance improvement
of the 595 over the 590 is a factor of 1.7.
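
To put the bus figures in perspective, here is a back-of-the-envelope
calculation in Python. It assumes one 256-bit transfer per bus clock and
a 16-byte "quad-word" (two 8-byte doubles); both are my assumptions for
illustration, and the function and variable names are hypothetical. It is
a sketch of theoretical peaks, not a measurement.

```python
BUS_WIDTH_BITS = 256

def peak_bus_mb_s(bus_clock_hz, width_bits=BUS_WIDTH_BITS):
    """Theoretical peak in MB/s, assuming one full-width transfer per clock."""
    return bus_clock_hz * (width_bits / 8) / 1e6

peak_590_595 = peak_bus_mb_s(67e6)  # 590 and 595 share a 67 MHz bus
peak_591 = peak_bus_mb_s(77e6)      # 591 has the 77 MHz bus

# Assumption: a "quad-word" load moves 16 bytes (two doubles).
QUAD_WORD_BYTES = 16
load_issue_595 = 2 * QUAD_WORD_BYTES * 135e6 / 1e6  # 2 loads per clock at 135 MHz

best_stream = 689.5455  # SAXPYing rate reported for the 595, in MB/s

print(f"590/595 bus peak : {peak_590_595:.0f} MB/s")
print(f"591 bus peak     : {peak_591:.0f} MB/s")
print(f"595 load issue   : {load_issue_595:.0f} MB/s")
print(f"Stream / bus peak: {best_stream / peak_590_595:.2f}")
```

Under these assumptions the processor can request data about twice as
fast as the 67 MHz bus can deliver it, which fits the point that Stream
tracks the memory bus rather than the processor clock.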

The Power2 RS6000s are often bought by those who are doing
floating-point-intensive work that requires high memory bandwidth. They
have 2-3x the single-stride memory bandwidth of any other microprocessor. This
difference shows up on problems that generate significant memory
traffic. I think Stream actually correlates well with certain fluid
problems that produce relatively few operations per memory reference and
have a working set that is much larger than L2 cache ( for those systems
with L2 cache ).

  Jim Tuccillo


