I noticed you don't seem to have any results for current HP
workstations. I just got in a few new HP B2000s today, which have the
PA-8500 CPU. I was curious whether it improved raw memory bandwidth, so
I ran STREAM on one and compared it against the C3000. Both the B2000
and the C3000 have a 400MHz PA-8500, and the underlying architecture is
pretty much identical. I got essentially identical results on each.
Judging from past tests on systems with 300MHz and 440MHz PA-8500s,
performance on these new systems seems to scale linearly with CPU
speed. I'll have to see if that remains true with the higher-end
552MHz PA-8600 models when they arrive.

These are C results, on a B2000 system with 768MB RAM, using:

cc -o stream second_cpu.c stream_d.c +O4 +Oall +Odataprefetch -Wl,-a,archive

This system uses 8 bytes per DOUBLE PRECISION word.
Array size = 26000000, Offset = 0
Total memory required = 595.1 MB.
Each test is run 10 times, but only
the *best* time for each is used.
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 319999 microseconds.
   (= 32 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          866.6667       0.4890       0.4800       0.4900
Scale:         885.1064       0.4770       0.4700       0.4800
Add:           975.0000       0.6490       0.6400       0.6500
Triad:         960.0000       0.6530       0.6500       0.6600

