Re: PA-8500 SPEC results mystifying

From: dsiebert@engineering.uiowa.edu
Date: Fri Jan 15 1999 - 10:12:08 CST


Hi John,

Since this PA-8500 / C360 thing has me interested, I did a few STREAM
runs on some boxes I have here. Unfortunately, I do not have the f77
compilers installed -- only the f90 ones, which are still not up to par
with f77 in optimization capability. With my minimal knowledge of
Fortran I got the benchmark running, but the Fortran version was
actually slower than the C version, which probably shouldn't happen.
Anyway, I am just submitting C results. The compiler is about six
months old, but that is probably not an issue for this sort of thing.
I compiled on the C360 and used the same binary on a C180, C200, and
C240 as well. The systems were lightly used (users logged into the
console doing not much of anything other than Netscape -- only a
couple of percent CPU usage).

The C360, BTW, was a C200 upgraded to C360. I don't know if the C360s
you buy as C360s are different at all (probably not; HP is supposed to
be coming out with some new stuff soon).

/usr/bin/cc:
              LINT A.10.32.16 CXREF A.10.32.16
        HP92453-01 A.10.32.18 HP C Compiler
         /usr/lib/libc: $Revision: 76.3 $

/opt/ansic/lbin/ccom:
              LINT A.10.32.16 CXREF A.10.32.16
        HP92453-01 A.10.32.18 HP C Compiler
         HP-UX SLLIC/OPTIMIZER UX.10.20.559 (DAVIS): 05/26/98
         Ucode Code Generator - UX10.20.61 (PACG_UX10.MULTI_BL33)
         REV: HP SESD Code
         High Level Optimizer - UX.10.21.971124 (UX10.MULTI)
[-DHLO_RELEASE +O3] - 05-Mar-98.13:41
         /usr/lib/libc: $Revision: 76.3 $

Compiler options (just guessing at the ones to use): +Oall -Wl,-a,archive

It's interesting that the C180 results are really not much different
than the C360 results. HP has a lot of room for improvement in this
area -- the PA-8500 would look a whole lot better if they put a good
memory system in the box. As it is, they are working with something
that is about on par with the best PC motherboards, which isn't much
to brag about in the workstation world, especially given the DS20
results and the likely better results for the POWER3 machines.
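
To put a rough number on that, here is a minimal sketch that compares
the clock-speed gain between two machines with the bandwidth gain,
using the Triad rates from the C180 and C360 runs further down. The
function names are made up for illustration, and using clock speed as
the yardstick is only an assumption about what a fair comparison is.

```c
#include <assert.h>

/* Ratio of the newer machine's figure to the older machine's. */
static double ratio(double newer, double older)
{
    assert(older > 0.0);
    return newer / older;
}

/* Fraction of the clock-speed gain that shows up as bandwidth gain.
   1.0 would mean bandwidth scaled with the clock; 0.0 means no gain.
   Hypothetical helper, not part of STREAM. */
static double bandwidth_scaling(double clock_new, double clock_old,
                                double bw_new, double bw_old)
{
    double clock_gain = ratio(clock_new, clock_old) - 1.0;
    double bw_gain = ratio(bw_new, bw_old) - 1.0;

    assert(clock_gain > 0.0);
    return bw_gain / clock_gain;
}
```

Plugging in 367 vs 180 MHz and 207.4074 vs 204.8780 MB/s gives a clock
ratio of about 2.04 but a Triad ratio of only about 1.01, so only
around 1% of the clock gain shows up as memory bandwidth.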

C180 (PA-8000@180MHz 512K + 512K offchip L1, IIRC)
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 7000000, Offset = 0
Total memory required = 160.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 260000 microseconds.
   (= 26 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            302.7027     0.3740     0.3700     0.3800
Scale:           266.6667     0.4260     0.4200     0.4300
Add:             190.9091     0.8850     0.8800     0.9000
Triad:           204.8780     0.8290     0.8200     0.8400
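
For anyone checking the arithmetic, the rates in these tables follow
from the array size and the Min time column. STREAM counts two arrays
of traffic for Copy and Scale and three for Add and Triad, with 1 MB
taken as 1e6 bytes. The sketch below reproduces that calculation; the
enum and function names are illustrative and not taken from the STREAM
source.

```c
#include <assert.h>
#include <stddef.h>

#define BYTES_PER_WORD 8.0  /* "8 bytes per DOUBLE PRECISION word" */

enum stream_kernel { STREAM_COPY, STREAM_SCALE, STREAM_ADD, STREAM_TRIAD };

/* Bytes moved by one pass of a kernel over arrays of n words:
   Copy and Scale read one array and write one; Add and Triad
   read two and write one. */
static double stream_bytes(enum stream_kernel k, size_t n)
{
    double arrays = (k == STREAM_COPY || k == STREAM_SCALE) ? 2.0 : 3.0;
    return arrays * BYTES_PER_WORD * (double)n;
}

/* Rate in MB/s (1 MB = 1e6 bytes) from the best (minimum) time. */
static double stream_rate_mbs(enum stream_kernel k, size_t n,
                              double min_seconds)
{
    assert(min_seconds > 0.0);
    return 1.0e-6 * stream_bytes(k, n) / min_seconds;
}
```

With n = 7000000, a Min time of 0.37 s for Copy works out to about
302.7 MB/s and 0.82 s for Triad to about 204.9 MB/s, matching the
C180 figures.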

C200 (PA-8200@200MHz 512Ki + 1MBd offchip L1)
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 7000000, Offset = 0
Total memory required = 160.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 270000 microseconds.
   (= 27 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            287.1795     0.3910     0.3900     0.4000
Scale:           273.1707     0.4190     0.4100     0.4200
Add:             202.4096     0.8310     0.8300     0.8400
Triad:           224.0000     0.7540     0.7500     0.7600

C240 (PA-8200@236MHz 2MB + 2MB off chip L1)
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 7000000, Offset = 0
Total memory required = 160.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 240000 microseconds.
   (= 24 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            339.3939     0.3320     0.3300     0.3400
Scale:           320.0000     0.3530     0.3500     0.3600
Add:             215.3846     0.7850     0.7800     0.7900
Triad:           218.1818     0.7700     0.7700     0.7700

C360 (PA-8500@367MHz 512Ki + 1MBd on chip L1)
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 7000000, Offset = 0
Total memory required = 160.2 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 209999 microseconds.
   (= 21 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time   Min time   Max time
Copy:            311.1111     0.3670     0.3600     0.3700
Scale:           273.1707     0.4110     0.4100     0.4200
Add:             210.0000     0.8070     0.8000     0.8100
Triad:           207.4074     0.8100     0.8100     0.8100
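
One caveat when comparing these tables: with a clock granularity of
about 10 ms and tests lasting only 21 to 27 ticks, each Min time is
known to roughly one tick. The sketch below turns that into a range of
plausible rates. It assumes the measured time can be off by up to one
tick in either direction, which is a simplification, and the struct
and function names are hypothetical.

```c
#include <assert.h>

struct rate_bounds {
    double low;   /* MB/s if the true time was one tick longer  */
    double high;  /* MB/s if the true time was one tick shorter */
};

/* Range of rates consistent with a measured time and a timer tick.
   Assumption: the timing error is at most one tick either way. */
static struct rate_bounds rate_bounds_mbs(double bytes,
                                          double min_seconds,
                                          double tick_seconds)
{
    struct rate_bounds b;

    assert(tick_seconds > 0.0 && min_seconds > tick_seconds);
    b.low = 1.0e-6 * bytes / (min_seconds + tick_seconds);
    b.high = 1.0e-6 * bytes / (min_seconds - tick_seconds);
    return b;
}
```

For the C360 Copy test (112e6 bytes, Min time 0.36 s, tick 0.01 s)
this gives a range of roughly 302.7 to 320.0 MB/s, so differences of
a few percent between machines may be within timer noise.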

--
Douglas Siebert                Director of Computing Facilities
douglas-siebert@uiowa.edu      Division of Mathematical Sciences, U of Iowa

I plan to live forever, or die trying.


