STREAM results

From: jerry (j.) quinn (jquinn@nortel.ca)
Date: Mon Oct 28 1996 - 14:54:00 CST


Performance results for our HPs, which you didn't seem to have. Both sets of
timing results were produced with stream_cpu.c.

Jerry Quinn
jquinn@nortel.ca

HP9000/712/80 HPUX 9.05
cc +O4
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 500000 microseconds.
   (= 50 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          66.6667       0.4970       0.4800       0.5900
Scale:         66.6667       0.4912       0.4800       0.5300
Add:           75.0000       0.6511       0.6400       0.6800
Triad:         75.0000       0.6500       0.6400       0.6600
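
The Rate column appears to follow directly from the Min time column and the array size. Here is a minimal sketch in Python of that arithmetic. It assumes the usual STREAM byte counts (two arrays moved for Copy and Scale, three for Add and Triad), and it assumes that rates are reported in units of 10^6 bytes while the memory total is reported in units of 2^20 bytes, which is what the printed numbers are consistent with. The function names are made up for illustration and are not taken from stream_cpu.c.

```python
BYTES_PER_WORD = 8  # from the output: 8 bytes per DOUBLE PRECISION word

# Arrays read plus arrays written per kernel (assumed STREAM counting).
ARRAYS_MOVED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}


def stream_rate_mb_s(kernel, array_size, min_time_s):
    """Rate in MB/s (1 MB = 10**6 bytes) from the best time of one kernel."""
    bytes_moved = ARRAYS_MOVED[kernel] * BYTES_PER_WORD * array_size
    return bytes_moved / min_time_s / 1e6


def total_memory_mb(array_size, n_arrays=3):
    """Footprint of the three work arrays in MB (1 MB = 2**20 bytes)."""
    return n_arrays * BYTES_PER_WORD * array_size / 2**20


if __name__ == "__main__":
    # Min times taken from the 712/80 table above.
    for kernel, tmin in [("Copy", 0.48), ("Scale", 0.48),
                         ("Add", 0.64), ("Triad", 0.64)]:
        print(f"{kernel:6s} {stream_rate_mb_s(kernel, 2_000_000, tmin):8.4f} MB/s")
    print(f"Total memory: {total_memory_mb(2_000_000):.1f} MB")
```

Because the timer resolves only 0.01 s, the Min time takes discrete values, so the rate does too. That is why Copy and Scale, or Add and Triad, can report identical rates.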

=================================================================
I did 3 successive runs on the same machine
HP9000/735/125 HPUX
cc +O4
Highly variable results. Why?

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 290000 microseconds.
   (= 29 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          68.0851       0.4790       0.4700       0.4900
Scale:         74.4186       0.4763       0.4300       0.4900
Add:           73.8462       0.6570       0.6500       0.6700
Triad:         72.7273       0.6640       0.6600       0.6800
bmtlh30:~/mac> stream_d
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 240000 microseconds.
   (= 24 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          91.4286       0.4385       0.3500       0.4800
Scale:         80.0000       0.4630       0.4000       0.5000
Add:           87.2727       0.6349       0.5500       0.6800
Triad:         97.9592       0.6300       0.4900       0.6800
bmtlh30:~/mac> stream_d
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 290000 microseconds.
   (= 29 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          80.0000       0.4728       0.4000       0.5100
Scale:         69.5652       0.4791       0.4600       0.4900
Add:           75.0000       0.6570       0.6400       0.6700
Triad:         80.0000       0.6543       0.6000       0.6700
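
One way to look at the three runs above is to compare the rate computed from the best (Min) time with a rate computed from the RMS time. The sketch below does this for the Copy rows only. It does not explain why the runs differ; it only shows that most of the run-to-run variation sits in the Min time column, which is the one the reported rate is based on. The helper names are hypothetical, and the byte count assumes Copy moves two arrays of 8-byte words.

```python
# Copy rows from the three 2,000,000-element runs on the 735/125:
# (RMS time, Min time), both in seconds.
COPY_RUNS = [(0.4790, 0.4700), (0.4385, 0.3500), (0.4728, 0.4000)]

BYTES_MOVED = 2 * 8 * 2_000_000  # Copy: one array read, one array written


def rate(seconds):
    """Rate in MB/s (1 MB = 10**6 bytes) for a Copy pass taking `seconds`."""
    return BYTES_MOVED / seconds / 1e6


def spread(values):
    """Relative spread across runs: (max - min) / min."""
    return (max(values) - min(values)) / min(values)


best_rates = [rate(t_min) for _, t_min in COPY_RUNS]
rms_rates = [rate(t_rms) for t_rms, _ in COPY_RUNS]

if __name__ == "__main__":
    print("best-time rates:", [round(r, 1) for r in best_rates])
    print("RMS-time rates: ", [round(r, 1) for r in rms_rates])
    print(f"spread of best-time rates: {100 * spread(best_rates):.0f}%")
    print(f"spread of RMS-time rates:  {100 * spread(rms_rates):.0f}%")
```

For these rows the best-time rates differ by roughly a third between runs, while the RMS-based rates differ by under a tenth.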

======================================================================
2 more runs on the same machine
HP9000/735/125 HPUX
cc +O4
This time the results were stable.

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 5000000, Offset = 0
Total memory required = 114.4 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 739999 microseconds.
   (= 74 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          67.2269       1.2021       1.1900       1.2400
Scale:         67.2269       1.2041       1.1900       1.2300
Add:           73.1707       1.6510       1.6400       1.6600
Triad:         73.1707       1.6560       1.6400       1.6700
bmtlh30:~/mac> stream_d
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 5000000, Offset = 0
Total memory required = 114.4 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 9999 microseconds.
Each test below will take on the order of 740000 microseconds.
   (= 74 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          67.7966       1.1920       1.1800       1.2100
Scale:         68.9655       1.1911       1.1600       1.2100
Add:           75.0000       1.6431       1.6000       1.6700
Triad:         73.6196       1.6460       1.6300       1.6700
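
The "clock ticks" figures in the output headers can be reproduced from the reported granularity, and they give a feel for how coarse the timer is relative to each test. A small sketch follows, assuming the tick count is simply the estimated test time divided by the granularity. Rounding to the nearest tick is an assumption here, and the program may truncate instead. The function names are illustrative only.

```python
TICK_US = 9999  # reported clock granularity, in microseconds


def clock_ticks(test_time_us, tick_us=TICK_US):
    """Estimated test duration expressed in timer ticks (nearest integer)."""
    return round(test_time_us / tick_us)


def one_tick_error(test_time_us, tick_us=TICK_US):
    """Relative timing error if a measurement is off by a single tick."""
    return tick_us / test_time_us


if __name__ == "__main__":
    # Estimated test times copied from the output headers in this post.
    cases = [
        ("712/80,  N=2000000", 500000),
        ("735/125, N=2000000", 290000),
        ("735/125, N=2000000", 240000),
        ("735/125, N=5000000", 739999),
    ]
    for label, t_us in cases:
        print(f"{label}: {clock_ticks(t_us)} ticks, "
              f"one tick = {100 * one_tick_error(t_us):.1f}% of the test")
```

With 2000000 elements the 735/125 runs sit at 24 to 29 ticks, not far above the 20-tick guideline printed by the program, whereas the 5000000-element runs reach 74 ticks.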


