stream on 320

From: John McCalpin (john@mccalpin.com)
Date: Mon Apr 26 1999 - 08:11:24 CDT


Compiled with -O2 -strength-reduce -funroll-loops

The machine is a 450 MHz uniprocessor Silicon Graphics Model 320.
The performance stinks: the best of the ten runs below reaches only about
172 MB/s for Copy and 228 MB/s for Add and Triad.
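For reference, here is a minimal C sketch of the four kernels timed below. It follows the usual STREAM definitions (Copy, Scale, Add, Triad) and is not copied from stream_d.c. The function names and the scalar argument `q` are assumptions made for illustration. The comments give the 8-byte words STREAM credits per element, which is what the Rate column is based on.

```c
#include <stddef.h>

/* Hypothetical sketch of the STREAM kernels (not taken from stream_d.c).
   Each array element read or written once is credited as one 8-byte word. */

/* Copy:  c = a          (1 read + 1 write = 2 words per element) */
void stream_copy(double *c, const double *a, size_t n) {
    for (size_t j = 0; j < n; j++) c[j] = a[j];
}

/* Scale: b = q * c      (1 read + 1 write = 2 words per element) */
void stream_scale(double *b, const double *c, double q, size_t n) {
    for (size_t j = 0; j < n; j++) b[j] = q * c[j];
}

/* Add:   c = a + b      (2 reads + 1 write = 3 words per element) */
void stream_add(double *c, const double *a, const double *b, size_t n) {
    for (size_t j = 0; j < n; j++) c[j] = a[j] + b[j];
}

/* Triad: a = b + q * c  (2 reads + 1 write = 3 words per element) */
void stream_triad(double *a, const double *b, const double *c, double q, size_t n) {
    for (size_t j = 0; j < n; j++) a[j] = b[j] + q * c[j];
}
```

STREAM counts only these explicit loads and stores. On a write-allocate cache, the destination line is also read before it is written, so the real memory traffic for Copy can be closer to 3 words per element than the 2 that are credited.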

[guest@conn]$ repeat 10 ./stream_wall
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 74271 microseconds.
   (= 74271 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           171.6738     0.0935     0.0932     0.0956
Scale:          180.6012     0.0887     0.0886     0.0889
Add:            227.8294     0.1065     0.1053     0.1155
Triad:          227.9440     0.1054     0.1053     0.1058
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 74317 microseconds.
   (= 74317 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           171.6904     0.0935     0.0932     0.0954
Scale:          180.5216     0.0894     0.0886     0.0954
Add:            227.9873     0.1054     0.1053     0.1060
Triad:          227.9829     0.1054     0.1053     0.1055
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 74095 microseconds.
   (= 74095 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           171.7862     0.0934     0.0931     0.0954
Scale:          180.6950     0.0886     0.0885     0.0886
Add:            227.9765     0.1053     0.1053     0.1055
Triad:          228.0154     0.1053     0.1053     0.1054
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 74160 microseconds.
   (= 74160 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           171.7807     0.0945     0.0931     0.1056
Scale:          180.6685     0.0886     0.0886     0.0888
Add:            228.0891     0.1053     0.1052     0.1056
Triad:          227.9853     0.1053     0.1053     0.1056
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 74065 microseconds.
   (= 74065 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           171.7513     0.0934     0.0932     0.0953
Scale:          180.6175     0.0887     0.0886     0.0888
Add:            228.0870     0.1064     0.1052     0.1153
Triad:          227.9981     0.1054     0.1053     0.1057
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 74225 microseconds.
   (= 74225 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           171.7346     0.0945     0.0932     0.1032
Scale:          180.6563     0.0886     0.0886     0.0888
Add:            228.1390     0.1053     0.1052     0.1055
Triad:          228.0653     0.1053     0.1052     0.1055
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 74145 microseconds.
   (= 74145 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           171.7548     0.0935     0.0932     0.0955
Scale:          180.7072     0.0886     0.0885     0.0886
Add:            227.8575     0.1064     0.1053     0.1154
Triad:          227.9700     0.1054     0.1053     0.1055
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 74180 microseconds.
   (= 74180 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           171.5966     0.0946     0.0932     0.1033
Scale:          180.5809     0.0887     0.0886     0.0889
Add:            227.9853     0.1053     0.1053     0.1055
Triad:          227.9440     0.1054     0.1053     0.1059
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 74172 microseconds.
   (= 74172 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           171.8048     0.0934     0.0931     0.0953
Scale:          180.7114     0.0886     0.0885     0.0888
Add:            227.8704     0.1072     0.1053     0.1154
Triad:          228.0871     0.1054     0.1052     0.1059
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 1000000, Offset = 0
Total memory required = 22.9 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 74191 microseconds.
   (= 74191 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function     Rate (MB/s)   RMS time   Min time   Max time
Copy:           171.8194     0.0945     0.0931     0.1031
Scale:          180.7214     0.0886     0.0885     0.0887
Add:            228.1674     0.1052     0.1052     0.1053
Triad:          228.3041     0.1052     0.1051     0.1054
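You can check the Rate column above by hand. The rate is the bytes credited to a kernel (words per element times 8 bytes times 1,000,000 elements), divided by the best time, with MB meaning 10^6 bytes. The "Total memory required" line, by contrast, reports the three arrays in units of 2^20 bytes. The helpers below are hypothetical and are not taken from stream_d.c. They also assume that, as in later STREAM releases, the first trial is discarded as a warm-up before the minimum is taken.

```c
#include <stddef.h>

/* Hypothetical helpers reproducing the arithmetic behind STREAM's summary. */

/* Best (minimum) time over repeated trials. Assumes trial 0 is a warm-up
   and is skipped, as in later STREAM releases; requires ntimes >= 2. */
double best_time(const double *t, int ntimes) {
    double best = t[1];
    for (int k = 2; k < ntimes; k++)
        if (t[k] < best) best = t[k];
    return best;
}

/* Rate in MB/s with MB = 10^6 bytes. words_per_elem is 2 for Copy/Scale
   and 3 for Add/Triad; each word is one 8-byte double. */
double stream_rate_mbs(long n, int words_per_elem, double best_s) {
    double bytes = (double)words_per_elem * (double)sizeof(double) * (double)n;
    return 1.0e-6 * bytes / best_s;
}

/* Memory for the three arrays a, b, c in units of 2^20 bytes, which is
   why 24,000,000 bytes is printed as "22.9 MB". */
double stream_total_mib(long n) {
    return 3.0 * (double)sizeof(double) * (double)n / 1048576.0;
}
```

For the first run, Copy gives 16,000,000 bytes / 0.0932 s, or about 171.67 MB/s, and Add and Triad give 24,000,000 bytes / 0.1053 s, or about 227.9 MB/s. The printed rates differ in the last digits because STREAM divides by the unrounded minimum time rather than the four-digit value shown in the Min time column.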


