Starfire Stream results

From: Alan Charlesworth (alanc@West.Sun.COM)
Date: Thu Mar 13 1997 - 22:52:06 CST


Sorry for the delay. I was supposed to wait until after this week's Sun HPC
announcement.

These are Standard C results. I added some data to the printout, but I
believe all the stock info is there.

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2097152, Offset = 0
Procs = 1. Per proc = 2048 K
Total memory required = 48.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 184877 microseconds.
   (= 184877 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING: The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
1 procs 2048 K elements per proc 10 loop repeats 48 MB memory
8 byte alignment, 0 byte offset
Auto-parallel C
          Min    RMS    Max    Max  ---Total----  --Per Proc--     ----Time per----
         time   time   time    Min Stream  Total Stream  Total   Load  Store    Mem
          sec    sec    sec  Range   MB/s   MB/s   MB/s   MB/s     ns     ns     ns
Copy :  0.204  0.205  0.205     0%    164    246    164    246    390    780    260
Scale:  0.204  0.205  0.205     0%    164    246    164    246    390    779    260
Vadd :  0.249  0.250  0.250     0%    202    269    202    269    317    951    238
Triad:  0.249  0.250  0.250     0%    202    269    202    269    317    951    238
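
The memory and bandwidth figures in this printout can be roughly reproduced from the array size and the best time. Here is a minimal Python sketch of that arithmetic. The conventions in it are inferred from the numbers, not stated in the printout: Stream MB/s counts only the arrays each kernel names, Total MB/s adds one extra read of the destination array (write-allocate), MB/s means 10^6 bytes per second, and the memory figure uses 2^20 bytes. The function names are hypothetical.

```python
# Inferred conventions (assumptions, not stated in the printout):
#   Stream MB/s counts only the arrays the kernel names,
#   Total MB/s adds one read of the destination array (write-allocate),
#   MB/s is 10^6 bytes/s, the memory "MB" is 2^20 bytes.
BYTES_PER_WORD = 8

# Arrays moved per kernel: (counted by Stream, counted by Total)
ARRAYS = {"Copy": (2, 3), "Scale": (2, 3), "Vadd": (3, 4), "Triad": (3, 4)}


def memory_mb(n_elements):
    """Footprint of the three arrays, in units of 2^20 bytes."""
    return 3 * BYTES_PER_WORD * n_elements / 2**20


def bandwidth_mb_s(kernel, n_elements, min_time_s):
    """Return (stream, total) bandwidth in 10^6 bytes/s."""
    stream_arrays, total_arrays = ARRAYS[kernel]
    bytes_per_array = BYTES_PER_WORD * n_elements
    stream = stream_arrays * bytes_per_array / min_time_s / 1e6
    total = total_arrays * bytes_per_array / min_time_s / 1e6
    return stream, total


if __name__ == "__main__":
    n = 2097152  # array size of the 1-processor run
    print(f"memory: {memory_mb(n):.1f} MB")
    for kernel, t in [("Copy", 0.204), ("Vadd", 0.249)]:
        stream, total = bandwidth_mb_s(kernel, n, t)
        print(f"{kernel}: stream {stream:.0f} MB/s, total {total:.0f} MB/s")
```

The printed times are rounded to three digits, so the computed rates land within about 1 MB/s of the table rather than matching it exactly.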

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 16777216, Offset = 0
Procs = 8. Per proc = 2048 K
Total memory required = 384.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 188209 microseconds.
   (= 188209 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING: The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
8 procs 2048 K elements per proc 10 loop repeats 384 MB memory
8 byte alignment, 0 byte offset
Auto-parallel C
          Min    RMS    Max    Max  ---Total----  --Per Proc--     ----Time per----
         time   time   time    Min Stream  Total Stream  Total   Load  Store    Mem
          sec    sec    sec  Range   MB/s   MB/s   MB/s   MB/s     ns     ns     ns
Copy :  0.211  0.212  0.213     1%   1271   1907    159    238    403    806    269
Scale:  0.211  0.212  0.212     1%   1270   1905    159    238    403    806    269
Vadd :  0.261  0.261  0.262     0%   1544   2059    193    257    332    995    249
Triad:  0.260  0.261  0.261     0%   1546   2062    193    258    331    993    248
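
The "Time per" columns look like per-processor times per cache-line transfer. The sketch below rests on assumptions that the numbers suggest but the printout does not state: a 64-byte line (8 doubles), each kernel loading one line per source array plus one write-allocate load of the destination, and storing one line. The names are hypothetical.

```python
# Assumption (inferred from the numbers, not stated in the printout):
# the "Time per" columns are per 64-byte cache line, i.e. per 8 doubles.
WORDS_PER_LINE = 8  # hypothetical: 64-byte line / 8-byte word

# Line transfers per loop trip: (loads incl. write-allocate, stores)
LINE_OPS = {"Copy": (2, 1), "Scale": (2, 1), "Vadd": (3, 1), "Triad": (3, 1)}


def time_per_ns(kernel, elements_per_proc, min_time_s):
    """Return (load, store, mem) times in ns per cache-line operation."""
    loads, stores = LINE_OPS[kernel]
    lines = elements_per_proc / WORDS_PER_LINE
    ns_per_line = min_time_s / lines * 1e9
    return (ns_per_line / loads,
            ns_per_line / stores,
            ns_per_line / (loads + stores))


if __name__ == "__main__":
    # 8-processor Copy: 2048 K elements per processor, best time 0.211 s
    load, store, mem = time_per_ns("Copy", 2048 * 1024, 0.211)
    print(f"load {load:.0f} ns, store {store:.0f} ns, mem {mem:.0f} ns")
```

For the 8-processor Copy row this comes out within a nanosecond or two of the printed 403 / 806 / 269. The gap is consistent with the best time being rounded to three digits.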

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 33554432, Offset = 0
Procs = 16. Per proc = 2048 K
Total memory required = 768.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 195521 microseconds.
   (= 195521 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING: The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
16 procs 2048 K elements per proc 10 loop repeats 768 MB memory
8 byte alignment, 0 byte offset
Auto-parallel C
          Min    RMS    Max    Max  ---Total----  --Per Proc--     ----Time per----
         time   time   time    Min Stream  Total Stream  Total   Load  Store    Mem
          sec    sec    sec  Range   MB/s   MB/s   MB/s   MB/s     ns     ns     ns
Copy :  0.226  0.227  0.227     0%   2371   3557    148    222    432    864    288
Scale:  0.222  0.223  0.223     0%   2414   3620    151    226    424    849    283
Vadd :  0.274  0.274  0.275     0%   2942   3922    184    245    348   1044    261
Triad:  0.277  0.278  0.278     0%   2905   3873    182    242    353   1058    264

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 50331648, Offset = 0
Procs = 24. Per proc = 2048 K
Total memory required = 1152.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 200024 microseconds.
   (= 200024 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING: The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
24 procs 2048 K elements per proc 10 loop repeats 1152 MB memory
8 byte alignment, 0 byte offset
Auto-parallel C
          Min    RMS    Max    Max  ---Total----  --Per Proc--     ----Time per----
         time   time   time    Min Stream  Total Stream  Total   Load  Store    Mem
          sec    sec    sec  Range   MB/s   MB/s   MB/s   MB/s     ns     ns     ns
Copy :  0.226  0.226  0.228     1%   3568   5353    149    223    430    861    287
Scale:  0.225  0.226  0.228     1%   3577   5366    149    224    429    859    286
Vadd :  0.281  0.282  0.283     1%   4292   5722    179    238    358   1074    268
Triad:  0.281  0.281  0.282     0%   4305   5740    179    239    357   1070    268

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 67108864, Offset = 0
Procs = 32. Per proc = 2048 K
Total memory required = 1536.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 209213 microseconds.
   (= 209213 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING: The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
32 procs 2048 K elements per proc 10 loop repeats 1536 MB memory
8 byte alignment, 0 byte offset
Auto-parallel C
          Min    RMS    Max    Max  ---Total----  --Per Proc--     ----Time per----
         time   time   time    Min Stream  Total Stream  Total   Load  Store    Mem
          sec    sec    sec  Range   MB/s   MB/s   MB/s   MB/s     ns     ns     ns
Copy :  0.244  0.245  0.245     0%   4397   6595    137    206    466    932    311
Scale:  0.244  0.244  0.244     0%   4408   6612    138    207    465    929    310
Vadd :  0.312  0.312  0.313     0%   5166   6888    161    215    396   1189    297
Triad:  0.310  0.311  0.311     0%   5188   6917    162    216    395   1184    296

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 41943040, Offset = 0
Procs = 40. Per proc = 1024 K
Total memory required = 960.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 109976 microseconds.
   (= 109976 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING: The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
40 procs 1024 K elements per proc 10 loop repeats 960 MB memory
8 byte alignment, 0 byte offset
Auto-parallel C
          Min    RMS    Max    Max  ---Total----  --Per Proc--     ----Time per----
         time   time   time    Min Stream  Total Stream  Total   Load  Store    Mem
          sec    sec    sec  Range   MB/s   MB/s   MB/s   MB/s     ns     ns     ns
Copy :  0.126  0.126  0.127     0%   5317   7976    133    199    481    963    321
Scale:  0.125  0.125  0.126     1%   5374   8062    134    202    476    953    318
Vadd :  0.163  0.164  0.164     0%   6162   8215    154    205    415   1246    312
Triad:  0.162  0.162  0.162     0%   6222   8296    156    207    411   1234    309

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 50331648, Offset = 0
Procs = 48. Per proc = 1024 K
Total memory required = 1152.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 120981 microseconds.
   (= 120981 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING: The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
48 procs 1024 K elements per proc 10 loop repeats 1152 MB memory
8 byte alignment, 0 byte offset
Auto-parallel C
          Min    RMS    Max    Max  ---Total----  --Per Proc--     ----Time per----
         time   time   time    Min Stream  Total Stream  Total   Load  Store    Mem
          sec    sec    sec  Range   MB/s   MB/s   MB/s   MB/s     ns     ns     ns
Copy :  0.135  0.135  0.136     0%   5961   8942    124    186    515   1031    344
Scale:  0.133  0.133  0.134     1%   6056   9083    126    189    507   1015    338
Vadd :  0.176  0.176  0.177     0%   6861   9148    143    191    448   1343    336
Triad:  0.175  0.175  0.175     0%   6914   9219    144    192    444   1333    333

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 58720256, Offset = 0
Procs = 56. Per proc = 1024 K
Total memory required = 1344.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 138849 microseconds.
   (= 138849 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING: The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
56 procs 1024 K elements per proc 10 loop repeats 1344 MB memory
8 byte alignment, 0 byte offset
Auto-parallel C
          Min    RMS    Max    Max  ---Total----  --Per Proc--     ----Time per----
         time   time   time    Min Stream  Total Stream  Total   Load  Store    Mem
          sec    sec    sec  Range   MB/s   MB/s   MB/s   MB/s     ns     ns     ns
Copy :  0.152  0.152  0.153     0%   6183   9274    110    166    580   1159    386
Scale:  0.149  0.150  0.150     1%   6304   9456    113    169    569   1137    379
Vadd :  0.198  0.198  0.199     1%   7131   9508    127    170    503   1508    377
Triad:  0.198  0.198  0.199     1%   7128   9505    127    170    503   1508    377

-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 66060288, Offset = 0
Procs = 63. Per proc = 1024 K
Total memory required = 1512.0 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 161354 microseconds.
   (= 161354 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING: The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
63 procs 1024 K elements per proc 10 loop repeats 1512 MB memory
8 byte alignment, 0 byte offset
Auto-parallel C
          Min    RMS    Max    Max  ---Total----  --Per Proc--     ----Time per----
         time   time   time    Min Stream  Total Stream  Total   Load  Store    Mem
          sec    sec    sec  Range   MB/s   MB/s   MB/s   MB/s     ns     ns     ns
Copy :  0.168  0.168  0.170     2%   6307   9461    100    150    639   1279    426
Scale:  0.165  0.166  0.167     1%   6391   9586    101    152    631   1262    421
Vadd :  0.220  0.221  0.221     0%   7203   9604    114    152    560   1679    420
Triad:  0.220  0.221  0.221     0%   7197   9596    114    152    560   1681    420
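
To see how the aggregate bandwidth scales with processor count, the Triad "Stream MB/s" totals from the nine runs can be compared against the 1-processor figure. This is a minimal sketch. The speedup and efficiency definitions and the function name are my own choices, not part of the STREAM printout. Note also that the runs with 40 or more processors use 1024 K elements per processor instead of 2048 K, so the comparison is only approximate.

```python
# Triad "Stream MB/s" totals copied from the tables in this message,
# keyed by processor count.
TRIAD_MB_S = {1: 202, 8: 1546, 16: 2905, 24: 4305, 32: 5188,
              40: 6222, 48: 6914, 56: 7128, 63: 7197}


def scaling(results):
    """Return {procs: (speedup, efficiency)} relative to the smallest run."""
    base_procs = min(results)
    base = results[base_procs] / base_procs  # per-processor baseline rate
    return {p: (bw / base, bw / base / p) for p, bw in sorted(results.items())}


if __name__ == "__main__":
    for procs, (speedup, eff) in scaling(TRIAD_MB_S).items():
        print(f"{procs:3d} procs: speedup {speedup:5.1f}, efficiency {eff:4.0%}")
```

By this measure the 63-processor run delivers a bit under 36 times the 1-processor Triad rate, or roughly 57% parallel efficiency.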


