This set of results includes the top 20 shared-memory systems
(either "standard" or "tuned" results), ranked by STREAM TRIAD
performance. Like the LINPACK NxN benchmark, this is intended
to show off the best possible bandwidth of these large systems.
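For readers unfamiliar with the ranking kernel, the sketch below shows the TRIAD loop and how a bandwidth figure is derived from it. It assumes 64-bit `double` arrays and uses the usual STREAM byte-counting convention: two reads and one write per iteration, 1 MB = 10^6 bytes, and no credit for write-allocate traffic. The function names are illustrative, and the timing harness (repetitions, best-of-N timing) is deliberately omitted.

```c
#include <stddef.h>

/* TRIAD kernel: a[j] = b[j] + scalar * c[j].
   Each iteration reads b[j] and c[j] and writes a[j]. */
static void triad(double *a, const double *b, const double *c,
                  double scalar, size_t n)
{
    for (size_t j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}

/* Bytes credited for one TRIAD pass: three arrays of n doubles
   (two read, one written). Write-allocate traffic is not counted. */
static double triad_bytes(size_t n)
{
    return 3.0 * sizeof(double) * (double)n;
}

/* Bandwidth in MB/s (1 MB = 1e6 bytes), given the best time in seconds. */
static double triad_mbytes_per_sec(size_t n, double seconds)
{
    return 1.0e-6 * triad_bytes(n) / seconds;
}
```

For example, a best time of 0.024 s for n = 1,000,000 corresponds to 24,000,000 bytes moved, or about 1000 MB/s.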
The results are currently presented in the following tables:
The "standard" set of results presents the results of the C or Fortran
versions of the STREAM benchmark running with 64-bit data types on production
hardware. The "standard" set of results excludes:
Cases with 32-bit operands or operations.
Cases with significantly modified code or assembly language.
Simulated results.
Results from experimental or non-production hardware or software.
The results are currently presented in the following tables:
The "tuned" set of results presents the results of the
STREAM benchmark running with 64-bit data types on production
hardware, but allows code modification (including assembly
language coding). The "tuned" set of results excludes:
Cases with 32-bit operands or operations.
Simulated results.
Results from experimental or non-production hardware or software,
or non-standard system configurations.
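The exclusion rules for the "standard" and "tuned" sets can be read as two simple predicates. The sketch below encodes them under stated assumptions: the `stream_result` record and its fields are hypothetical, invented for illustration, and "32-bit operands or operations" is collapsed into a single `operand_bits` field. It mirrors the lists above and is not a tool used to maintain these tables.

```c
#include <stdbool.h>

/* Hypothetical description of one submitted STREAM result. */
struct stream_result {
    int  operand_bits;     /* 32 or 64 */
    bool modified_code;    /* significantly modified source or assembly */
    bool simulated;        /* result comes from a simulator */
    bool production;       /* production hardware and software */
    bool standard_config;  /* standard system configuration */
};

/* "Standard": 64-bit, unmodified code, measured (not simulated),
   on production hardware and software. */
static bool is_standard(const struct stream_result *r)
{
    return r->operand_bits == 64 && !r->modified_code &&
           !r->simulated && r->production;
}

/* "Tuned": code changes (including assembly) are allowed, but the
   run must still be 64-bit, measured, production, and use a
   standard system configuration. */
static bool is_tuned_eligible(const struct stream_result *r)
{
    return r->operand_bits == 64 && !r->simulated &&
           r->production && r->standard_config;
}
```

Under these rules, an assembly-coded 64-bit run on a production system fails `is_standard` but passes `is_tuned_eligible`, while any 32-bit run fails both.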
The results are currently presented in the following tables:
Note: A "partially depopulated" system is one in which only a subset of the
cpus are used for the benchmark, and for which this subset is spread around
the machine to decrease contention. For example, on the SGI Origin2000,
each node has 2 cpus sharing a single bus and memory subsystem. The results
in this table labelled "1 per node" are based on using only one cpu per
node board, and are considered a "nonstandard" way of using the machine.
Similarly, the Sun Ultra10000 has 4 cpus per node board, so results using
1, 2, or 3 cpus per node also go into this table of "nonstandard" results.
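The idea of a "partially depopulated" run can be sketched as choosing the same number of CPUs on every node rather than filling nodes one at a time. The sketch assumes a hypothetical numbering in which node k owns logical CPUs k*cpus_per_node through k*cpus_per_node + cpus_per_node - 1. It only computes the CPU list. Actually binding threads to those CPUs is OS-specific and is not shown.

```c
#include <stddef.h>

/* Hypothetical numbering: node k owns logical CPUs
   k*cpus_per_node .. k*cpus_per_node + cpus_per_node - 1.
   Writes the first `per_node` CPUs of every node into `out`, spreading
   the benchmark across all nodes. Returns the number of CPUs written,
   or 0 if per_node is out of range. */
static size_t select_depopulated(size_t nodes, size_t cpus_per_node,
                                 size_t per_node, int *out)
{
    if (per_node == 0 || per_node > cpus_per_node)
        return 0;
    size_t count = 0;
    for (size_t node = 0; node < nodes; node++)
        for (size_t k = 0; k < per_node; k++)
            out[count++] = (int)(node * cpus_per_node + k);
    return count;
}

/* A run is "partially depopulated" if it uses fewer than all CPUs per node. */
static int is_depopulated(size_t cpus_per_node, size_t per_node)
{
    return per_node < cpus_per_node;
}
```

With 2 CPUs per node and 1 used per node (the Origin2000 "1 per node" case), four nodes yield CPUs 0, 2, 4, 6. With 4 CPUs per node and 3 used per node (one of the Ultra10000 cases), two nodes yield 0, 1, 2, 4, 5, 6. Both runs would fall into this "nonstandard" table.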
The results are currently presented in the following tables:
These results are based on the MPI version of the code, which is typically run on clusters.
There is nothing "non-standard" about these numbers; I simply wanted to keep
the SMP and cluster results separate.
These tables include only results with 32-bit operands or operations.
Results using 64-bit operands that move the data in 32-bit "chunks" are
not included here. The results are currently presented in the following tables:
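The byte count STREAM credits per loop iteration scales with the element size. This is one way to see that 32-bit and 64-bit results measure different amounts of data per iteration. The sketch below tabulates the number of arrays each of the four STREAM kernels touches and multiplies by the element size. The enum and function names are illustrative only. It assumes the usual convention of counting only explicit reads and writes.

```c
#include <stddef.h>

/* The four STREAM kernels. */
enum stream_kernel { KERNEL_COPY, KERNEL_SCALE, KERNEL_ADD, KERNEL_TRIAD };

/* Arrays read or written per iteration by each kernel. */
static size_t arrays_touched(enum stream_kernel k)
{
    switch (k) {
    case KERNEL_COPY:   /* c[j] = a[j]              */
    case KERNEL_SCALE:  /* b[j] = q * c[j]          */
        return 2;
    case KERNEL_ADD:    /* c[j] = a[j] + b[j]       */
    case KERNEL_TRIAD:  /* a[j] = b[j] + q * c[j]   */
    default:
        return 3;
    }
}

/* Bytes credited per iteration for a given element size in bytes. */
static size_t bytes_per_iteration(enum stream_kernel k, size_t elem_size)
{
    return arrays_touched(k) * elem_size;
}
```

With `sizeof(double)` (8 bytes) TRIAD is credited with 24 bytes per iteration, while with a 4-byte `float` it is credited with 12.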