Department of Computer Science
School of Engineering and Applied Science
University of Virginia, Charlottesville, Virginia
What's New!
2007-06-18: New version of STREAM in Pascal!
So how many of you out there are old enough to remember a programming
language called "Pascal"? I recall doing a little bit of programming
in Pascal using the UCSD Pascal "p-code" integrated development
environment on an Apple ][ computer.
A participant in the Free Pascal project donated a Pascal version of
STREAM that I have placed in the Contrib area.
2005-02-07: Re-organization of the STREAM ftp site
The Department of Computer Science at the University of Virginia has
discontinued anonymous ftp service. So I have moved the stuff that
used to only be available by ftp into a new directory that can be
accessed by http here.
2004-11-17: Re-organization of the STREAM ftp site and file names
The most recent "official" versions have been renamed "stream.f" and
"stream.c" -- all other versions have been moved to the "Versions"
subdirectory.
The "official" timer (was "second_wall.c") has been renamed "mysecond.c".
This is embedded in the C version ("stream.c"), but still needs to be
externally linked to the FORTRAN version ("stream.f").
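For readers who have not seen a timer of this kind, here is a minimal sketch of a wall-clock routine in C. It assumes a POSIX system with gettimeofday and is not a copy of the distributed "mysecond.c", which may differ in detail. The second entry point, with a trailing underscore, is an assumption about how many FORTRAN compilers name external routines; check your own compiler's convention before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds, with microsecond resolution.
 * Sketch only: assumes POSIX gettimeofday is available. */
double mysecond(void)
{
    struct timeval tp;
    int rc = gettimeofday(&tp, NULL);
    assert(rc == 0);   /* not expected to fail with a valid pointer */
    (void)rc;          /* silence unused warning when NDEBUG is set */
    return (double)tp.tv_sec + (double)tp.tv_usec * 1.0e-6;
}

/* Assumption: many FORTRAN compilers append an underscore to external
 * names, so a FORTRAN call to MYSECOND() would resolve to this symbol. */
double mysecond_(void)
{
    return mysecond();
}
```

With a file like this compiled to an object file, the FORTRAN version would be linked against that object, while the C version needs nothing extra because the timer is already embedded in "stream.c".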
2004-11-17: New "Getting Started" section in FAQ
Due to a number of recent requests for help in getting started, I
have renamed the files and added some new "Getting Started"
instructions in the FAQ.
New Opteron Binary
A STREAM binary for Opteron/Athlon64 has been submitted to the
STREAM web site. It is reported to work on SUSE, Red Hat, and
Fedora. Give it a try, and let me know if it works!
Here is the link to the submission.
2003-09-23: Minor change in reporting for MPI runs
After reviewing several requests, I have decided to allow a subset
of MPI results to be published in the "standard" table. The
specific case that I allow is when the MPI version of the code is
used to run the STREAM benchmark on a single SMP system.
I do not require that all of the MPI processes run under a single O/S,
only that the hardware supports shared memory operation that would
allow a single O/S to be run if so configured. These "MPI on SMP"
results are marked with the notation "(MPI Result)" on the standard
benchmark page, and these results are eligible for inclusion in the
"TOP20" list.
For completeness, these results will also continue
to be included in the MPI table.
Feel free to flame me if you think that this is a bad idea.
e-mail flames here
I have written up my methodology for estimating STREAM Triad bandwidth using
published results from the 171.swim benchmark in SPEC's CPU2000 suite. These
are not publishable STREAM results, but they definitely give a good rough
estimate on a wide variety of systems.
This is a slightly updated version of a presentation that I made at
the TOP500 meeting at SuperComputing2002 in November. The slides
show a methodology for getting reasonably accurate estimates of
SPECfp_rate2000 using only one measurement, plus a set of architectural
parameters of the system.
2002-11-05: A new category of results has been defined!
In response to popular demand, a new category has been created explicitly for "tuned" results.
This category allows source code modifications to increase performance, including assembly language coding.
2002-11-05: The "Top10" list has been extended to the "Top20" list.
The "Top20" list is drawn from
both the "standard" and "tuned" results. This provides a "show-off" category to
allow folks to show off the best realizable performance on their
systems, similar to the LINPACK NxN (aka LINPACK HPC) benchmark.
2002-06-25: An MPI version of STREAM has been released.
Check it out.
2002-02-25: An OpenMP version is available in C!
Version 4 of the STREAM benchmark in C is now available with OpenMP directives.
This version still does not have the error-checking or anti-optimization code
that is contained in version 5 (still FORTRAN only), but it does allow shared-memory
parallel execution on many systems!
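As an illustration of what an OpenMP directive looks like on a STREAM-style loop, here is a minimal sketch of the Triad kernel in C. The function name `triad` and its argument list are hypothetical, chosen for this example; the released code is organized differently and should be consulted for the real thing.

```c
#include <assert.h>
#include <stddef.h>

/* Triad kernel: a[j] = b[j] + scalar * c[j].
 * Hypothetical helper for illustration, not taken from stream.c. */
void triad(double *a, const double *b, const double *c,
           double scalar, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    long j;
    long len = (long)n;   /* signed index, accepted by older OpenMP compilers */

    /* Each thread handles a share of the iterations. Without OpenMP
     * support enabled, the pragma is ignored and the loop runs serially. */
#pragma omp parallel for
    for (j = 0; j < len; j++)
        a[j] = b[j] + scalar * c[j];
}
```

The iterations are independent, so no synchronization is needed inside the loop. OpenMP usually has to be switched on with a compiler option (for example `-fopenmp` with GCC).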
2000-07-30: Version 5 of the STREAM benchmark is ready!
Version 5 of the STREAM benchmark is available for FORTRAN users.
New features include:
OpenMP directives for parallel execution on many shared-memory machines
Added a new subroutine to validate results
Added modifications to array elements to prevent aggressive optimizers
from moving the timer calls. (This is a problem with IBM's XLF 7.1 when
using prior versions of STREAM.)
Switched from RMS time to AVG time in output. (Min time is still
used to calculate the bandwidths.)
STREAM now excludes the first and last iterations when looking for
minimum times.
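The last two items can be sketched in C as follows. This is an illustration of the reporting rule as described above, not the FORTRAN code itself. The helper names `min_time` and `bandwidth_mb_per_s` are hypothetical, and the caller is assumed to supply the number of bytes moved by the kernel (for Triad with 8-byte elements, that is assumed here to be 24 bytes per loop iteration).

```c
#include <assert.h>
#include <stddef.h>

/* Minimum of the per-iteration times, skipping the first and last
 * iterations. Hypothetical helper; needs at least three samples. */
double min_time(const double *times, size_t ntimes)
{
    assert(times != NULL && ntimes >= 3);

    double best = times[1];
    for (size_t k = 2; k + 1 < ntimes; k++) {
        if (times[k] < best)
            best = times[k];
    }
    return best;
}

/* Bandwidth in MB/s (10^6 bytes per second) from bytes moved and the
 * minimum time in seconds. */
double bandwidth_mb_per_s(double bytes_moved, double seconds)
{
    assert(seconds > 0.0);
    return 1.0e-6 * bytes_moved / seconds;
}
```

For example, a Triad pass over 1,000,000 elements moves 24,000,000 bytes under the assumption above, so a minimum time of 0.10 seconds corresponds to about 240 MB/s.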
2000-04-18: All e-mail contributions caught up!
I am still working on adding links from the original submissions to the
entries in the tables. All are complete for the submissions in 1996
and 2000, and I am working toward the middle.....
I also switched from 68k Linux to PowerPC Linux for my home machine,
so I have a much more complete set of tools. Look for STREAM listings
for the PowerComputing PowerCurve 601/120.
2000-04-18: FAQ Revised and Updated!
The STREAM FAQ has been revised and updated with
a much more useful discussion of the STREAM benchmark run rules. Check
it out!
What's Not so New Any More!
99.11.10: e-mail updates
I have updated the hypermail archives
of the original submissions of STREAM results. Not everything has made
it into the tables yet, but I am slowly catching up.
99.10.29: New Categories for PCs and Macs
I have added tables for "standard" results for PCs (and compatibles) and
Macintoshes (and compatibles). "Non-standard/experimental" results
and "32-bit" results are still in the corresponding tables -- these tables
are small enough that you can find the Mac and PC results easily enough.
99.10.29: Links to Original Data 
I am beginning to add links from the data pages back to the original e-mail
submissions (in hypermail format). Unfortunately, this is an entirely
manual process, so it is going to take a while. I also have the problem
that I cannot get the new version of hypermail to compile on my AIX machine,
so it will probably be slow going until I get Linux up on my Mac (68k)
at home.
99.10.18: I'm back!
STREAM's author and maintainer (that would be me)
is back. I just moved from Silicon Valley to Austin, TX, and have had a
bit of trouble getting all my computers and accounts configured.
As of Monday, October 18, 1999, STREAM is back in business, and I will
be happy to accept results and will get them in the tables promptly.
99.10.18: AMD Athlon (aka K7) Results!!!
The first results from the new AMD Athlon (aka K7) are in, courtesy of
Claus Jeppesen (jeppesen@mrl.ucsb.edu). This machine is a screamer, posting
over 450 MB/s on three of the four kernels! Watch out Intel!
Summer 1999: New Reporting Rules for Systems with Multiple Independent Memory Subsystems!!!!
A Change of Categories for Some Multiprocessor Results: A "partially
depopulated" system is one in which only a subset of the cpus are used
for the benchmark, and for which this subset is spread around the machine
to decrease contention. For example on the SGI Origin2000, each node has
2 cpus sharing a single bus and memory subsystem. The results in the "experimental/nonstandard"
table labelled "1 per node" are based on using only one cpu per node board,
and are considered a "nonstandard" way of using the machine. Similarly,
the Sun Ultra10000 has 4 cpus per node board, so results using 1, 2, or
3 cpus per node also go into this table of "nonstandard" results.
From this point forward, "partially populated" results will no longer
be included in the table of "standard" results -- only "fully populated"
multiprocessor results will be considered "standard". "Partially populated"
results are still welcome, but will be placed in the "experimental/nonstandard"
tables, which have been renamed (from simply "experimental") to more clearly
denote the non-standard use of the hardware.
Summer 1999: Linux Executables for IA32
There are some new contributed sources and binaries in the
MasonCabot directory.
The linux executables run great on Pentium II & Pentium III machines.
We are still a bit puzzled by the Windows NT Server results. Any gurus
out there willing to help out?
99.10.18: Mason Cabot and I figured
out that the poor results under Windows NT were the result of a misconfiguration
on my part. My Linux boxes all had the full complement of SIMMs installed,
while the NT machines had the minimum configuration. Apparently the Intel
chipset used in this box (the SGI 1400) requires a fully loaded memory
system to reach full bandwidth.
Mac PowerPC executable set
A set of Mac executables is available now in Stuff-ed format. Get your
own copy.
Really OLD News
An Old DOS version
Dennis Lee has revised a version of STREAM for DOS machines. The executable
and a README file are available in a zip file.
Mailing List Archive
I have created a mailing list to discuss future developments of STREAM
and memory bandwidth benchmarking in general. The archives of the mailing
list are here in hypermail format.
Hypermail archives of old e-mail
Thanks to some friendly pointers to the right tools, the five years of
contributions to the STREAM archive are now available for surfing. Go
here for whatever details exist on machines, compilers, compiler options,
etc....
New Results in 1996:
- Mon Dec 16, 10:58am GMT 1996 --- Dell P6-200 (Natoma chipset)
- Tue Dec 10, 3:01pm 1996 --- Apple PowerMac 7500 w/ 150 MHz PPC604e
- Mon Dec 9, 3:18pm EST 1996 --- PowerCenter 120 (Mac Clone)
- Sat Dec 7, 2:35am 1996 --- HP C180
- Tue Nov 26, 10:52am 1996 --- IBM RS/6000-43P
- Tue Nov 26, 10:52am 1996 --- IBM RS/6000-25T
- Sat Nov 23 1:24am EST 1996 --- Apple PowerMac 7500/100
- Fri Nov 22 16:32:58 EST 1996 --- Cray T932, 1-32 cpus
- Fri Nov 22 16:32:58 EST 1996 --- Sun Ultra II, 1-2 cpus
- Fri Nov 22 16:32:58 EST 1996 --- Compaq Proliant 5000
- Fri Nov 22 16:32:58 EST 1996 --- Gateway 2000 P6/200
- Fri Nov 22 16:32:58 EST 1996 --- HP/9000-735/125
- Fri Nov 22 16:32:58 EST 1996 --- HP/9000-712/80
- Mon Oct 7 09:43:46 EDT 1996 --- SGI Origin 2000, 1-32 cpus
- Mon Oct 7 09:43:46 EDT 1996 --- SGI Origin 200, 1-2 cpus
- Mon Oct 7 09:43:46 EDT 1996 --- Convex Exemplar S, 6-16 cpus
- Mon Sep 16 14:56:23 EDT 1996 --- SGI Power Challenge 10000, 1-6 cpus
- Sat Jul 6 09:33:15 EDT 1996 --- NEC SX-4, 1-4 cpus
- Mon May 6 14:33:00 EDT 1996 --- DEC 4100/300E, 4100/300, 4100/400
- Tue Apr 30 18:02:00 EDT 1996 --- DEC 8400-5-350 1-8 cpus
- Thu Apr 25 11:16:33 EDT 1996 --- Sun Ultra Enterprise 6000 1-32 cpus --- assembly coded kernels
- Mon Apr 22 12:13:43 EDT 1996 --- Sun Ultra Enterprise 6000 1-32 cpus
- Thu Apr 4 10:47:40 EST 1996 --- Asus Pentium 200 and Pentium 180
- Mon Apr 1 1996 --- Cray J932 1-32 cpus
- Mon Apr 1 1996 --- Tera MTA 1-16 cpus (simulated 300 MHz results)
- Thu Mar 28 12:45:42 EST 1996 --- Convex SPP-1600 1-32 cpus
- Tue Feb 27 10:49:01 EST 1996 --- Sun Ultra I/170 update
- Thu Jan 11 06:02:04 EST 1996 --- HP 9000/819 (K200)
May 12, 1996:
There has been a complete change in the underlying structure of the database.
The entire database of raw results is now contained in a single, comma-delimited
datafile. This is described in the Table subdirectory.
John D. McCalpin mccalpin@cs.virginia.edu