Department of Computer Science
School of Engineering and Applied Science
University of Virginia, Charlottesville, Virginia


What's New!

List of the 30 newest entries!

2016-07-28: Bug fix in stream_mpi.c

I don't like MPI and the version of STREAM written in MPI is not intended as a "standard" version, but there is a bug in the timing code that is too big to ignore. I very carefully derived the proper way to bound the timings for MPI runs that would guarantee that no work would be done before the start time and no work would be done after the end time --- then what I actually implemented in the code was something different (and provably incorrect). Argh!

The new revision of stream_mpi.c has the RCS version number of 1.8 and a modification date of 2016-07-28.
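The bounding requirement described above can be pictured with a small sketch. It is an illustration only and is not taken from stream_mpi.c: it assumes that every rank has recorded its own start and end timestamps on a common (synchronized) clock, and the names `rank_interval` and `bounding_elapsed` are hypothetical. Taking the earliest start and the latest end over all ranks gives an interval that no rank's work can fall outside of.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-rank record: when this rank started and finished a kernel.
   Assumes all timestamps come from a common (synchronized) clock. */
typedef struct {
    double start;
    double end;
} rank_interval;

/* Return an elapsed time that covers all of the work:
   earliest start over all ranks to latest end over all ranks. */
double bounding_elapsed(const rank_interval *r, size_t nranks)
{
    assert(r != NULL && nranks > 0);
    double first_start = r[0].start;
    double last_end = r[0].end;
    for (size_t i = 1; i < nranks; i++) {
        if (r[i].start < first_start) first_start = r[i].start;
        if (r[i].end > last_end) last_end = r[i].end;
    }
    return last_end - first_start;
}
```

In a real MPI program the two reductions would typically be done with `MPI_Allreduce` using `MPI_MIN` and `MPI_MAX`. The sketch uses a plain array so that it compiles without an MPI installation.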

2016-03-28: A bit of clean-up...

Some broken links on the main page were reported to me and I decided to clean up a few things.

2016-01-26: Some older and some newer results....

ScaleMP has submitted new results on the 64-node system they have installed at the University of Queensland. This shared-memory system based on 2-socket Xeon E5-2680v3 nodes delivers over 6 TB/s on the Add and Triad kernels, and reaches the number 3 position on the Top20 list.

Near the opposite end of the spectrum, I realized that I had not published results from the HP Moonshot M300 system that I tested in June 2014. This system uses the Intel C2750 processor (an 8-core Atom-based processor) and delivers up to 15 GB/s.

Even closer to the opposite end of the spectrum, results from the Intel Edison board are included. This board contains a dual-core Atom (Silvermont) processor running at 500 MHz and delivers a single-threaded bandwidth of almost 2 GB/s out of what appears to be a 3.2 GB/s peak (based on my interpretation of the Intel Edison Product Brief (pdf)).

2015-07-28: New results, including a new TOP20 leader

SGI has submitted a new record result on the SGI UV3000 system delivering 12.8 TB/s to 13.8 TB/s on the four STREAM kernels. This is a shared-memory system with 256 Intel Xeon E5-4650 v3 (12 core) processors (3072 cores total). All of the cores were activated and available to the OS, but only 1/2 of the cores (1536) were used to run the benchmark. (It is common for multicore systems to slow down slightly when using all cores. At extremely large system scales this effect is likely to get worse due to background OS activity.)

At the other end of the performance spectrum, results from the Raspberry Pi 2 Model B are now published. Both the Fortran and C results show relatively strong results for the STREAM Copy kernel (1700 to 1800 MB/s -- I think that the peak memory bandwidth of the system is 3200 MB/s, but it is hard to get details on the precise frequency used for the LPDDR2 DRAM in the system). The kernels with arithmetic are much slower -- 500 MB/s to 950 MB/s. Presumably this is due to poor support for double-precision floating-point arithmetic in the ARM processor core used, but again it is very hard to find this level of detail on any ARM processor, and the details on the Broadcom BCM2836 chip used in the Raspberry Pi 2 are particularly difficult to find.

I also finally published some results from the TACC Lonestar 4 system (based on the Intel Xeon X5680 ("Westmere EP") processors). Lonestar 4 is about to be replaced by a new system, but I wanted these results to be in the archive. The results are listed under the Dell PowerEdge M610 model name.

Another set of (not-quite-so-) old results is from a single-socket Intel Xeon E3-1270 (Sandy Bridge) system. Please note that these results used only 1 of the 4 cores on the system, as this gave slightly better performance. There is nothing wrong with using fewer cores, but it does mean that the "STREAM Balance" calculation is biased since it compares the peak performance of only 1 core against the memory bandwidth. If all the cores were used, the "STREAM Balance" would be a higher number (more unbalanced) by a factor of slightly more than 4 -- a factor of 4 from counting the peak FLOPS of all the cores and a small additional increase due to the reduction in sustained bandwidth from the extra contention caused by using all four cores.
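The arithmetic behind that factor can be sketched as follows. This assumes the usual definition of balance as peak floating-point operations per sustained 8-byte word moved. The function name `stream_balance` and all of the numbers in the usage note are hypothetical and are not measurements from the E3-1270 system.

```c
#include <assert.h>

/* Machine balance as peak FLOPs per sustained 8-byte word moved.
   peak_gflops: peak floating-point rate of the cores counted (GFLOP/s)
   bw_gbs:      sustained STREAM bandwidth (GB/s) */
double stream_balance(double peak_gflops, double bw_gbs)
{
    assert(peak_gflops > 0.0 && bw_gbs > 0.0);
    double gwords_per_s = bw_gbs / 8.0;   /* 8 bytes per double-precision word */
    return peak_gflops / gwords_per_s;
}
```

With made-up inputs of 25 GFLOP/s and 20 GB/s for one core, the balance is 10. Counting four cores (100 GFLOP/s) with the bandwidth dropping to 19 GB/s under contention gives about 42.1, a ratio of 4 x 20/19, or roughly 4.2.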

These last two sets of results are now tagged with "_nta" or "_alloc" after the machine name. I am using this as a temporary measure to indicate whether "streaming" (or "nontemporal" or "cache-bypassing") stores were used in the run (marked by "_nta"). This is discussed in the STREAM FAQ notes on Counting Bytes and FLOPS.
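The reason the distinction matters is that an ordinary (write-allocate) store usually reads the target cache line before overwriting it, so the hardware moves more bytes than STREAM counts. A rough sketch of the idea, with hypothetical function names and assuming 8-byte array elements:

```c
#include <assert.h>

typedef enum { COPY, SCALE, ADD, TRIAD } stream_kernel;

/* Bytes counted by STREAM per loop iteration with 8-byte elements:
   one 8-byte read per source array plus one 8-byte write. */
int counted_bytes(stream_kernel k)
{
    int reads = (k == COPY || k == SCALE) ? 1 : 2;
    return 8 * (reads + 1);
}

/* Bytes actually moved per iteration. With write-allocate stores the target
   line is read before being overwritten, adding one more 8-byte read.
   With streaming stores there is no such extra read. */
int moved_bytes(stream_kernel k, int streaming_stores)
{
    assert(streaming_stores == 0 || streaming_stores == 1);
    return counted_bytes(k) + (streaming_stores ? 0 : 8);
}
```

Under these assumptions Copy and Scale are counted at 16 bytes per iteration and Add and Triad at 24, while a run without streaming stores moves 24 and 32 bytes respectively.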

Cleaning up my mailbox, I found a result that I had missed from October 2013. It is an "experimental" result on an overclocked Windows system with an Intel Core i7-3820 processor -- normally 3.6 GHz, but overclocked to 5 GHz. The memory was also overclocked from DDR3/1333 to DDR3/2000. Results are in the screenshots attached to the submission.

2014-10-28: New Variants of STREAM released

Two variants of STREAM have been added to the Versions subdirectory of the source code tree. Both of these versions are fully compliant versions of STREAM (unless there are bugs).

Important notes:

2014-01-28: Some fixes to the database

As I was reworking some analyses, I discovered some errors in the STREAM benchmark database that I corrected:

2013-01-17: STREAM version 5.10 released!

STREAM version 5.10 has finally been released -- at least in the C language. Fortran will follow when I get around to it....

While this version does not change what STREAM measures, it does provide a number of long-awaited features:

2013-01-17: Many older results finally added to tables!

Dr. Bandwidth apparently lost track of many STREAM benchmark submissions in his e-mail. Today many of these older results have been added to the tables. The results have been added to the tables according to the original submission dates, so here is a list of the newly added entries:

2010-04-17: New sorted tables added!

2010-04-17: Submission Date added to all rows of all tables

With the constantly changing nomenclature in the market, it is hard for even me to remember if an entry is an old system or a new one. Now every row on every table has the submission date as the first field. So if seeing all the results sorted by date (see item above) is not what you want, you can still see that critical information in all of the tables.

If you notice systems for which the availability date was significantly earlier than the STREAM benchmark submission date (e.g., old Cray systems), please e-mail any relevant info -- I will be adding availability dates to the database as time permits.

2009-11-28: Back on line & minor updates

Due to a minor administrative error, the STREAM benchmark web site has not displayed the updates that I made since August 2009. This is now fixed and the web site is now correctly displaying the newest updates.
I noticed a minor error in some old results when I added the Fujitsu SPARC Enterprise M8000 and M9000 results. When I added the 2007 results for those two systems, I incorrectly listed the CPU frequency as 2150 MHz rather than the correct value of 2400 MHz. This does not affect the bandwidth numbers, but it does change the peak MFLOPS and Machine Balance calculations. The new computations are reflected in the standard and top20 lists.

2008-12-06: New Windows binaries available

An easy-to-use package with versions for 32-bit and 64-bit Windows and including both single-threaded and OpenMP versions is available: StreamWin-32-64_distro.zip. Let me know if this does not work for you.

2008-12-06: STREAM Variants at Oak Ridge National Labs

The folks at Oak Ridge National Labs have created a fairly large number of variants of the STREAM benchmark for different data types and for different programming languages. From the Extreme Scale Software Center page, there are currently versions of STREAM in several subdirectories:

2008-07-28: New version of STREAM in Java!

Although it is not yet compliant with the STREAM run rules for multi-threaded runs, there is now a very clean (FORTRAN-looking) Java version of the STREAM benchmark in the Contrib area.

What's Not so New Any More!

2007-06-18: New version of STREAM in Pascal!

So how many of you out there are old enough to remember a programming language called "Pascal"? I recall doing a little bit of programming in Pascal using the UCSD Pascal "p-code" integrated development environment on an Apple ][ computer.
A participant in the Free Pascal project donated a Pascal version of STREAM that I have placed in the Contrib area.

2005-02-07: Re-organization of the STREAM ftp site

The Department of Computer Science at the University of Virginia has discontinued anonymous ftp service. So I have moved the stuff that used to only be available by ftp into a new directory that can be accessed by http here.

2004-11-17: Re-organization of the STREAM ftp site and file names

The most recent "official" versions have been renamed "stream.f" and "stream.c" -- all other versions have been moved to the "Versions" subdirectory.
The "official" timer (was "second_wall.c") has been renamed "mysecond.c". This is embedded in the C version ("stream.c"), but still needs to be externally linked to the FORTRAN version ("stream.f").

2004-11-17: New "Getting Started" section in FAQ

Due to a number of recent requests for help in getting started, I have renamed the files and added some new "Getting Started" instructions in the FAQ.

New Opteron Binary

A STREAM binary for Opteron/Athlon64 has been submitted to the STREAM web site. It is reported to work on SUSE, Red Hat, and Fedora. Give it a try, and let me know if it works!
Here is the link to the submission.

2003-09-23: Minor change in reporting for MPI runs

After reviewing several requests, I have decided to allow a subset of MPI results to be published in the "standard" table. The specific case that I allow is when the MPI version of the code is used to run the STREAM benchmark on a single SMP system.

I do not require that all of the MPI processes run under a single O/S, only that the hardware supports shared memory operation that would allow a single O/S to be run if so configured. These "MPI on SMP" results are marked with the notation "(MPI Result)" on the standard benchmark page, and these results are eligible for inclusion in the "TOP20" list.

For completeness, these results will also continue to be included in the MPI table.

Feel free to flame me if you think that this is a bad idea. e-mail flames here

2002-12-04: Estimating Sustainable Bandwidth using SPEC CPU2000 Results

I have written up my methodology for estimating STREAM Triad bandwidth using published results from the 171.swim benchmark in SPEC's CPU2000 suite. These are not publishable STREAM results, but they definitely give a good rough estimate on a wide variety of systems.

2002-12-04: Estimating SPECfp_rate2000 using Peak GFLOPs and STREAM Triad (PowerPoint presentation)

This is a slightly updated version of a presentation that I made at the TOP500 meeting at SuperComputing2002 in November. The slides show a methodology for getting reasonably accurate estimates of SPECfp_rate2000 using only one measurement, plus a set of architectural parameters of the system.

2002-11-05: A new category of results has been defined!

In response to popular demand, a new category has been created explicitly for "tuned" results. This category allows source code modifications to increase performance, including assembly language coding.

2002-11-05: The "Top10" list has been extended to the "Top20" list.

The "Top20" list is drawn from both the "standard" and "tuned" results. This provides a "show-off" category to allow folks to show off the best realizable performance on their systems, similar to the LINPACK NxN (aka LINPACK HPC) benchmark.

2002-06-25: An MPI version of STREAM has been released.

Check it out.

2002-02-25: An OpenMP version is available in C!

Version 4 of the STREAM benchmark in C is now available with OpenMP directives. This version still does not have the error-checking or anti-optimization code that is contained in version 5 (still FORTRAN only), but it does allow shared-memory parallel execution on many systems!
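For readers unfamiliar with OpenMP directives, the following is a minimal sketch of what a directive on the Triad loop looks like. It is not the version 4 source: the function name `triad` and its signature are hypothetical, and it leaves out timing and result checking entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Triad kernel: a[j] = b[j] + scalar * c[j].
   The pragma asks an OpenMP compiler to split the loop across threads;
   a compiler without OpenMP enabled ignores it and runs the loop serially. */
void triad(double *a, const double *b, const double *c, double scalar, long n)
{
    assert(a != NULL && b != NULL && c != NULL && n >= 0);
    long j;
#pragma omp parallel for
    for (j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}
```

Because each iteration touches a distinct index, the iterations are independent and can be divided among threads without any synchronization inside the loop.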

2000-07-30: Version 5 of the STREAM benchmark is ready!

Version 5 of the STREAM benchmark is available for FORTRAN users.

New features include:

2000-04-18: All e-mail contributions caught up!

I am still working on adding links from the original submissions to the entries in the tables.  All are complete for the submissions in 1996 and 2000, and I am working toward the middle.....

I also switched from 68k Linux to PowerPC Linux for my home machine, so I have a much more complete set of tools.  Look for STREAM listings for the PowerComputing PowerCurve 601/120.

2000-04-18: FAQ Revised and Updated!

The STREAM FAQ has been revised and updated with a much more useful discussion of the STREAM benchmark run rules. Check it out!

Really OLD News

1999.11.10: e-mail updates

I have updated the hypermail archives of the original submissions of STREAM results. Not everything has made it into the tables yet, but I am slowly catching up.

1999.10.29: New Categories for PC's and Mac's

I have added tables for "standard" results for PC's (and compatibles) and Macintosh's (and compatibles).  "Non-standard/experimental" results and "32-bit" results are still in the corresponding tables -- these tables are small enough that you can find the Mac and PC results easily enough.

1999.10.29: Links to Original Data

I am beginning to add links from the data pages back to the original e-mail submissions (in hypermail format).  Unfortunately, this is an entirely manual process, so it is going to take a while.  I also have the problem that I cannot get the new version of hypermail to compile on my AIX machine, so it will probably be slow going until I get Linux up on my Mac (68k) at home.

1999.10.18: I'm back!

STREAM's author and maintainer (that would be me) is back. I just moved from Silicon Valley to Austin, TX, and have had a bit of trouble getting all my computers and accounts configured.
As of Monday, October 18, 1999, STREAM is back in business, and I will be happy to accept results and will get them in the tables promptly.

1999.10.18: AMD Athlon (aka K7) Results!!!

The first results from the new AMD Athlon (aka K7) are in, courtesy of Claus Jeppesen (jeppesen@mrl.ucsb.edu). This machine is a screamer, posting over 450 MB/s on three of the four kernels! Watch out Intel!

Summer 1999: New Reporting Rules for Systems with Multiple Independent Memory Subsystems!!!!

A Change of Categories for Some Multiprocessor Results: A "partially depopulated" system is one in which only a subset of the cpus are used for the benchmark, and for which this subset is spread around the machine to decrease contention. For example on the SGI Origin2000, each node has 2 cpus sharing a single bus and memory subsystem. The results in the "experimental/nonstandard" table labelled "1 per node" are based on using only one cpu per node board, and are considered a "nonstandard" way of using the machine. Similarly, the Sun Ultra10000 has 4 cpus per node board, so results using 1, 2, or 3 cpus per node also go into this table of "nonstandard" results.

From this point forward, "partially populated" results will no longer be included in the table of "standard" results -- only "fully populated" multiprocessor results will be considered "standard". "Partially populated" results are still welcome, but will be placed in the "experimental/nonstandard" tables, which have been renamed (from simply "experimental") to more clearly denote the non-standard use of the hardware.

Summer 1999: Linux Executables for IA32

There are some new contributed sources and binaries in the Intel directory.

The Linux executables run great on Pentium II & Pentium III machines. We are still a bit puzzled by the Windows NT Server results. Any gurus out there willing to help out?

1999.10.18: Mason Cabot and I figured out that the poor results under Windows NT were the result of a misconfiguration on my part. My Linux boxes all had the full complement of SIMM's installed, while the NT machines had the minimum configuration. Apparently the Intel chipset used in this box (the SGI 1400) requires a fully loaded memory system to reach full bandwidth.
 

Mac PowerPC executable set

A set of Mac executables is available now in Stuff-ed format. Get your own copy.
 


An Old DOS version

Dennis Lee has revised a version of STREAM for DOS machines. The executable and a README file are available in a zip file.

Mailing List Archive

I have created a mailing list to discuss future developments of STREAM and memory bandwidth benchmarking in general. The archives of the mailing list are here in hypermail format.

Hypermail archives of old e-mail

Thanks to some friendly pointers to the right tools, the five years of contributions to the STREAM archive are now available for surfing. Go here for whatever details exist on machine details, compilers, compiler options, etc....

New Results in 1996:

May 12, 1996:

There has been a complete change in the underlying structure of the database. The entire database of raw results is now contained in a single, comma-delimited datafile. This is described in the Table subdirectory.
John D. McCalpin mccalpin@cs.virginia.edu
