The STREAM2 Home Page

Introduction

STREAM2 is an attempt to extend the functionality of the STREAM benchmark in two important ways:
  • STREAM2 measures sustained bandwidth at all levels of the cache hierarchy, and
  • STREAM2 more clearly exposes the performance differences between reads and writes

STREAM2 is based on the same ideas as STREAM, but uses a different set of vector kernels:

  • FILL:  similar to bzero(), but fills with a constant instead of zero
  • COPY:  similar to bcopy(), and the same as STREAM Copy
  • DAXPY: similar to STREAM Triad, but overwrites one of the input vectors instead of writing results to a third vector
  • SUM:   sum reduction on a single vector -- reads only, no writes

Kernel  Code                  Bytes/iter read  Bytes/iter written  FLOPS/iter
FILL    a(i) = q              0 (+8)           8                   0
COPY    a(i) = b(i)           8 (+8)           8                   0
DAXPY   a(i) = a(i) + q*b(i)  16               8                   2
SUM     sum = sum + a(i)      8                0                   1

Table 1: Characteristics of the STREAM2 kernels. The value in parentheses in the "Bytes/iter read" column indicates the number of additional bytes read per iteration on machines with a "write allocate" cache policy.
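
As a rough illustration of the four kernels in Table 1, here is a minimal C sketch. It is not the STREAM2 source, which is Fortran77; the function names are hypothetical, and the byte counts in the comments assume 8-byte double-precision elements.

```c
#include <stddef.h>

/* Hypothetical C renderings of the Table 1 kernels (not the original code). */

/* FILL: a(i) = q. Writes 8 bytes per iteration; no explicit reads
   (a write-allocate cache may still read 8 bytes to fetch the line). */
void stream2_fill(double *a, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = q;
}

/* COPY: a(i) = b(i). Reads 8 bytes and writes 8 bytes per iteration. */
void stream2_copy(double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i];
}

/* DAXPY: a(i) = a(i) + q*b(i). Reads 16 bytes, writes 8 bytes,
   and performs 2 floating-point operations per iteration. */
void stream2_daxpy(double *a, const double *b, double q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] + q * b[i];
}

/* SUM: sum = sum + a(i). Reads 8 bytes, writes nothing,
   and performs 1 floating-point operation per iteration. */
double stream2_sum(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}
```

DAXPY updates a in place, which is why no extra write-allocate traffic is listed for it: the line being written has already been read.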


    Source Code

    The STREAM2 source code is provided in Fortran77 -- you are welcome to translate it to C, but I have not gotten around to it yet. The control flow is a bit more complex than in STREAM because the code loops over many different vector lengths and repeats each kernel multiple times at each length.

    The main feature is that the same amount of work is done for each vector length, so the shorter vector lengths are iterated many times and the longer vector lengths fewer times.
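
One way to picture the "same amount of work per vector length" idea is the small C sketch below. It is only an assumption about how the repetition count might be chosen, not the logic of the Fortran77 source: the helper names, the fixed work budget, and the convention of 1 MB = 10^6 bytes are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: number of passes over a vector of n elements
   (n must be nonzero) so that about total_elems element updates are
   performed in all. At least one pass is always made. */
size_t reps_for_length(size_t total_elems, size_t n)
{
    size_t reps = total_elems / n;
    return reps > 0 ? reps : 1;
}

/* Hypothetical helper: sustained bandwidth in MB/s (10^6 bytes/s) for a
   kernel that moves bytes_per_iter bytes per element, run for reps passes
   over n elements in the given elapsed time (seconds must be positive). */
double bandwidth_mbs(size_t bytes_per_iter, size_t n, size_t reps,
                     double seconds)
{
    return (double)bytes_per_iter * (double)n * (double)reps
           / seconds / 1.0e6;
}
```

For example, with an assumed budget of 1,000,000 element updates, a 1,000-element vector would be swept 1,000 times, while a 2,000,000-element vector would be swept once.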
     

    Sample Results

    Here are some sample results from machines in my house and office. The machines are described in Table 2.
     
    Machine                            CPU             MHz    L1 Data Cache          L2 Data Cache        Peak L2 cache Bandwidth  Bus width @ speed   Peak Memory Bandwidth
    IBM RS/6000-397                    POWER2-SC       160    128 kB @ 160 MHz       none                 N/A                      256 bits @ 80 MHz   2560 MB/s
    Upgraded Mac clone                 PowerPC G3      367.5  32 kB @ 367.5 MHz      512 kB @ 183.75 MHz  2940 MB/s                64 bits @ 52.5 MHz  420 MB/s
    PowerComputing PowerCurve 601/120  PowerPC 601     120    64 kB (I+D) @ 120 MHz  256 kB @ 40 MHz      320 MB/s                 64 bits @ 40 MHz    320 MB/s
    Mac Quadra 650                     Motorola 68040  33     8 kB @ 33 MHz          none                 N/A                      32 bits @ 33 MHz    132 MB/s

    Table 2: The machines used for the sample results.
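
The peak memory bandwidth figures listed for these machines are consistent with multiplying the bus width in bytes by the bus speed in MHz. The C sketch below spells out that arithmetic; the function name is hypothetical, and it assumes one bus transfer per clock cycle and 1 MB = 10^6 bytes.

```c
/* Hypothetical helper: peak bandwidth in MB/s for a bus that is
   width_bits wide and completes one transfer per cycle at mhz MHz. */
double peak_bandwidth_mbs(unsigned width_bits, double mhz)
{
    double bytes_per_transfer = width_bits / 8.0;
    return bytes_per_transfer * mhz;
}
```

With the listed bus parameters, 256 bits at 80 MHz gives 2560 MB/s for the RS/6000-397, and 64 bits at 52.5 MHz gives 420 MB/s for the upgraded Mac clone.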

    [Figures: sample results for the FILL, COPY, DAXPY, and SUM kernels]