New World Record 512p STREAM result on NCSA's Altix

From: Nahil Sobh (sobh@ncsa.uiuc.edu)
Date: Wed Dec 22 2004 - 10:41:07 CST


Dear John,

I have included our latest STREAM benchmark data for
NCSA's SGI-Altix Machine to update your STREAM ranking
list.

The machine is partitioned into two single-image systems
with 512p per system.

Peak performance per partition: over 3 Tflops.
Processor: Intel Itanium 2, 1.6 GHz, 9 MB cache.
OS: Linux
One partition has 1 terabyte of globally accessible memory, while
the other partition has 2 terabytes of globally accessible memory.

Over 370 terabytes of SGI InfiniteStorage.
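
As a rough sanity check on the "over 3 Tflops" figure, the peak can be
sketched from the processor count and clock rate. The 4 flops per cycle
used below is an assumption (two fused multiply-add units per Itanium 2
processor); it is not stated in this message.

```python
# Back-of-the-envelope peak for one 512-processor partition.
# Assumption (not from the message): each Itanium 2 processor can issue
# two fused multiply-adds per cycle, i.e. 4 floating-point operations.
PROCESSORS = 512
CLOCK_HZ = 1.6e9
FLOPS_PER_CYCLE = 4  # assumed: 2 FMA units x 2 flops each

peak_flops = PROCESSORS * CLOCK_HZ * FLOPS_PER_CYCLE
peak_tflops = peak_flops / 1e12

print(f"Peak per partition: {peak_tflops:.2f} Tflops")
```

Under that assumption the result is a little above 3 Tflops, consistent
with the figure quoted above.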

These runs were done by John Baron and Kevin McMahon from SGI.

The code was compiled with -O3 -i8 -openmp -mP2OPT_hlo_prefetch=F.
The last flag disables some unnecessary prefetches that the
compiler was generating.
Build 20040901 of the Intel 7.1 compiler was used.

====================================================
SGI Altix 3700 Bx2
512p 1.6GHz 9M

----------------------------------------------
   Double precision appears to have 16 digits of accuracy
   Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
   Array size = 3290112000
   Offset = 4351
   The total memory requirement is 75304 MB
   You are running each test 20 times
   --
   The *best* time for each test is used
   *EXCLUDING* the first and last iterations
   ----------------------------------------------------
   Your clock granularity appears to be less than one microsecond
   Your clock granularity/precision appears to be 1 microseconds
   ----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:        906388.1661     0.0582     0.0581     0.0586
Scale:       870211.2240     0.0608     0.0605     0.0613
Add:        1055179.1819     0.0750     0.0748     0.0753
Triad:      1119912.9516     0.0711     0.0705     0.0713
   ----------------------------------------------------
   Solution Validates!
   ----------------------------------------------------
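
The reported rates and memory requirement can be cross-checked from the
array size and the minimum times in the output above. The sketch below
assumes the usual STREAM accounting (Copy and Scale touch two arrays per
iteration, Add and Triad touch three, and 1 MB/s means 10^6 bytes per
second) and assumes the memory figure counts three arrays in units of
2^20 bytes. The dictionary and function names are illustrative only.

```python
# Cross-check of the STREAM output using only numbers printed in the run.
N = 3_290_112_000          # "Array size" from the output
BYTES_PER_WORD = 8         # "8 bytes per DOUBLE PRECISION word"

# Arrays read or written per loop iteration (assumed standard STREAM counts).
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# "Min time" and "Rate (MB/s)" columns from the output.
MIN_TIME = {"Copy": 0.0581, "Scale": 0.0605, "Add": 0.0748, "Triad": 0.0705}
REPORTED = {"Copy": 906388.1661, "Scale": 870211.2240,
            "Add": 1055179.1819, "Triad": 1119912.9516}


def rate_mb_per_s(kernel):
    """Bytes moved divided by the best time, in units of 10^6 bytes/s."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_WORD * N
    return bytes_moved / MIN_TIME[kernel] / 1e6


def relative_error(kernel):
    """Relative gap between the recomputed and the reported rate."""
    return abs(rate_mb_per_s(kernel) - REPORTED[kernel]) / REPORTED[kernel]


# Three arrays of N words each, expressed in units of 2^20 bytes (assumed).
total_memory_mb = 3 * BYTES_PER_WORD * N / 2**20

for kernel in REPORTED:
    print(f"{kernel:6s} recomputed {rate_mb_per_s(kernel):12.1f} MB/s, "
          f"relative error {relative_error(kernel):.4%}")
print(f"Memory for three arrays: {int(total_memory_mb)} MB")
```

The recomputed rates differ slightly from the printed ones because the
minimum times are shown to only four decimal places.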

This was run using OMP_NUM_THREADS=510 and using dplace to bind the threads
to cpus 2-511.
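
For a sense of scale, the aggregate rates can be divided by the number
of threads. This is only a sketch: it assumes one thread per CPU across
CPUs 2 through 511 and an even share of the bandwidth per thread, which
the message does not state.

```python
# Average bandwidth per thread, assuming an even split across threads.
cpus = range(2, 512)       # CPUs 2-511 inclusive, as bound by dplace
threads = len(cpus)        # should match OMP_NUM_THREADS=510

REPORTED_MB_PER_S = {"Copy": 906388.1661, "Scale": 870211.2240,
                     "Add": 1055179.1819, "Triad": 1119912.9516}

per_thread = {k: v / threads for k, v in REPORTED_MB_PER_S.items()}

for kernel, rate in per_thread.items():
    print(f"{kernel:6s} {rate:8.1f} MB/s per thread")
```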

The "tuned" version of the STREAM code was used; aside from some minor
formatting changes to handle all the digits for the large rates, the kernels
were modified using the following compiler prefetch intrinsics:

            subroutine stream_copy (c, a, n)
            real*8 c(*), a(*)
!$OMP PARALLEL DO
            do j = 1,n
               if (iand(j,15_8) .eq. 0) then
                  call lfetch_excl_nta(c(j+190))
                  call lfetch_nta(a(j+190))
               end if
               c(j) = a(j)
            end do
            end

            subroutine stream_scale (b, c, scalar, n)
            real*8 b(*), c(*), scalar
!$OMP PARALLEL DO
            do j = 1,n
               if (iand(j,15_8) .eq. 0) then
                  call lfetch_excl_nta(b(j+190))
                  call lfetch_nta(c(j+190))
               end if
               b(j) = scalar*c(j)
            end do
            end

            subroutine stream_add (c, a, b, n)
            real*8 c(*), a(*), b(*)
!$OMP PARALLEL DO
            do j = 1,n
               if (iand(j,15_8) .eq. 0) then
                  call lfetch_nta(b(j+190))
                  call lfetch_nta(a(j+190))
                  call lfetch_excl_nta(c(j+190))
               end if
               c(j) = a(j) + b(j)
            end do
            end

            subroutine stream_triad (a, b, c, scalar, n)
            real*8 a(*), b(*), c(*), scalar
!$OMP PARALLEL DO
            do j = 1,n
               if (iand(j,15_8) .eq. 0) then
                  call lfetch_nta(b(j+190))
                  call lfetch_nta(c(j+190))
                  call lfetch_excl_nta(a(j+190))
               end if
               a(j) = b(j) + scalar*c(j)
            end do
            end
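
The prefetch pattern in the kernels above can be read as "once every 16
elements, prefetch 190 elements ahead". The sketch below works out what
that means in bytes. The 128-byte cache line size is an assumption about
the Itanium 2 L2/L3 caches and is not stated in this message; the
constant names are illustrative.

```python
# Arithmetic behind the prefetch guard iand(j,15_8) .eq. 0 and the j+190 offset.
BYTES_PER_WORD = 8         # real*8
PREFETCH_MASK = 15         # from iand(j, 15_8)
PREFETCH_DISTANCE = 190    # from c(j+190), a(j+190), ...
CACHE_LINE_BYTES = 128     # assumed Itanium 2 L2/L3 line size

# The guard is true once every (mask + 1) iterations.
stride_elems = PREFETCH_MASK + 1
stride_bytes = stride_elems * BYTES_PER_WORD

# How far ahead of the current element each prefetch reaches.
distance_bytes = PREFETCH_DISTANCE * BYTES_PER_WORD
lines_ahead = distance_bytes / CACHE_LINE_BYTES

# Loop indices (1-based, as in the Fortran) that trigger a prefetch.
triggers = [j for j in range(1, 65) if (j & PREFETCH_MASK) == 0]

print(f"One prefetch per array every {stride_elems} elements "
      f"({stride_bytes} bytes)")
print(f"Prefetch distance: {distance_bytes} bytes, "
      f"about {lines_ahead:.1f} cache lines ahead")
print(f"First triggering indices: {triggers}")
```

Under the assumed line size, each array receives one prefetch per cache
line, issued roughly a dozen lines ahead of the element being processed.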

Nahil A. Sobh, Ph.D.
Senior Research Scientist and Group Leader
National Center for Supercomputing Applications (NCSA)
152 Computing Applications Building
605 East Springfield Avenue
Champaign, IL 61820
(217) 244 9481(office) 244 6400(Sec.) 244 6829(Fax)


