STREAM results (Fujitsu SPARC M10-4S, 256 cores)

From: Akihiro SENOO <senoo.akihiro@jp.fujitsu.com>
Date: Tue Mar 26 2013 - 19:15:55 CDT

Dear Dr. McCalpin,

We have run the STREAM benchmark on the "Fujitsu SPARC M10-4S".
Please update the STREAM web site with these results.

         System Name: Fujitsu SPARC M10-4S
            CPU Name: SPARC64 X
             CPU MHz: 3000
      CPU(s) enabled: 256 cores, 16 chips, 16 cores/chip, 2 threads/core
       Primary Cache: 64 KB I + 64 KB D on chip per core
     Secondary Cache: 24 MB I+D on chip per chip
            L3 Cache: None
         Other Cache: None
              Memory: 2 TB (128 x 16GB 2Rx4 PC3L-12800R-CL11, ECC,
                      running at 1600 MHz)
    Operating System: Oracle Solaris 11.1
            Compiler: C/C++: Version 12.3 of Oracle Solaris Studio,
                      1/13 Platform Specific Enhancement
   Compilation Flags: -fast -m64 -xopenmp -xtarget=sparc64x
                      -fma=fused -xipo=2 -xpagesize=4M -xlinkopt
                      -xvector -xprefetch_level=3 -xprefetch=latx:8.0
                      -Qoption cg -Qlp-dl=1,-Qms_pipe-prefdl=1
                      -xtypemap=integer:64
  STREAM Source Code: Fortran version (v5.6) with format changes
                      for large arrays.
         OS Settings: (/etc/system parameters) lpg_alloc_prefer=1
   Shell Environment: OMP_NUM_THREADS=512
                      SUNW_MP_PROCBIND="0-511"
                      SUNW_MP_THR_IDLE=spin
                      LD_PRELOAD=madv.so.1
                      LD_PRELOAD_64=madv.so.1
                      MADV=access_lwp
                 Run: ppgsz -o heap=256M,stack=256M,anon=256M <stream>
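The CPU line and the shell environment are consistent with each other: 16 chips x 16 cores/chip x 2 threads/core gives 512 hardware threads, which is the OMP_NUM_THREADS value, and SUNW_MP_PROCBIND binds them to processor IDs 0-511. The memory line checks out the same way (128 x 16 GB = 2 TB). A minimal sketch of that arithmetic; the helper names are hypothetical, and it assumes the 512 strands are numbered contiguously from 0, as the PROCBIND range suggests.

```python
# Sketch (hypothetical helpers): cross-check the configuration listing.
CHIPS, CORES_PER_CHIP, THREADS_PER_CORE = 16, 16, 2   # "CPU(s) enabled" line
DIMMS, GB_PER_DIMM = 128, 16                          # "Memory" line

def hardware_threads(chips: int, cores_per_chip: int, threads_per_core: int) -> int:
    """Total hardware threads (strands) across all chips."""
    return chips * cores_per_chip * threads_per_core

def procbind_range(n_threads: int) -> str:
    """Range string covering processor IDs 0..n-1 (assumes contiguous numbering)."""
    return f"0-{n_threads - 1}"

threads = hardware_threads(CHIPS, CORES_PER_CHIP, THREADS_PER_CORE)
print(threads)                    # 512, the OMP_NUM_THREADS value
print(procbind_range(threads))    # 0-511, the SUNW_MP_PROCBIND value
print(DIMMS * GB_PER_DIMM)        # 2048 GB, i.e. 2 TB
```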

Outputs:
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size = 2560000000
 Offset = 1024
 The total memory requirement is 58593 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 Number of Threads = 512
 ----------------------------------------------
 Printing one line per active thread....
 Printing one line per active thread....
(snip)
 Printing one line per active thread....
 Printing one line per active thread....
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
 Function   Rate (MB/s)   Avg time   Min time   Max time
 Copy:      890337.8015     0.2480     0.0460     1.8610
 Scale:     904156.0541     0.4489     0.0453     1.8919
 Add:      1020663.0905     0.2840     0.0602     2.0707
 Triad:    1023643.0281     0.2146     0.0600     1.4465
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
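To make the output easier to audit, the sketch below recomputes the memory-requirement line and the four rates from the printed array size and best (Min) times. It assumes STREAM's conventional accounting: 8-byte words; rates in 10^6 bytes/s counting only the explicit reads and writes of each kernel; and the memory line in 2^20-byte units with the offset ignored. The names are hypothetical, not taken from the STREAM source.

```python
# Sketch: recompute the STREAM summary figures from the values printed above.
N = 2_560_000_000          # "Array size" from the output
BYTES_PER_WORD = 8         # "Assuming 8 bytes per DOUBLE PRECISION word"

# Arrays touched per element by each kernel (q is a scalar):
#   Copy:  c = a          (read 1, write 1)
#   Scale: b = q*c        (read 1, write 1)
#   Add:   c = a + b      (read 2, write 1)
#   Triad: a = b + q*c    (read 2, write 1)
ARRAYS_TOUCHED = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Best ("Min") times in seconds and reported rates in MB/s, copied from the table.
MIN_TIME = {"Copy": 0.0460, "Scale": 0.0453, "Add": 0.0602, "Triad": 0.0600}
REPORTED = {"Copy": 890337.8015, "Scale": 904156.0541,
            "Add": 1020663.0905, "Triad": 1023643.0281}

def memory_requirement_mb(n: int, word_bytes: int = BYTES_PER_WORD) -> int:
    """Footprint of the three arrays a, b, c in 2**20-byte units (offset ignored)."""
    return 3 * word_bytes * n // 2**20

def rate_mb_per_s(kernel: str, n: int = N, word_bytes: int = BYTES_PER_WORD) -> float:
    """Bandwidth as STREAM counts it: bytes moved / best time, in 10**6 bytes/s."""
    return ARRAYS_TOUCHED[kernel] * word_bytes * n / MIN_TIME[kernel] / 1e6

print(memory_requirement_mb(N))   # 58593, as in the output
for k in ARRAYS_TOUCHED:
    print(f"{k:6s} {rate_mb_per_s(k):12.1f}  (reported {REPORTED[k]:.1f})")
```

Because the printed Min times are rounded to four decimals, the recomputed rates differ from the reported ones by less than 0.05%; the memory line reproduces exactly.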

--
Akihiro SENOO
PA PROJECT
NEXT GENERATION TECHNICAL COMPUTING UNIT
FUJITSU Limited
senoo.akihiro@jp.fujitsu.com

Received on Tue Mar 26 19:59:08 2013
