STREAM results (Fujitsu SPARC M10-4, 64 cores)

From: Akihiro SENOO <senoo.akihiro@jp.fujitsu.com>
Date: Tue Sep 17 2013 - 02:59:46 CDT

Dear Dr. McCalpin,

We have run the STREAM benchmark on a "Fujitsu SPARC M10-4".
Please update the STREAM Web site with the results below.

         System Name: Fujitsu SPARC M10-4
            CPU Name: SPARC64 X
             CPU MHz: 2800
      CPU(s) enabled: 64 cores, 4 chips, 16 cores/chip, 2 threads/core
       Primary Cache: 64 KB I + 64 KB D on chip per core
     Secondary Cache: 24 MB I+D on chip per chip
            L3 Cache: None
         Other Cache: None
              Memory: 512 GB (32 x 16 GB 2Rx4 PC3L-12800R-11, ECC)
    Operating System: Solaris 11.1.10.5.0
            Compiler: C/C++/Fortran: Version 12.3 of Oracle Solaris
                      Studio, 1/13 Platform Specific Enhancement
   Compilation Flags: -fast -m64 -xopenmp -xtarget=sparc64x
                      -fma=fused -xipo=2 -xpagesize=4M -xlinkopt
                      -xvector -xprefetch_level=3 -xprefetch=latx:8.0
                      -Qoption cg -Qlp-dl=1,-Qms_pipe-prefdl=1
                      -xtypemap=integer:64
  STREAM Source Code: Fortran version (v5.6) with format changes
                      for large arrays.
         OS Settings: (/etc/system parameters) lpg_alloc_prefer=1
   Shell Environment: OMP_NUM_THREADS=128
                      SUNW_MP_PROCBIND="0-127"
                      SUNW_MP_THR_IDLE=spin
                      LD_PRELOAD=madv.so.1
                      LD_PRELOAD_64=madv.so.1
                      MADV=access_lwp
                 Run: ppgsz -o heap=256M,stack=256M,anon=256M <stream>

Output:
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size = 2560000000
 Offset = 1024
 The total memory requirement is 58593 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 Number of Threads = 128
 ----------------------------------------------
 Printing one line per active thread....
 Printing one line per active thread....
(snip)
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 Printing one line per active thread....
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
 Function      Rate (MB/s)   Avg time   Min time   Max time
 Copy:         224657.4430     0.1878     0.1823     0.2280
 Scale:        229113.4945     0.1792     0.1788     0.1809
 Add:          258814.5638     0.2392     0.2374     0.2484
 Triad:        259134.4198     0.2379     0.2371     0.2390
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
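
The figures in the output above can be cross-checked with a little arithmetic. The following Python sketch is not part of the submission; it assumes that STREAM uses three arrays of 8-byte words, that "MB" in the memory line means 2**20 bytes, and that each rate is 10**6 bytes per second computed from the minimum time. Because the printed times are rounded to four decimals, the recomputed rates are only expected to agree with the reported ones to within roughly 0.1%.

```python
# Cross-check of the STREAM output above (illustrative sketch only).
# Assumptions, not stated in the message: three arrays of 8-byte words,
# "MB" in the memory line = 2**20 bytes, rates in 10**6 bytes/s from Min time.

N = 2_560_000_000        # "Array size" from the output
BYTES_PER_WORD = 8       # "Assuming 8 bytes per DOUBLE PRECISION word"

# 64 cores x 2 threads/core should equal OMP_NUM_THREADS (128).
hw_threads = 64 * 2

# Memory for the three arrays, truncated to whole units of 2**20 bytes.
total_mib = 3 * N * BYTES_PER_WORD // 2**20

# Arrays touched per iteration: Copy and Scale move 2 words, Add and Triad 3.
arrays_touched = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

# Min time and reported rate, copied from the results table.
min_time = {"Copy": 0.1823, "Scale": 0.1788, "Add": 0.2374, "Triad": 0.2371}
reported = {"Copy": 224657.4430, "Scale": 229113.4945,
            "Add": 258814.5638, "Triad": 259134.4198}


def rate_mb_per_s(kernel):
    """Recompute a kernel's rate in 10**6 bytes/s from its rounded Min time."""
    bytes_moved = arrays_touched[kernel] * BYTES_PER_WORD * N
    return bytes_moved / min_time[kernel] / 1e6


def relative_error(kernel):
    """Relative difference between the recomputed and the reported rate."""
    return abs(rate_mb_per_s(kernel) - reported[kernel]) / reported[kernel]


if __name__ == "__main__":
    print("hardware threads:", hw_threads)
    print("total memory (2**20-byte units):", total_mib)
    for kernel in reported:
        print(f"{kernel:6s} recomputed {rate_mb_per_s(kernel):12.1f} MB/s, "
              f"relative error {relative_error(kernel):.1e}")
```

Under these assumptions the memory figure and the thread count match the output, and all four recomputed rates fall within the rounding tolerance of the reported values.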

--
Akihiro SENOO
PA PROJECT
NEXT GENERATION TECHNICAL COMPUTING UNIT
FUJITSU Limited
senoo.akihiro@jp.fujitsu.com


Received on Tue Sep 17 08:06:23 2013
