Re: STREAM results (Fujitsu SPARC M10-4S, 1024 cores)

From: Akihiro SENOO <senoo.akihiro@jp.fujitsu.com>
Date: Tue Mar 26 2013 - 19:15:08 CDT

Dear Dr. McCalpin,

We are ready to publish the STREAM results for the Fujitsu SPARC M10-4S.
Please update the STREAM web site.

         System Name: Fujitsu SPARC M10-4S
            CPU Name: SPARC64 X
             CPU MHz: 3000
      CPU(s) enabled: 1024 cores, 64 chips, 16 cores/chip, 2 threads/core
       Primary Cache: 64 KB I + 64 KB D on chip per core
     Secondary Cache: 24 MB I+D on chip per chip
            L3 Cache: None
         Other Cache: None
              Memory: 8320 GB (520 x 16 GB)
                      chip#0: 256 GB (16 x 16 GB 2Rx4 PC3L-12800R-CL11,
                      ECC, running at 1333 MHz)
                      chip#1-#63: 8064 GB (504 x 16 GB 2Rx4
                      PC3L-12800R-CL11, ECC, running at 1600 MHz)
    Operating System: Oracle Solaris 11.1
            Compiler: C/C++: Version 12.3 of Oracle Solaris Studio,
                      1/13 Platform Specific Enhancement
   Compilation Flags: -fast -m64 -xopenmp -xtarget=sparc64x
                      -fma=fused -xipo=2 -xpagesize=4M -xlinkopt
                      -xvector -xprefetch_level=3 -xprefetch=latx:8.0
                      -Qoption cg -Qlp-dl=1,-Qms_pipe-prefdl=1
                      -xtypemap=integer:64
  STREAM Source Code: Fortran version (v5.6) with format changes
                      for large arrays.
         OS Settings: (/etc/system parameters) lpg_alloc_prefer=1
                      (change processor status) psradm -f 0-31
   Shell Environment: OMP_NUM_THREADS=2016
                      SUNW_MP_PROCBIND="0-2015"
                      SUNW_MP_THR_IDLE=spin
                      LD_PRELOAD=madv.so.1
                      LD_PRELOAD_64=madv.so.1
                      MADV=access_lwp
                 Run: ppgsz -o heap=256M,stack=256M,anon=256M <stream>
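
As a cross-check of the configuration figures above, here is a minimal Python sketch that recomputes the core, thread and memory totals. It assumes that "psradm -f 0-31" takes 32 virtual processors offline and that this is why OMP_NUM_THREADS is 2016 rather than 2048; the listing does not state that link explicitly. All variable names are illustrative only.

```python
# Cross-check of the configuration figures quoted in the listing.
chips = 64
cores_per_chip = 16
threads_per_core = 2

cores = chips * cores_per_chip            # "1024 cores"
hw_threads = cores * threads_per_core     # hardware threads in the system

# Assumption: "psradm -f 0-31" takes virtual processors 0..31 (inclusive)
# offline, leaving the remainder available to the OpenMP runtime.
offline = len(range(0, 32))
usable_threads = hw_threads - offline     # compare with OMP_NUM_THREADS

dimm_gb = 16
chip0_gb = 16 * dimm_gb                   # chip#0: 16 DIMMs
other_gb = 504 * dimm_gb                  # chip#1-#63: 504 DIMMs
total_gb = chip0_gb + other_gb            # compare with "8320 GB (520 x 16 GB)"

print(cores, hw_threads, usable_threads)
print(chip0_gb, other_gb, total_gb)
```

Under these assumptions the totals agree with the listing: 1024 cores, 2016 usable threads, and 8320 GB of memory.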

Output:
----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------
 Array size = 40960000000
 Offset = 1024
 The total memory requirement is 937500 MB
 You are running each test 10 times
 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 Number of Threads = 2016
 ----------------------------------------------
 Printing one line per active thread....
 Printing one line per active thread....
(snip)
 Printing one line per active thread....
 Printing one line per active thread....
 ----------------------------------------------------
 Your clock granularity/precision appears to be 1 microseconds
 ----------------------------------------------------
Function     Rate (MB/s)   Avg time   Min time   Max time
Copy:       3474998.0651     4.2926     0.1886    10.1007
Scale:      3500799.8989     3.1858     0.1872     9.8144
Add:        3956102.3998     3.5077     0.2485    10.7638
Triad:      4002703.2472     2.4675     0.2456    10.1980
 ----------------------------------------------------
 Solution Validates!
 ----------------------------------------------------
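
For readers who want to reproduce the arithmetic behind the output above, the following Python sketch recomputes the memory requirement and the four rates from the array size and the printed minimum times. It assumes the usual STREAM accounting: three arrays of 8-byte words, memory reported in units of 2**20 bytes, rates in 10**6 bytes per second, two arrays touched by Copy and Scale and three by Add and Triad. The names below are illustrative and are not taken from the STREAM source.

```python
# Recompute STREAM's reported figures from the array size and minimum times.
N = 40_960_000_000      # "Array size" from the output
WORD = 8                # bytes per DOUBLE PRECISION word

# Three arrays; the offset is ignored. Reported in units of 2**20 bytes.
mem_mb = 3 * WORD * N / 2**20

# Assumed accounting: arrays read plus arrays written per kernel.
arrays_touched = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
min_time = {"Copy": 0.1886, "Scale": 0.1872, "Add": 0.2485, "Triad": 0.2456}
reported = {
    "Copy": 3474998.0651,
    "Scale": 3500799.8989,
    "Add": 3956102.3998,
    "Triad": 4002703.2472,
}


def rate_mb_s(kernel):
    """Bytes moved divided by the best time, in units of 10**6 bytes/s."""
    return arrays_touched[kernel] * WORD * N / min_time[kernel] / 1e6


print(f"memory requirement: {mem_mb:.0f} MB")
for kernel in reported:
    computed = rate_mb_s(kernel)
    rel_diff = abs(computed - reported[kernel]) / reported[kernel]
    print(f"{kernel:6s} computed {computed:14.1f}  relative diff {rel_diff:.1e}")
```

The recomputed rates differ slightly from the reported ones because the minimum times are printed to only four decimal places.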

--
Akihiro SENOO
PA PROJECT
NEXT GENERATION TECHNICAL COMPUTING UNIT
FUJITSU Limited
senoo.akihiro@jp.fujitsu.com
Received on Tue Mar 26 19:59:08 2013
