STREAM results on a dual Opteron 246

From: Paul Saxe (PSaxe@MaterialsDesign.com)
Date: Mon May 09 2005 - 23:48:27 CDT

    Dear Dr. McCalpin,

    I ran the 01-stream_d_c_omp_x86_64 executable from your 2004-11-17 "New Opteron Binary" note on a dual-processor Opteron 246 (2 GHz) machine with 2 GB of PC3200 memory and a Tyan 2882 motherboard, running Fedora 2 for AMD64. Perhaps these results are of use to you; they seem to be considerably different from the similar ones on the web site.

    If you need any other information or would like me to do anything else, please let me know.

    Paul Saxe.


    [psaxe@opteron1 psaxe]$ export OMP_NUM_THREADS=1
    [psaxe@opteron1 psaxe]$ ./01*
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 20000000, Offset = 0
    Total memory required = 457.8 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 507503 microseconds.
       (= 507503 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   RMS time     Min time     Max time
    Copy:        4163.9075       0.0769       0.0769       0.0772
    Scale:       4326.7150       0.0740       0.0740       0.0742
    Add:         4271.1491       0.1125       0.1124       0.1126
    Triad:       4339.7302       0.1107       0.1106       0.1108
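As a cross-check on the single-thread table above, the reported rates can be approximately recomputed from the array size and the best times. This is only a sketch: it assumes STREAM's usual byte counting (two arrays touched per iteration for Copy and Scale, three for Add and Triad) and that the rate column uses 1 MB = 10^6 bytes, while "Total memory required" uses 2^20 bytes. The dictionary and function names are illustrative and are not part of STREAM itself.

```python
# Sketch: recompute the STREAM rates from the array size and best times.
# Assumptions (not stated in the message): Copy/Scale touch 2 arrays per
# iteration, Add/Triad touch 3, and the rate column uses 1 MB = 10**6 bytes.
N = 20_000_000   # "Array size" from the output
WORD = 8         # bytes per DOUBLE PRECISION word

arrays_touched = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}
min_time = {"Copy": 0.0769, "Scale": 0.0740, "Add": 0.1124, "Triad": 0.1106}
reported = {"Copy": 4163.9075, "Scale": 4326.7150,
            "Add": 4271.1491, "Triad": 4339.7302}


def rate_mb_s(kernel):
    """Bytes moved by one pass of the kernel, divided by the best time."""
    bytes_moved = arrays_touched[kernel] * WORD * N
    return bytes_moved / min_time[kernel] / 1e6


# Three arrays of N doubles, expressed in units of 2**20 bytes.
total_mib = 3 * WORD * N / 2**20

if __name__ == "__main__":
    print(f"Total memory: {total_mib:.1f} MB")
    for kernel in reported:
        print(f"{kernel:6s} recomputed {rate_mb_s(kernel):9.1f}  "
              f"reported {reported[kernel]:9.1f}")
```

The recomputed values differ slightly from the reported ones because the printed times are rounded to four decimal places.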


    [psaxe@opteron1 psaxe]$ export OMP_NUM_THREADS=2
    [psaxe@opteron1 psaxe]$ ./01*
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 20000000, Offset = 0
    Total memory required = 457.8 MB.
    Each test is run 10 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 256726 microseconds.
       (= 256726 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   RMS time     Min time     Max time
    Copy:        8052.1296       0.0398       0.0397       0.0402
    Scale:       8491.1922       0.0378       0.0377       0.0381
    Add:         8369.8108       0.0574       0.0573       0.0578
    Triad:       8431.9977       0.0570       0.0569       0.0570
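The scaling from one thread to two can be read off the two tables above. The sketch below simply divides the reported two-thread rates by the one-thread rates; the variable names are illustrative. Near-linear scaling of this kind would be consistent with each Opteron having its own integrated memory controller, though the message itself does not say how memory was placed across the two processors.

```python
# Sketch: speedup of the 2-thread run over the 1-thread run, per kernel,
# using the "Rate (MB/s)" columns copied from the two tables in the message.
rate_1_thread = {"Copy": 4163.9075, "Scale": 4326.7150,
                 "Add": 4271.1491, "Triad": 4339.7302}
rate_2_threads = {"Copy": 8052.1296, "Scale": 8491.1922,
                  "Add": 8369.8108, "Triad": 8431.9977}

speedup = {k: rate_2_threads[k] / rate_1_thread[k] for k in rate_1_thread}

if __name__ == "__main__":
    for kernel, s in speedup.items():
        # Parallel efficiency = speedup divided by the number of threads.
        print(f"{kernel:6s} speedup {s:.3f}  efficiency {s / 2:.1%}")
```

All four kernels come out between 1.9x and 2.0x.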


