STREAM submission with Portland Group Compilers on AMD Opteron

From: Mathew COLGROVE (mathew.colgrove@st.com)
Date: Tue Aug 10 2004 - 18:01:54 CDT

    Hi John,

    Enclosed is an updated result for your STREAM web site. We have
    determined that, with the Portland Group (PGI) compilers, the best
    flag set for STREAM is "-O2 -Mvect=sse -Mnontemporal". To build the
    OpenMP version, add the "-mp" flag. We hope this information helps
    users of the STREAM benchmark achieve the highest possible results.
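
    For readers unfamiliar with what these flags act on, here is a minimal
    sketch of a Triad-style loop, modeled on the STREAM kernel rather than
    copied from stream_d.c. The function name stream_triad and its signature
    are illustrative assumptions. Going by their names, "-Mvect=sse" and
    "-Mnontemporal" ask the compiler to vectorize such loops with SSE and to
    use nontemporal moves for the streaming accesses; the OpenMP pragma is
    expected to take effect only when "-mp" is given.

```c
#include <stddef.h>

/* Hypothetical helper: Triad computes a[j] = b[j] + scalar * c[j].
   Each iteration reads two doubles and writes one, with no reuse,
   so the loop is limited by memory bandwidth rather than arithmetic. */
static void stream_triad(double *a, const double *b, const double *c,
                         double scalar, size_t n)
{
    long j; /* signed index, as older OpenMP versions require */

    /* Honored when OpenMP is enabled (e.g. "-mp" with pgcc);
       otherwise the pragma is ignored and the loop runs serially. */
#pragma omp parallel for
    for (j = 0; j < (long)n; j++)
        a[j] = b[j] + scalar * c[j];
}
```

    The iterations are independent of one another, which is what makes the
    loop a straightforward candidate for both vectorization and threading.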

    System:

    Model Name: AMD Opteron(tm) Processor 248
    CPU MHz: 2200
    Motherboard: ASUS SK8N
    Cache Size: 1024 KB
    Memory: 4x512MB, DDR400, PC3200, Corsair, CL2
    Operating System: SuSE 9.0
    Kernel: 2.4.21-102-default

    Compiler:
    The Portland Group (PGI) pgcc Release 5.2-1

    Output:
    % pgcc -O2 -Mvect=sse -Mnontemporal -V second_wall.c stream_d.c -o stream

    pgcc 5.2-1
    Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
    Copyright 2000-2004, STMicroelectronics, Inc. All Rights Reserved.
    second_wall.c:
    PGC/x86-64 Linux/x86-64 5.2-1
    Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
    Copyright 2000-2004, STMicroelectronics, Inc. All Rights Reserved.
    stream_d.c:
    PGC/x86-64 Linux/x86-64 5.2-1
    Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
    Copyright 2000-2004, STMicroelectronics, Inc. All Rights Reserved.

    % stream
    -------------------------------------------------------------
    This system uses 8 bytes per DOUBLE PRECISION word.
    -------------------------------------------------------------
    Array size = 2000000, Offset = 0
    Total memory required = 45.8 MB.
    Each test is run 100 times, but only
    the *best* time for each is used.
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 16489 microseconds.
        (= 16489 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function      Rate (MB/s)   RMS time     Min time     Max time
    Copy:         4302.8156     0.0075       0.0074       0.0078
    Scale:        4251.4326     0.0076       0.0075       0.0077
    Add:          4496.9085     0.0107       0.0107       0.0109
    Triad:        4457.6785     0.0108       0.0108       0.0110
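
    The figures above can be cross-checked with a little arithmetic. The
    sketch below assumes the usual STREAM accounting: Copy and Scale touch
    two arrays per element, Add and Triad touch three, rates use
    1 MB = 10^6 bytes, and the memory total uses 1 MB = 2^20 bytes. The
    enum and function names are illustrative and do not come from
    stream_d.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the four STREAM kernels. */
enum stream_kernel { STREAM_COPY, STREAM_SCALE, STREAM_ADD, STREAM_TRIAD };

/* Bytes moved by one pass of a kernel over n doubles:
   Copy and Scale read one array and write one (2 arrays);
   Add and Triad read two arrays and write one (3 arrays). */
static double stream_bytes_moved(enum stream_kernel k, size_t n)
{
    double arrays = (k == STREAM_COPY || k == STREAM_SCALE) ? 2.0 : 3.0;
    return arrays * (double)sizeof(double) * (double)n;
}

/* Bandwidth in MB/s (1 MB = 10^6 bytes) from the best (minimum) time. */
static double stream_rate_mbs(enum stream_kernel k, size_t n, double min_time_s)
{
    assert(min_time_s > 0.0); /* a zero time means the timer is too coarse */
    return 1.0e-6 * stream_bytes_moved(k, n) / min_time_s;
}

/* Memory for the three arrays, in MB with 1 MB = 2^20 bytes. */
static double stream_total_memory_mb(size_t n)
{
    return 3.0 * (double)sizeof(double) * (double)n / 1048576.0;
}
```

    With an array size of 2000000, stream_total_memory_mb gives about
    45.8 MB, matching the reported total. Feeding the printed Copy Min time
    of 0.0074 s into stream_rate_mbs gives roughly 4324 MB/s rather than
    the reported 4302.8 MB/s; the difference arises because the table
    shows times rounded to four decimal places.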

    Sincerely,
    Mathew Colgrove
    QA Engineer
    The Portland Group

    -- 
    ----------------------------------------------------------------------
    Mathew Colgrove - Quality Assurance
    Advanced Compilers and Tools AST Portland Lab, STMicroelectronics
    mathew.colgrove@st.com  (503) 682-2806  (voice) (503) 682-2637  (FAX)
    

