Re: SPARCstation memory performance???

From: David Robinson (
Date: Fri Oct 08 1993 - 04:49:03 CDT

In article <>, (James C. Hoe) writes:
|> Has anyone done any experiments, or does anyone have detailed knowledge
|> of memory performance on SPARC workstations (4/4*0, SS10)? I am
|> particularly interested in loads and stores on cache misses.
|> I have done some experiments and found that my 40MHz SPARCstation2 takes
|> nearly 30 cycles on a load miss (I was told this was due to
|> very slow memory translation). The SPARCstation2 also requires 8 cycles
|> to store regardless of miss or hit.
|> Has anyone done similar experiments, or does anyone know the detailed
|> inner workings of these workstations, especially the SS10s? I would
|> appreciate any correspondence.

For a SS10 with MCC and secondary cache (10/41, 10/51 etc.) the penalties are
Load miss primary cache, hit secondary cache: 5 cycles
Load miss primary cache, miss secondary cache: ~80 cycles

For a SS10 without MCC, specifically 10/40, the penalty is
Load miss primary cache ~15 cycles

I suspect the penalty will be less for a 10/30, as cycles are longer.
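
The reasoning behind that guess can be sketched numerically: if the miss latency is roughly fixed in absolute time, a slower clock sees it as fewer cycles. This is only an illustration. The 36 MHz clock for the 10/30 is an assumption not stated in this post, and so is the idea that the latency in nanoseconds is identical on both machines (memory timing may well scale with the clock).

```python
# Illustrative only: express a miss penalty in nanoseconds, then re-express
# it in cycles at a different CPU clock.

def cycles_to_ns(cycles, clock_mhz):
    """Duration of `cycles` CPU cycles in nanoseconds at `clock_mhz`."""
    return cycles * 1000.0 / clock_mhz

def ns_to_cycles(ns, clock_mhz):
    """Number of CPU cycles that fit in `ns` nanoseconds at `clock_mhz`."""
    return ns * clock_mhz / 1000.0

# 10/40: ~15 cycles at 40 MHz (figure from the post)
penalty_ns = cycles_to_ns(15, 40.0)

# ASSUMPTION: 10/30 runs at 36 MHz and sees the same absolute latency.
penalty_cycles_10_30 = ns_to_cycles(penalty_ns, 36.0)

print(f"10/40 penalty: {penalty_ns:.0f} ns")
print(f"10/30 penalty under the assumptions above: {penalty_cycles_10_30:.1f} cycles")
```

Under those assumptions the ~15 cycle penalty is 375 ns, which is 13.5 cycles at 36 MHz.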

Store penalties are harder to measure, and less useful to know, as there is
a four-level store buffer. However, I have measured the peak store bandwidth:

Timings done using the std instruction.
All bandwidths in Mb/s (megabytes per second)
I = on-chip (internal) or primary cache,
E = off-chip (external) or secondary cache
Clock speed given either as CPU speed or CPU speed/memory bus speed

          Hit I cache   Miss I cache   Miss I cache
                        Hit E cache    Miss E cache
10/40         308            -              46         40MHz, I=16Kb, E=0Kb
10/51         220           212             34         50MHz/40MHz, I=16Kb, E=1Mb
10/41         182           172             31         40.3MHz/40MHz, I=16Kb, E=1Mb
Classic        62            -              64         50MHz, I=2Kb, E=0Kb
SS2             -            25             24         40MHz, I=0Kb, E=64Kb

It is obvious that the 10/41 & 10/51 have a write-through internal cache,
whereas the 10/40 has a write-back internal cache.
For a 10/40, a store that hits the internal cache can be issued every cycle.
For a 10/41 or 10/51, once the store buffer is filled, stores can be executed
every 1 3/4 cycles, on average.
For the SS2, 24Mb/s translates to 13.0 cycles for each std, or 6.5 cycles per
word, so either storing is pipelined, or the 8 cycles only applies to
single-word stores.
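
The cycle counts quoted above follow from dividing the CPU clock by the number of stores per second. A minimal sketch of that conversion, using figures from the table; it assumes "Mb/s" means 10**6 bytes per second, which the post does not specify, and the function name is just for illustration.

```python
# Convert measured peak store bandwidth into CPU cycles per std.
# ASSUMPTION: 1 Mb/s = 10**6 bytes per second (not stated in the post).

STD_BYTES = 8  # std stores a doubleword (two 32-bit words)

# (label, CPU clock in MHz, bandwidth in Mb/s), copied from the table
MEASUREMENTS = [
    ("10/40 hit I", 40.0, 308),
    ("10/51 hit I", 50.0, 220),
    ("10/41 hit I", 40.3, 182),
    ("SS2 miss E", 40.0, 24),
]

def cycles_per_store(clock_mhz, bandwidth_mb_s, store_bytes=STD_BYTES):
    """CPU cycles consumed per store at the given sustained bandwidth."""
    stores_per_second = bandwidth_mb_s * 1e6 / store_bytes
    return clock_mhz * 1e6 / stores_per_second

for label, clock_mhz, bandwidth in MEASUREMENTS:
    cycles = cycles_per_store(clock_mhz, bandwidth)
    print(f"{label:12s} {cycles:6.2f} cycles per std, "
          f"{cycles / 2:5.2f} cycles per word")
```

With these inputs the 10/40 comes out at about one cycle per std and the 10/41 and 10/51 at roughly 1.8. The SS2 comes out near 13.3 cycles per std, slightly above the 13.0 quoted; the difference presumably comes from rounding of the table entry or from the definition of a megabyte.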

David Robinson. (

From Ukn Oct 14 16:58:35 1993
Subject: STREAM benchmark
Date: Thu, 14 Oct 93 13:56:27 -0700

I saw the table you posted a few weeks ago in comp.arch, comparing
achieved memory bandwidths for several machines. Very interesting
numbers. I was wondering whether you've got any data on the new IBM
POWER2 machines. Also, is there any chance I could get a copy of the
codes you used? I'd like to try running the test on a few other
machines. I realize that they are all just 3-line programs, but it
makes direct comparison easier if I know that I'm using the same


Ed Rothberg
Intel Supercomputer Systems Division

This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:03 CDT