In article <JHOE.93Oct7234827@au-bon-pain.lcs.mit.edu>, firstname.lastname@example.org (James C. Hoe) writes:
|> Has anyone done any experiment or have detail knowledge of memory
|> performance on SPARC workstations (4/4*0, SS10). I am particularly
|> interested in load and store on cache misses.
|> I have done some experiments and found my 40MHz SPARCstation2 uses
|> nearly 30 cycles on a load misses (I was told this was due to
|> very slow memory translation). SPARCstation2 also require 8 cycles
|> to store regardless of miss or hit.
|> Have anyone done similar experiments, or know detail inner workings of
|> these workstations, especially SS10's. I would appreciate any
For an SS10 with MCC and secondary cache (10/41, 10/51 etc.) the penalties are:
Load miss primary cache, hit secondary cache: 5 cycles
Load miss primary cache, miss secondary cache: ~80 cycles
For an SS10 without MCC, specifically the 10/40, the penalty is:
Load miss primary cache: ~15 cycles
I suspect the penalty will be less for a 10/30, as cycles are longer.
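
To put the load miss penalties above in absolute terms, here is a rough sketch that converts them to nanoseconds. It assumes the penalties are counted in CPU clock cycles and takes the clock rates from the bandwidth table in this post (50MHz for the 10/51, 40.3MHz for the 10/41, 40MHz for the 10/40). The function name and the list layout are only illustrative.

```python
def cycles_to_ns(cycles, clock_mhz):
    """Time in nanoseconds taken by `cycles` clock cycles at `clock_mhz` MHz."""
    return cycles * 1000.0 / clock_mhz


# (model, CPU clock in MHz, event, approximate penalty in cycles)
# Penalties and clock rates are the ones quoted in the post; treating the
# penalties as CPU cycles (not memory bus cycles) is an assumption.
PENALTIES = [
    ("10/51", 50.0, "load miss primary, hit secondary", 5),
    ("10/51", 50.0, "load miss primary, miss secondary", 80),
    ("10/41", 40.3, "load miss primary, hit secondary", 5),
    ("10/41", 40.3, "load miss primary, miss secondary", 80),
    ("10/40", 40.0, "load miss primary", 15),
]

for model, mhz, event, cycles in PENALTIES:
    print(f"{model}: {event}: about {cycles_to_ns(cycles, mhz):.0f} ns")
```

On these assumptions a full miss on a 10/51 costs on the order of 1.6 microseconds, against a few hundred nanoseconds for a primary cache miss on the 10/40.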
Store penalties are harder to measure, and less useful to know, as there is
a four-level store buffer. However, I have measured the peak store bandwidth.
Timings were done using the std (store doubleword) instruction.
All bandwidths are in Mb/s (megabytes per second).
I = on-chip (internal) or primary cache,
E = off-chip (external) or secondary cache
Clock speed given either as CPU speed or CPU speed/memory bus speed
          Hit I cache   Miss I cache   Miss I cache
                        Hit E cache    Miss E cache
10/40         308            -              46        40MHz, I=16Kb, E=0Kb
10/51         220           212             34        50MHz/40MHz, I=16Kb, E=1Mb
10/41         182           172             31        40.3MHz/40MHz, I=16Kb, E=1Mb
Classic        62            -              64        50MHz, I=2Kb, E=0Kb
SS2             -            25             24        40MHz, I=0Kb, E=64Kb
It is obvious that the 10/41 & 10/51 have a write-through internal cache
(their store bandwidth is almost the same whether or not the store hits it),
whereas the 10/40 has a write-back internal cache.
For a 10/40, a store that hits the internal cache can be issued every cycle.
For a 10/41 or 51, once the store buffer is filled, stores can be executed
every 1 3/4 cycles, on average.
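
The cycles-per-store figures can be checked against the measured bandwidths with a short calculation. This is only a sketch: it assumes 1 Mb means 10^6 bytes, that each std writes 8 bytes, and that the cycle counts are in CPU clock cycles. The function and constant names are illustrative.

```python
STD_BYTES = 8  # std stores one 64-bit doubleword


def cycles_per_store(bandwidth_mb_s, clock_mhz, bytes_per_store=STD_BYTES):
    """Average CPU cycles per store implied by a sustained store bandwidth.

    Assumes bandwidth_mb_s is in units of 10^6 bytes per second.
    """
    million_stores_per_s = bandwidth_mb_s / bytes_per_store
    return clock_mhz / million_stores_per_s


# (model, bandwidth when the store hits the internal cache, CPU clock in MHz)
# Values are taken from the table in the post.
HIT_INTERNAL = [
    ("10/40", 308, 40.0),
    ("10/51", 220, 50.0),
    ("10/41", 182, 40.3),
]

for model, mb_s, mhz in HIT_INTERNAL:
    print(f"{model}: {cycles_per_store(mb_s, mhz):.2f} cycles per std")
```

With these inputs the 10/40 comes out very close to one cycle per std, and the 10/41 and 10/51 close to 1.8, in line with the figures quoted above.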
For the SS2, 24Mb/s translates to 13.0 cycles for each std, or 6.5 cycles per
word, so either storing is pipelined, or the 8 cycles applies only to single-word
stores.
David Robinson. (email@example.com)
From rothberg@SSD.intel.com Ukn Oct 14 16:58:35 1993
Subject: STREAM benchmark
Date: Thu, 14 Oct 93 13:56:27 -0700
I saw the table you posted a few weeks ago in comp.arch, comparing
achieved memory bandwidths for several machines. Very interesting
numbers. I was wondering whether you've got any data on the new IBM
POWER2 machines. Also, is there any chance I could get a copy of the
codes you used? I'd like to try running the test on a few other
machines. I realize that they are all just 3-line programs, but it
makes direct comparison easier if I know that I'm using the same
Intel Supercomputer Systems Division
This archive was generated by hypermail 2b29 : Tue Apr 18 2000 - 05:23:03 CDT