"I am a person who works hard and plays hard."

Yuan Wei
Second Year Graduate Student
Department of Computer Science
University of Virginia
Charlottesville, VA 22903
Email: yw3f@cs.virginia.edu


Source Code Analysis

sim-outorder.c

00001 /*
00002  * sim-outorder.c - sample out-of-order issue perf simulator implementation
00003  *
00004  * This file is a part of the SimpleScalar tool suite written by
00005  * Todd M. Austin as a part of the Multiscalar Research Project.
00006  *  
00007  * The tool suite is currently maintained by Doug Burger and Todd M. Austin.
00008  * 
00009  * Copyright (C) 1994, 1995, 1996, 1997, 1998 by Todd M. Austin
00010  *
00011  * This source file is distributed "as is" in the hope that it will be
00012  * useful.  The tool set comes with no warranty, and no author or
00013  * distributor accepts any responsibility for the consequences of its
00014  * use. 
00015  * 
00016  * Everyone is granted permission to copy, modify and redistribute
00017  * this tool set under the following conditions:
00018  * 
00019  *    This source code is distributed for non-commercial use only. 
00020  *    Please contact the maintainer for restrictions applying to 
00021  *    commercial use.
00022  *
00023  *    Permission is granted to anyone to make or distribute copies
00024  *    of this source code, either as received or modified, in any
00025  *    medium, provided that all copyright notices, permission and
00026  *    nonwarranty notices are preserved, and that the distributor
00027  *    grants the recipient permission for further redistribution as
00028  *    permitted by this document.
00029  *
00030  *    Permission is granted to distribute this file in compiled
00031  *    or executable form under the same conditions that apply for
00032  *    source code, provided that either:
00033  *
00034  *    A. it is accompanied by the corresponding machine-readable
00035  *       source code,
00036  *    B. it is accompanied by a written offer, with no time limit,
00037  *       to give anyone a machine-readable copy of the corresponding
00038  *       source code in return for reimbursement of the cost of
00039  *       distribution.  This written offer must permit verbatim
00040  *       duplication by anyone, or
00041  *    C. it is distributed by someone who received only the
00042  *       executable form, and is accompanied by a copy of the
00043  *       written offer of source code that they received concurrently.
00044  *
00045  * In other words, you are welcome to use, share and improve this
00046  * source file.  You are forbidden to forbid anyone else to use, share
00047  * and improve what you give them.
00048  *
00049  * INTERNET: dburger@cs.wisc.edu
00050  * US Mail:  1210 W. Dayton Street, Madison, WI 53706
00051  *
00052  * $Id: sim-outorder.c,v 1.1.1.1 2000/05/26 15:18:58 taustin Exp $
00053  *
00054  * $Log: sim-outorder.c,v $
00055  * Revision 1.1.1.1  2000/05/26 15:18:58  taustin
00056  * SimpleScalar Tool Set
00057  *
00058  *
00059  * Revision 1.7  1999/12/31 18:50:38  taustin
00060  * quad_t naming conflicts removed
00061  * added retirement tracing to sim-outorder (enable with -v)
00062  * speculative execution should now be deterministic (uninit bugs fixed...)
00063  * sim-outorder now stops after sim_num_insn
00064  *
00065  * Revision 1.6  1999/12/13 18:46:40  taustin
00066  * cross endian execution support added
00067  *
00068  * Revision 1.5  1998/08/27 16:27:48  taustin
00069  * implemented host interface description in host.h
00070  * added target interface support
00071  * added support for register and memory contexts
00072  * instruction predecoding moved to loader module
00073  * Alpha target support added
00074  * added support for qword's
00075  * added fault support
00076  * added option ("-max:inst") to limit number of instructions analyzed
00077  * explicit BTB sizing option added to branch predictors, use
00078  *       "-btb" option to configure BTB
00079  * added queue statistics for IFQ, RUU, and LSQ; all terms of Little's
00080  *       law are measured and reported; also, measures fraction of cycles
00081  *       in which queue is full
00082  * added fast forward option ("-fastfwd") that skips a specified number
00083  *       of instructions (using functional simulation) before starting timing
00084  *       simulation
00085  * sim-outorder speculative loads no longer allocate memory pages,
00086  *       this significantly reduces memory requirements for programs with
00087  *       lots of misspeculation (e.g., cc1)
00088  * branch predictor updates can now optionally occur in ID, WB,
00089  *       or CT
00090  * added target-dependent myprintf() support
00091  * fixed speculative qword store bug (missing most significant word)
00092  * sim-outorder now computes correct result when non-speculative register
00093  *       operand is first defined speculative within the same inst
00094  * speculative fault handling simplified
00095  * dead variable "no_ea_dep" removed
00096  *
00097  * Revision 1.4  1997/04/16  22:10:23  taustin
00098  * added -commit:width support (from kskadron)
00099  * fixed "bad l2 D-cache parms" fatal string
00100  *
00101  * Revision 1.3  1997/03/11  17:17:06  taustin
00102  * updated copyright
00103  * `-pcstat' option support added
00104  * long/int tweaks made for ALPHA target support
00105  * better defaults defined for caches/TLBs
00106  * "mstate" command supported added for DLite!
00107  * supported added for non-GNU C compilers
00108  * buglet fixed in speculative trace generation
00109  * multi-level cache hierarchy now supported
00110  * two-level predictor supported added
00111  * I/D-TLB supported added
00112  * many comments added
00113  * options package supported added
00114  * stats package support added
00115  * resource configuration options extended
00116  * pipetrace support added
00117  * DLite! support added
00118  * writeback throttling now supported
00119  * decode and issue B/W now decoupled
00120  * new and improved (and more precise) memory scheduler added
00121  * cruft for TLB paper removed
00122  *
00123  * Revision 1.1  1996/12/05  18:52:32  taustin
00124  * Initial revision
00125  *
00126  *
00127  */
00128 
00129 #include <stdio.h>
00130 #include <stdlib.h>
00131 #include <math.h>
00132 #include <assert.h>
00133 #include <signal.h>
00134 
00135 #include "host.h"
00136 #include "misc.h"
00137 #include "machine.h"
00138 #include "regs.h"
00139 #include "memory.h"
00140 #include "cache.h"
00141 #include "loader.h"
00142 #include "syscall.h"
00143 #include "bpred.h"
00144 #include "resource.h"
00145 #include "bitmap.h"
00146 #include "options.h"
00147 #include "eval.h"
00148 #include "stats.h"
00149 #include "ptrace.h"
00150 #include "dlite.h"
00151 #include "sim.h"
00152 
00153 /*
00154  * This file implements a very detailed out-of-order issue superscalar
00155  * processor with a two-level memory system and speculative execution support.
00156  * This simulator is a performance simulator, tracking the latency of all
00157  * pipeline operations.
00158  */
00159 
00160 /* simulated registers */
00161 static struct regs_t regs;
00162 
00163 /* simulated memory */
00164 static struct mem_t *mem = NULL;
00165 
00166 
00167 /*
00168  * simulator options
00169  */
00170 
00171 /* maximum number of inst's to execute */
00172 static unsigned int max_insts;
00173 
00174 /* number of insts skipped before timing starts */
00175 static int fastfwd_count;
00176 
00177 /* pipeline trace range and output filename */
00178 static int ptrace_nelt = 0;
00179 static char *ptrace_opts[2];
00180 
00181 /* instruction fetch queue size (in insts) */
00182 static int ruu_ifq_size;
00183 
00184 /* extra branch mis-prediction latency */
00185 static int ruu_branch_penalty;
00186 
00187 /* speed of front-end of machine relative to execution core */
00188 static int fetch_speed;
00189 
00190 /* branch predictor type {nottaken|taken|perfect|bimod|2lev|comb} */
00191 static char *pred_type;
00192 
00193 /* bimodal predictor config (<table_size>) */
00194 static int bimod_nelt = 1;
00195 static int bimod_config[1] =
00196   { /* bimod tbl size */2048 };
00197 
00198 /* 2-level predictor config (<l1size> <l2size> <hist_size> <xor>) */
00199 static int twolev_nelt = 4;
00200 static int twolev_config[4] =
00201   { /* l1size */1, /* l2size */1024, /* hist */8, /* xor */FALSE};
00202 
00203 /* combining predictor config (<meta_table_size> */
00204 static int comb_nelt = 1;
00205 static int comb_config[1] =
00206   { /* meta_table_size */1024 };
00207 
00208 /* return address stack (RAS) size */
00209 static int ras_size = 8;
00210 
00211 /* BTB predictor config (<num_sets> <associativity>) */
00212 static int btb_nelt = 2;
00213 static int btb_config[2] =
00214   { /* nsets */512, /* assoc */4 };
00215 
00216 /* instruction decode B/W (insts/cycle) */
00217 static int ruu_decode_width;
00218 
00219 /* instruction issue B/W (insts/cycle) */
00220 static int ruu_issue_width;
00221 
00222 /* run pipeline with in-order issue */
00223 static int ruu_inorder_issue;
00224 
00225 /* issue instructions down wrong execution paths */
00226 static int ruu_include_spec = TRUE;
00227 
00228 /* instruction commit B/W (insts/cycle) */
00229 static int ruu_commit_width;
00230 
00231 /* register update unit (RUU) size */
00232 static int RUU_size = 8;
00233 
00234 /* load/store queue (LSQ) size */
00235 static int LSQ_size = 4;
00236 
00237 /* l1 data cache config, i.e., {<config>|none} */
00238 static char *cache_dl1_opt;
00239 
00240 /* l1 data cache hit latency (in cycles) */
00241 static int cache_dl1_lat;
00242 
00243 /* l2 data cache config, i.e., {<config>|none} */
00244 static char *cache_dl2_opt;
00245 
00246 /* l2 data cache hit latency (in cycles) */
00247 static int cache_dl2_lat;
00248 
00249 /* l1 instruction cache config, i.e., {<config>|dl1|dl2|none} */
00250 static char *cache_il1_opt;
00251 
00252 /* l1 instruction cache hit latency (in cycles) */
00253 static int cache_il1_lat;
00254 
00255 /* l2 instruction cache config, i.e., {<config>|dl1|dl2|none} */
00256 static char *cache_il2_opt;
00257 
00258 /* l2 instruction cache hit latency (in cycles) */
00259 static int cache_il2_lat;
00260 
00261 /* flush caches on system calls */
00262 static int flush_on_syscalls;
00263 
00264 /* convert 64-bit inst addresses to 32-bit inst equivalents */
00265 static int compress_icache_addrs;
00266 
00267 /* memory access latency (<first_chunk> <inter_chunk>) */
00268 static int mem_nelt = 2;
00269 static int mem_lat[2] =
00270   { /* lat to first chunk */18, /* lat between remaining chunks */2 };
00271 
00272 /* memory access bus width (in bytes) */
00273 static int mem_bus_width;
00274 
00275 /* instruction TLB config, i.e., {<config>|none} */
00276 static char *itlb_opt;
00277 
00278 /* data TLB config, i.e., {<config>|none} */
00279 static char *dtlb_opt;
00280 
00281 /* inst/data TLB miss latency (in cycles) */
00282 static int tlb_miss_lat;
00283 
00284 /* total number of integer ALU's available */
00285 static int res_ialu;
00286 
00287 /* total number of integer multiplier/dividers available */
00288 static int res_imult;
00289 
00290 /* total number of memory system ports available (to CPU) */
00291 static int res_memport;
00292 
00293 /* total number of floating point ALU's available */
00294 static int res_fpalu;
00295 
00296 /* total number of floating point multiplier/dividers available */
00297 static int res_fpmult;
00298 
00299 /* text-based stat profiles */
00300 #define MAX_PCSTAT_VARS 8
00301 static int pcstat_nelt = 0;
00302 static char *pcstat_vars[MAX_PCSTAT_VARS];
00303 
00304 /* convert 64-bit inst text addresses to 32-bit inst equivalents */
00305 #ifdef TARGET_PISA
00306 #define IACOMPRESS(A)                                                   \
00307   (compress_icache_addrs ? ((((A) - ld_text_base) >> 1) + ld_text_base) : (A))
00308 #define ISCOMPRESS(SZ)                                                  \
00309   (compress_icache_addrs ? ((SZ) >> 1) : (SZ))
00310 #else /* !TARGET_PISA */
00311 #define IACOMPRESS(A)           (A)
00312 #define ISCOMPRESS(SZ)          (SZ)
00313 #endif /* TARGET_PISA */
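
The address compression macros above can be tried in isolation. The sketch below re-creates the TARGET_PISA variants with `compress_icache_addrs` and `ld_text_base` as local stand-ins; the text base value and the helper `compressed_text_addr()` are assumptions made for illustration and are not part of the simulator.

```c
#include <assert.h>

/* Hypothetical stand-ins for the simulator globals; values are illustrative. */
static int compress_icache_addrs = 1;
static unsigned long ld_text_base = 0x00400000UL;

/* Same arithmetic as the TARGET_PISA macros: halve the offset from the
   text base (and halve sizes) when compression is enabled. */
#define IACOMPRESS(A)                                                   \
  (compress_icache_addrs ? ((((A) - ld_text_base) >> 1) + ld_text_base) : (A))
#define ISCOMPRESS(SZ)                                                  \
  (compress_icache_addrs ? ((SZ) >> 1) : (SZ))

/* Illustrative helper (not in the original source): compress one
   text-segment address after checking it lies at or above the base. */
static unsigned long compressed_text_addr(unsigned long addr)
{
  assert(addr >= ld_text_base);
  return IACOMPRESS(addr);
}
```

With these assumed values, an address 16 bytes past the text base (`0x00400010`) maps to 8 bytes past it (`0x00400008`), and an 8-byte fetch size becomes 4 bytes.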
00314 
00315 /* operate in backward-compatible bugs mode (for testing only) */
00316 static int bugcompat_mode;
00317 
00318 /*
00319  * functional unit resource configuration
00320  */
00321 
00322 /* resource pool indices, NOTE: update these if you change FU_CONFIG */
00323 #define FU_IALU_INDEX                   0
00324 #define FU_IMULT_INDEX                  1
00325 #define FU_MEMPORT_INDEX                2
00326 #define FU_FPALU_INDEX                  3
00327 #define FU_FPMULT_INDEX                 4
00328 
00329 /* resource pool definition, NOTE: update FU_*_INDEX defs if you change this */
00330 struct res_desc fu_config[] = {
00331   {
00332     "integer-ALU",
00333     4,
00334     0,
00335     {
00336       { IntALU, 1, 1 }
00337     }
00338   },
00339   {
00340     "integer-MULT/DIV",
00341     1,
00342     0,
00343     {
00344       { IntMULT, 3, 1 },
00345       { IntDIV, 20, 19 }
00346     }
00347   },
00348   {
00349     "memory-port",
00350     2,
00351     0,
00352     {
00353       { RdPort, 1, 1 },
00354       { WrPort, 1, 1 }
00355     }
00356   },
00357   {
00358     "FP-adder",
00359     4,
00360     0,
00361     {
00362       { FloatADD, 2, 1 },
00363       { FloatCMP, 2, 1 },
00364       { FloatCVT, 2, 1 }
00365     }
00366   },
00367   {
00368     "FP-MULT/DIV",
00369     1,
00370     0,
00371     {
00372       { FloatMULT, 4, 1 },
00373       { FloatDIV, 12, 12 },
00374       { FloatSQRT, 24, 24 }
00375     }
00376   },
00377 };
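
Each operation entry in the table above carries two numbers after the operation class. Reading them as operation latency followed by issue latency is an assumption here, since the `res_desc` definition lives in `resource.h` and is not shown in this listing. Under that reading, the sketch below estimates when the last of several back-to-back operations completes on a single unit. The `fu_timing` type, the sample constants, and `back_to_back_cycles()` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for one { class, oplat, issuelat } entry. */
struct fu_timing {
  int oplat;     /* cycles from issue until the result is available */
  int issuelat;  /* cycles before the same unit can accept another op */
};

/* Timing pairs copied from the table above. */
static const struct fu_timing int_mult  = { 3, 1 };
static const struct fu_timing int_div   = { 20, 19 };
static const struct fu_timing float_div = { 12, 12 };

/* Cycle at which the last of n back-to-back operations completes on one
   unit, counting from the issue of the first operation at cycle 0. */
static int back_to_back_cycles(struct fu_timing t, int n)
{
  assert(n > 0 && t.oplat > 0 && t.issuelat > 0);
  return (n - 1) * t.issuelat + t.oplat;
}
```

Under this model, four integer multiplies finish after 6 cycles because the multiplier accepts a new operation every cycle, while two integer divides take 39 cycles because the divider stays occupied for 19 of each operation's 20 cycles.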
00378 
00379 
00380 /*
00381  * simulator stats
00382  */
00383 /* SLIP variable */
00384 static counter_t sim_slip = 0;
00385 
00386 /* total number of instructions executed */
00387 static counter_t sim_total_insn = 0;
00388 
00389 /* total number of memory references committed */
00390 static counter_t sim_num_refs = 0;
00391 
00392 /* total number of memory references executed */
00393 static counter_t sim_total_refs = 0;
00394 
00395 /* total number of loads committed */
00396 static counter_t sim_num_loads = 0;
00397 
00398 /* total number of loads executed */
00399 static counter_t sim_total_loads = 0;
00400 
00401 /* total number of branches committed */
00402 static counter_t sim_num_branches = 0;
00403 
00404 /* total number of branches executed */
00405 static counter_t sim_total_branches = 0;
00406 
00407 /* cycle counter */
00408 static tick_t sim_cycle = 0;
00409 
00410 /* occupancy counters */
00411 static counter_t IFQ_count;             /* cumulative IFQ occupancy */
00412 static counter_t IFQ_fcount;            /* cumulative IFQ full count */
00413 static counter_t RUU_count;             /* cumulative RUU occupancy */
00414 static counter_t RUU_fcount;            /* cumulative RUU full count */
00415 static counter_t LSQ_count;             /* cumulative LSQ occupancy */
00416 static counter_t LSQ_fcount;            /* cumulative LSQ full count */
00417 
00418 /* total non-speculative bogus addresses seen (debug var) */
00419 static counter_t sim_invalid_addrs;
00420 
00421 /*
00422  * simulator state variables
00423  */
00424 
00425 /* instruction sequence counter, used to assign unique id's to insts */
00426 static unsigned int inst_seq = 0;
00427 
00428 /* pipetrace instruction sequence counter */
00429 static unsigned int ptrace_seq = 0;
00430 
00431 /* speculation mode, non-zero when mis-speculating, i.e., executing
00432    instructions down the wrong path, thus state recovery will eventually have
00433    to occur that resets processor register and memory state back to the last
00434    precise state */
00435 static int spec_mode = FALSE;
00436 
00437 /* cycles until fetch issue resumes */
00438 static unsigned ruu_fetch_issue_delay = 0;
00439 
00440 /* perfect prediction enabled */
00441 static int pred_perfect = FALSE;
00442 
00443 /* speculative bpred-update enabled */
00444 static char *bpred_spec_opt;
00445 static enum { spec_ID, spec_WB, spec_CT } bpred_spec_update;
00446 
00447 /* level 1 instruction cache, entry level instruction cache */
00448 static struct cache_t *cache_il1;
00449 
00450 /* level 2 instruction cache */
00451 static struct cache_t *cache_il2;
00452 
00453 /* level 1 data cache, entry level data cache */
00454 static struct cache_t *cache_dl1;
00455 
00456 /* level 2 data cache */
00457 static struct cache_t *cache_dl2;
00458 
00459 /* instruction TLB */
00460 static struct cache_t *itlb;
00461 
00462 /* data TLB */
00463 static struct cache_t *dtlb;
00464 
00465 /* branch predictor */
00466 static struct bpred_t *pred;
00467 
00468 /* functional unit resource pool */
00469 static struct res_pool *fu_pool = NULL;
00470 
00471 /* text-based stat profiles */
00472 static struct stat_stat_t *pcstat_stats[MAX_PCSTAT_VARS];
00473 static counter_t pcstat_lastvals[MAX_PCSTAT_VARS];
00474 static struct stat_stat_t *pcstat_sdists[MAX_PCSTAT_VARS];
00475 
00476 /* wedge all stat values into a counter_t */
00477 #define STATVAL(STAT)                                                   \
00478   ((STAT)->sc == sc_int                                                 \
00479    ? (counter_t)*((STAT)->variant.for_int.var)                  \
00480    : ((STAT)->sc == sc_uint                                             \
00481       ? (counter_t)*((STAT)->variant.for_uint.var)              \
00482       : ((STAT)->sc == sc_counter                                       \
00483          ? *((STAT)->variant.for_counter.var)                           \
00484          : (panic("bad stat class"), 0))))
00485 
00486 
00487 /* memory access latency, assumed to not cross a page boundary */
00488 static unsigned int                     /* total latency of access */
00489 mem_access_latency(int blk_sz)          /* block size accessed */
00490 {
00491   int chunks = (blk_sz + (mem_bus_width - 1)) / mem_bus_width;
00492 
00493   assert(chunks > 0);
00494 
00495   return (/* first chunk latency */mem_lat[0] +
00496           (/* remainder chunk latency */mem_lat[1] * (chunks - 1)));
00497 }
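
As a worked example of the latency formula above, the following sketch takes the bus width and the two latencies as parameters instead of reading the globals `mem_bus_width` and `mem_lat[]`. The function name `mem_latency_sketch()` is hypothetical, and the 8-byte bus width used in the example is an assumed value, since the option's default is not shown in this part of the listing.

```c
#include <assert.h>

/* Parameterized copy of the calculation; in the simulator these values
   come from the globals mem_lat[] and mem_bus_width. */
static unsigned int
mem_latency_sketch(int blk_sz,      /* block size accessed, in bytes */
                   int bus_width,   /* bus width, in bytes (assumed value) */
                   int first_lat,   /* latency to first chunk */
                   int inter_lat)   /* latency between remaining chunks */
{
  /* round up to a whole number of bus-width chunks */
  int chunks = (blk_sz + (bus_width - 1)) / bus_width;

  assert(chunks > 0);

  return first_lat + inter_lat * (chunks - 1);
}
```

With the default latencies of 18 and 2 cycles and an assumed 8-byte bus, a 32-byte block needs four chunks and costs 18 + 2 * 3 = 24 cycles, and a 64-byte block needs eight chunks and costs 18 + 2 * 7 = 32 cycles.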
00498 
00499 
00500 /*
00501  * cache miss handlers
00502  */
00503 
00504 /* l1 data cache block miss handler function */
00505 static unsigned int                     /* latency of block access */
00506 dl1_access_fn(enum mem_cmd cmd,         /* access cmd, Read or Write */
00507               md_addr_t baddr,          /* block address to access */
00508               int bsize,                /* size of block to access */
00509               struct cache_blk_t *blk,  /* ptr to block in upper level */
00510               tick_t now)               /* time of access */
00511 {
00512   unsigned int lat;
00513 
00514   if (cache_dl2)
00515     {
00516       /* access next level of data cache hierarchy */
00517       lat = cache_access(cache_dl2, cmd, baddr, NULL, bsize,
00518                          /* now */now, /* pudata */NULL, /* repl addr */NULL);
00519       if (cmd == Read)
00520         return lat;
00521       else
00522         {
00523           /* FIXME: unlimited write buffers */
00524           return 0;
00525         }
00526     }
00527   else
00528     {
00529       /* access main memory */
00530       if (cmd == Read)
00531         return mem_access_latency(bsize);
00532       else
00533         {
00534           /* FIXME: unlimited write buffers */
00535           return 0;
00536         }
00537     }
00538 }
00539 
00540 /* l2 data cache block miss handler function */
00541 static unsigned int                     /* latency of block access */
00542 dl2_access_fn(enum mem_cmd cmd,         /* access cmd, Read or Write */
00543               md_addr_t baddr,          /* block address to access */
00544               int bsize,                /* size of block to access */
00545               struct cache_blk_t *blk,  /* ptr to block in upper level */
00546               tick_t now)               /* time of access */
00547 {
00548   /* this is a miss to the lowest level, so access main memory */
00549   if (cmd == Read)
00550     return mem_access_latency(bsize);
00551   else
00552     {
00553       /* FIXME: unlimited write buffers */
00554       return 0;
00555     }
00556 }
00557 
00558 /* l1 inst cache block miss handler function */
00559 static unsigned int                     /* latency of block access */
00560 il1_access_fn(enum mem_cmd cmd,         /* access cmd, Read or Write */
00561               md_addr_t baddr,          /* block address to access */
00562               int bsize,                /* size of block to access */
00563               struct cache_blk_t *blk,  /* ptr to block in upper level */
00564               tick_t now)               /* time of access */
00565 {
00566   unsigned int lat;
00567 
00568   if (cache_il2)
00569     {
00570       /* access next level of inst cache hierarchy */
00571       lat = cache_access(cache_il2, cmd, baddr, NULL, bsize,
00572                          /* now */now, /* pudata */NULL, /* repl addr */NULL);
00573       if (cmd == Read)
00574         return lat;
00575       else
00576         panic("writes to instruction memory not supported");
00577     }
00578   else
00579     {
00580       /* access main memory */
00581       if (cmd == Read)
00582         return mem_access_latency(bsize);
00583       else
00584         panic("writes to instruction memory not supported");
00585     }
00586 }
00587 
00588 /* l2 inst cache block miss handler function */
00589 static unsigned int                     /* latency of block access */
00590 il2_access_fn(enum mem_cmd cmd,         /* access cmd, Read or Write */
00591               md_addr_t baddr,          /* block address to access */
00592               int bsize,                /* size of block to access */
00593               struct cache_blk_t *blk,  /* ptr to block in upper level */
00594               tick_t now)               /* time of access */
00595 {
00596   /* this is a miss to the lowest level, so access main memory */
00597   if (cmd == Read)
00598     return mem_access_latency(bsize);
00599   else
00600     panic("writes to instruction memory not supported");
00601 }
00602 
00603 
00604 /*
00605  * TLB miss handlers
00606  */
00607 
00608 /* inst TLB miss handler function */
00609 static unsigned int                     /* latency of block access */
00610 itlb_access_fn(enum mem_cmd cmd,        /* access cmd, Read or Write */
00611                md_addr_t baddr,         /* block address to access */
00612                int bsize,               /* size of block to access */
00613                struct cache_blk_t *blk, /* ptr to block in upper level */
00614                tick_t now)              /* time of access */
00615 {
00616   md_addr_t *phy_page_ptr = (md_addr_t *)blk->user_data;
00617 
00618   /* no real memory access, however, should have user data space attached */
00619   assert(phy_page_ptr);
00620 
00621   /* fake translation, for now... */
00622   *phy_page_ptr = 0;
00623 
00624   /* return tlb miss latency */
00625   return tlb_miss_lat;
00626 }
00627 
00628 /* data TLB miss handler function */
00629 static unsigned int                     /* latency of block access */
00630 dtlb_access_fn(enum mem_cmd cmd,        /* access cmd, Read or Write */
00631                md_addr_t baddr,         /* block address to access */
00632                int bsize,               /* size of block to access */
00633                struct cache_blk_t *blk, /* ptr to block in upper level */
00634                tick_t now)              /* time of access */
00635 {
00636   md_addr_t *phy_page_ptr = (md_addr_t *)blk->user_data;
00637 
00638   /* no real memory access, however, should have user data space attached */
00639   assert(phy_page_ptr);
00640 
00641   /* fake translation, for now... */
00642   *phy_page_ptr = 0;
00643 
00644   /* return tlb miss latency */
00645   return tlb_miss_lat;
00646 }
00647 
00648 
00649 /* register simulator-specific options */
00650 void
00651 sim_reg_options(struct opt_odb_t *odb)
00652 {
00653   opt_reg_header(odb, 
00654 "sim-outorder: This simulator implements a very detailed out-of-order issue\n"
00655 "superscalar processor with a two-level memory system and speculative\n"
00656 "execution support.  This simulator is a performance simulator, tracking the\n"
00657 "latency of all pipeline operations.\n"
00658                  );
00659 
00660   /* instruction limit */
00661 
00662   opt_reg_uint(odb, "-max:inst", "maximum number of inst's to execute",
00663                &max_insts, /* default */0,
00664                /* print */TRUE, /* format */NULL);
00665 
00666   /* trace options */
00667 
00668   opt_reg_int(odb, "-fastfwd", "number of insts skipped before timing starts",
00669               &fastfwd_count, /* default */0,
00670               /* print */TRUE, /* format */NULL);
00671   opt_reg_string_list(odb, "-ptrace",
00672               "generate pipetrace, i.e., <fname|stdout|stderr> <range>",
00673               ptrace_opts, /* arr_sz */2, &ptrace_nelt, /* default */NULL,
00674               /* !print */FALSE, /* format */NULL, /* !accrue */FALSE);
00675 
00676   opt_reg_note(odb,
00677 "  Pipetrace range arguments are formatted as follows:\n"
00678 "\n"
00679 "    {{@|#}<start>}:{{@|#|+}<end>}\n"
00680 "\n"
00681 "  Both ends of the range are optional, if neither are specified, the entire\n"
00682 "  execution is traced.  Ranges that start with a `@' designate an address\n"
00683 "  range to be traced, those that start with a `#' designate a cycle count\n"
00684 "  range.  All other range values represent an instruction count range.  The\n"
00685 "  second argument, if specified with a `+', indicates a value relative\n"
00686 "  to the first argument, e.g., 1000:+100 == 1000:1100.  Program symbols may\n"
00687 "  be used in all contexts.\n"
00688 "\n"
00689 "    Examples:   -ptrace FOO.trc #0:#1000\n"
00690 "                -ptrace BAR.trc @2000:\n"
00691 "                -ptrace BLAH.trc :1500\n"
00692 "                -ptrace UXXE.trc :\n"
00693 "                -ptrace FOOBAR.trc @main:+278\n"
00694                );
00695 
00696   /* ifetch options */
00697 
00698   opt_reg_int(odb, "-fetch:ifqsize", "instruction fetch queue size (in insts)",
00699               &ruu_ifq_size, /* default */4,
00700               /* print */TRUE, /* format */NULL);
00701 
00702   opt_reg_int(odb, "-fetch:mplat", "extra branch mis-prediction latency",
00703               &ruu_branch_penalty, /* default */3,
00704               /* print */TRUE, /* format */NULL);
00705 
00706   opt_reg_int(odb, "-fetch:speed",
00707               "speed of front-end of machine relative to execution core",
00708               &fetch_speed, /* default */1,
00709               /* print */TRUE, /* format */NULL);
00710 
00711   /* branch predictor options */
00712 
00713   opt_reg_note(odb,
00714 "  Branch predictor configuration examples for 2-level predictor:\n"
00715 "    Configurations:   N, M, W, X\n"
00716 "      N   # entries in first level (# of shift register(s))\n"
00717 "      W   width of shift register(s)\n"
00718 "      M   # entries in 2nd level (# of counters, or other FSM)\n"
00719 "      X   (yes-1/no-0) xor history and address for 2nd level index\n"
00720 "    Sample predictors:\n"
00721 "      GAg     : 1, W, 2^W, 0\n"
00722 "      GAp     : 1, W, M (M > 2^W), 0\n"
00723 "      PAg     : N, W, 2^W, 0\n"
00724 "      PAp     : N, W, M (M == 2^(N+W)), 0\n"
00725 "      gshare  : 1, W, 2^W, 1\n"
00726 "  Predictor `comb' combines a bimodal and a 2-level predictor.\n"
00727                );
00728 
00729   opt_reg_string(odb, "-bpred",
00730                  "branch predictor type {nottaken|taken|perfect|bimod|2lev|comb}",
00731                  &pred_type, /* default */"bimod",
00732                  /* print */TRUE, /* format */NULL);
00733 
00734   opt_reg_int_list(odb, "-bpred:bimod",
00735                    "bimodal predictor config (<table size>)",
00736                    bimod_config, bimod_nelt, &bimod_nelt,
00737                    /* default */bimod_config,
00738                    /* print */TRUE, /* format */NULL, /* !accrue */FALSE);
00739 
00740   opt_reg_int_list(odb, "-bpred:2lev",
00741                    "2-level predictor config "
00742                    "(<l1size> <l2size> <hist_size> <xor>)",
00743                    twolev_config, twolev_nelt, &twolev_nelt,
00744                    /* default */twolev_config,
00745                    /* print */TRUE, /* format */NULL, /* !accrue */FALSE);
00746 
00747   opt_reg_int_list(odb, "-bpred:comb",
00748                    "combining predictor config (<meta_table_size>)",
00749                    comb_config, comb_nelt, &comb_nelt,
00750                    /* default */comb_config,
00751                    /* print */TRUE, /* format */NULL, /* !accrue */FALSE);
00752 
00753   opt_reg_int(odb, "-bpred:ras",
00754               "return address stack size (0 for no return stack)",
00755               &ras_size, /* default */ras_size,
00756               /* print */TRUE, /* format */NULL);
00757 
00758   opt_reg_int_list(odb, "-bpred:btb",
00759                    "BTB config (<num_sets> <associativity>)",
00760                    btb_config, btb_nelt, &btb_nelt,
00761                    /* default */btb_config,
00762                    /* print */TRUE, /* format */NULL, /* !accrue */FALSE);
00763 
00764   opt_reg_string(odb, "-bpred:spec_update",
00765                  "speculative predictors update in {ID|WB} (default non-spec)",
00766                  &bpred_spec_opt, /* default */NULL,
00767                  /* print */TRUE, /* format */NULL);
00768 
00769   /* decode options */
00770 
00771   opt_reg_int(odb, "-decode:width",
00772               "instruction decode B/W (insts/cycle)",
00773               &ruu_decode_width, /* default */4,
00774               /* print */TRUE, /* format */NULL);
00775 
00776   /* issue options */
00777 
00778   opt_reg_int(odb, "-issue:width",
00779               "instruction issue B/W (insts/cycle)",
00780               &ruu_issue_width, /* default */4,
00781               /* print */TRUE, /* format */NULL);
00782 
00783   opt_reg_flag(odb, "-issue:inorder", "run pipeline with in-order issue",
00784                &ruu_inorder_issue, /* default */FALSE,
00785                /* print */TRUE, /* format */NULL);
00786 
00787   opt_reg_flag(odb, "-issue:wrongpath",
00788                "issue instructions down wrong execution paths",
00789                &ruu_include_spec, /* default */TRUE,
00790                /* print */TRUE, /* format */NULL);
00791 
00792   /* commit options */
00793 
00794   opt_reg_int(odb, "-commit:width",
00795               "instruction commit B/W (insts/cycle)",
00796               &ruu_commit_width, /* default */4,
00797               /* print */TRUE, /* format */NULL);
00798 
00799   /* register scheduler options */
00800 
00801   opt_reg_int(odb, "-ruu:size",
00802               "register update unit (RUU) size",
00803               &RUU_size, /* default */16,
00804               /* print */TRUE, /* format */NULL);
00805 
00806   /* memory scheduler options  */
00807 
00808   opt_reg_int(odb, "-lsq:size",
00809               "load/store queue (LSQ) size",
00810               &LSQ_size, /* default */8,
00811               /* print */TRUE, /* format */NULL);
00812 
00813   /* cache options */
00814 
00815   opt_reg_string(odb, "-cache:dl1",
00816                  "l1 data cache config, i.e., {<config>|none}",
00817                  &cache_dl1_opt, "dl1:128:32:4:l",
00818                  /* print */TRUE, NULL);
00819 
00820   opt_reg_note(odb,
00821 "  The cache config parameter <config> has the following format:\n"
00822 "\n"
00823 "    <name>:<nsets>:<bsize>:<assoc>:<repl>\n"
00824 "\n"
00825 "    <name>   - name of the cache being defined\n"
00826 "    <nsets>  - number of sets in the cache\n"
00827 "    <bsize>  - block size of the cache\n"
00828 "    <assoc>  - associativity of the cache\n"
00829 "    <repl>   - block replacement strategy, 'l'-LRU, 'f'-FIFO, 'r'-random\n"
00830 "\n"
00831 "    Examples:   -cache:dl1 dl1:4096:32:1:l\n"
00832 "                -tlb:dtlb dtlb:128:4096:32:r\n"
00833                );
00834 
00835   opt_reg_int(odb, "-cache:dl1lat",
00836               "l1 data cache hit latency (in cycles)",
00837               &cache_dl1_lat, /* default */1,
00838               /* print */TRUE, /* format */NULL);
00839 
00840   opt_reg_string(odb, "-cache:dl2",
00841                  "l2 data cache config, i.e., {<config>|none}",
00842                  &cache_dl2_opt, "ul2:1024:64:4:l",
00843                  /* print */TRUE, NULL);
00844 
00845   opt_reg_int(odb, "-cache:dl2lat",
00846               "l2 data cache hit latency (in cycles)",
00847               &cache_dl2_lat, /* default */6,
00848               /* print */TRUE, /* format */NULL);
00849 
00850   opt_reg_string(odb, "-cache:il1",
00851                  "l1 inst cache config, i.e., {<config>|dl1|dl2|none}",
00852                  &cache_il1_opt, "il1:512:32:1:l",
00853                  /* print */TRUE, NULL);
00854 
00855   opt_reg_note(odb,
00856 "  Cache levels can be unified by pointing a level of the instruction cache\n"
00857 "  hierarchy at the data cache hierarchy using the \"dl1\" and \"dl2\" cache\n"
00858 "  configuration arguments.  Most sensible combinations are supported, e.g.,\n"
00859 "\n"
00860 "    A unified l2 cache (il2 is pointed at dl2):\n"
00861 "      -cache:il1 il1:128:64:1:l -cache:il2 dl2\n"
00862 "      -cache:dl1 dl1:256:32:1:l -cache:dl2 ul2:1024:64:2:l\n"
00863 "\n"
00864 "    Or, a fully unified cache hierarchy (il1 pointed at dl1):\n"
00865 "      -cache:il1 dl1\n"
00866 "      -cache:dl1 ul1:256:32:1:l -cache:dl2 ul2:1024:64:2:l\n"
00867                );
00868 
00869   opt_reg_int(odb, "-cache:il1lat",
00870               "l1 instruction cache hit latency (in cycles)",
00871               &cache_il1_lat, /* default */1,
00872               /* print */TRUE, /* format */NULL);
00873 
00874   opt_reg_string(odb, "-cache:il2",
00875                  "l2 instruction cache config, i.e., {<config>|dl2|none}",
00876                  &cache_il2_opt, "dl2",
00877                  /* print */TRUE, NULL);
00878 
00879   opt_reg_int(odb, "-cache:il2lat",
00880               "l2 instruction cache hit latency (in cycles)",
00881               &cache_il2_lat, /* default */6,
00882               /* print */TRUE, /* format */NULL);
00883 
00884   opt_reg_flag(odb, "-cache:flush", "flush caches on system calls",
00885                &flush_on_syscalls, /* default */FALSE, /* print */TRUE, NULL);
00886 
00887   opt_reg_flag(odb, "-cache:icompress",
00888                "convert 64-bit inst addresses to 32-bit inst equivalents",
00889                &compress_icache_addrs, /* default */FALSE,
00890                /* print */TRUE, NULL);
00891 
00892   /* mem options */
00893   opt_reg_int_list(odb, "-mem:lat",
00894                    "memory access latency (<first_chunk> <inter_chunk>)",
00895                    mem_lat, mem_nelt, &mem_nelt, mem_lat,
00896                    /* print */TRUE, /* format */NULL, /* !accrue */FALSE);
00897 
00898   opt_reg_int(odb, "-mem:width", "memory access bus width (in bytes)",
00899               &mem_bus_width, /* default */8,
00900               /* print */TRUE, /* format */NULL);
00901 
00902   /* TLB options */
00903 
00904   opt_reg_string(odb, "-tlb:itlb",
00905                  "instruction TLB config, i.e., {<config>|none}",
00906                  &itlb_opt, "itlb:16:4096:4:l", /* print */TRUE, NULL);
00907 
00908   opt_reg_string(odb, "-tlb:dtlb",
00909                  "data TLB config, i.e., {<config>|none}",
00910                  &dtlb_opt, "dtlb:32:4096:4:l", /* print */TRUE, NULL);
00911 
00912   opt_reg_int(odb, "-tlb:lat",
00913               "inst/data TLB miss latency (in cycles)",
00914               &tlb_miss_lat, /* default */30,
00915               /* print */TRUE, /* format */NULL);
00916 
00917   /* resource configuration */
00918 
00919   opt_reg_int(odb, "-res:ialu",
00920               "total number of integer ALU's available",
00921               &res_ialu, /* default */fu_config[FU_IALU_INDEX].quantity,
00922               /* print */TRUE, /* format */NULL);
00923 
00924   opt_reg_int(odb, "-res:imult",
00925               "total number of integer multiplier/dividers available",
00926               &res_imult, /* default */fu_config[FU_IMULT_INDEX].quantity,
00927               /* print */TRUE, /* format */NULL);
00928 
00929   opt_reg_int(odb, "-res:memport",
00930               "total number of memory system ports available (to CPU)",
00931               &res_memport, /* default */fu_config[FU_MEMPORT_INDEX].quantity,
00932               /* print */TRUE, /* format */NULL);
00933 
00934   opt_reg_int(odb, "-res:fpalu",
00935               "total number of floating point ALU's available",
00936               &res_fpalu, /* default */fu_config[FU_FPALU_INDEX].quantity,
00937               /* print */TRUE, /* format */NULL);
00938 
00939   opt_reg_int(odb, "-res:fpmult",
00940               "total number of floating point multiplier/dividers available",
00941               &res_fpmult, /* default */fu_config[FU_FPMULT_INDEX].quantity,
00942               /* print */TRUE, /* format */NULL);
00943 
00944   opt_reg_string_list(odb, "-pcstat",
00945                       "profile stat(s) against text addr's (mult uses ok)",
00946                       pcstat_vars, MAX_PCSTAT_VARS, &pcstat_nelt, NULL,
00947                       /* !print */FALSE, /* format */NULL, /* accrue */TRUE);
00948 
00949   opt_reg_flag(odb, "-bugcompat",
00950                "operate in backward-compatible bugs mode (for testing only)",
00951                &bugcompat_mode, /* default */FALSE, /* print */TRUE, NULL);
00952 }
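
The cache and TLB options registered above all share the `<name>:<nsets>:<bsize>:<assoc>:<repl>` string format. The following is a minimal standalone sketch of how such a string could be split into fields and turned into a capacity figure. `struct cache_cfg`, `parse_cache_cfg()` and `cache_cfg_bytes()` are hypothetical names introduced only for illustration; they are not part of SimpleScalar, and the replacement-letter check simply mirrors the letters listed in the option note.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for one parsed config string (not a SimpleScalar type). */
struct cache_cfg {
  char name[128];               /* cache name, e.g. "dl1" */
  int nsets, bsize, assoc;      /* sets, block size in bytes, ways */
  char repl;                    /* 'l' (LRU), 'f' (FIFO) or 'r' (random) */
};

/* Parse "<name>:<nsets>:<bsize>:<assoc>:<repl>".
   Returns 1 on success, 0 if the string does not match the format. */
static int
parse_cache_cfg(const char *s, struct cache_cfg *out)
{
  assert(s != NULL && out != NULL);

  /* %127[^:] reads everything up to the first ':' and bounds it to name[] */
  if (sscanf(s, "%127[^:]:%d:%d:%d:%c",
             out->name, &out->nsets, &out->bsize, &out->assoc,
             &out->repl) != 5)
    return 0;

  /* accept only the replacement letters documented in the option note */
  return out->repl == 'l' || out->repl == 'f' || out->repl == 'r';
}

/* Total data capacity in bytes: sets * block size * associativity. */
static long
cache_cfg_bytes(const struct cache_cfg *c)
{
  return (long)c->nsets * c->bsize * c->assoc;
}
```

Under these assumptions, the default `dl1:128:32:4:l` describes 128 sets of 4 ways with 32-byte blocks, so `cache_cfg_bytes()` reports 128 * 32 * 4 bytes for it.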
00953 
00954 /* check simulator-specific option values */
00955 void
00956 sim_check_options(struct opt_odb_t *odb,        /* options database */
00957                   int argc, char **argv)        /* command line arguments */
00958 {
00959   char name[128], c;
00960   int nsets, bsize, assoc;
00961 
00962   if (fastfwd_count < 0 || fastfwd_count >= 2147483647)
00963     fatal("bad fast forward count: %d", fastfwd_count);
00964 
00965   if (ruu_ifq_size < 1 || (ruu_ifq_size & (ruu_ifq_size - 1)) != 0)
00966     fatal("inst fetch queue size must be positive > 0 and a power of two");
00967 
00968   if (ruu_branch_penalty < 1)
00969     fatal("mis-prediction penalty must be at least 1 cycle");
00970 
00971   if (fetch_speed < 1)
00972     fatal("front-end speed must be positive and non-zero");
00973 
00974   if (!mystricmp(pred_type, "perfect"))
00975     {
00976       /* perfect predictor */
00977       pred = NULL;
00978       pred_perfect = TRUE;
00979     }
00980   else if (!mystricmp(pred_type, "taken"))
00981     {
00982       /* static predictor, taken */
00983       pred = bpred_create(BPredTaken, 0, 0, 0, 0, 0, 0, 0, 0, 0);
00984     }
00985   else if (!mystricmp(pred_type, "nottaken"))
00986     {
00987       /* static predictor, not taken */
00988       pred = bpred_create(BPredNotTaken, 0, 0, 0, 0, 0, 0, 0, 0, 0);
00989     }
00990   else if (!mystricmp(pred_type, "bimod"))
00991     {
00992       /* bimodal predictor, bpred_create() checks BTB_SIZE */
00993       if (bimod_nelt != 1)
00994         fatal("bad bimod predictor config (<table_size>)");
00995       if (btb_nelt != 2)
00996         fatal("bad btb config (<num_sets> <associativity>)");
00997 
00998       /* bimodal predictor, bpred_create() checks BTB_SIZE */
00999       pred = bpred_create(BPred2bit,
01000                           /* bimod table size */bimod_config[0],
01001                           /* 2lev l1 size */0,
01002                           /* 2lev l2 size */0,
01003                           /* meta table size */0,
01004                           /* history reg size */0,
01005                           /* history xor address */0,
01006                           /* btb sets */btb_config[0],
01007                           /* btb assoc */btb_config[1],
01008                           /* ret-addr stack size */ras_size);
01009     }
01010   else if (!mystricmp(pred_type, "2lev"))
01011     {
01012       /* 2-level adaptive predictor, bpred_create() checks args */
01013       if (twolev_nelt != 4)
01014         fatal("bad 2-level pred config (<l1size> <l2size> <hist_size> <xor>)");
01015       if (btb_nelt != 2)
01016         fatal("bad btb config (<num_sets> <associativity>)");
01017 
01018       pred = bpred_create(BPred2Level,
01019                           /* bimod table size */0,
01020                           /* 2lev l1 size */twolev_config[0],
01021                           /* 2lev l2 size */twolev_config[1],
01022                           /* meta table size */0,
01023                           /* history reg size */twolev_config[2],
01024                           /* history xor address */twolev_config[3],
01025                           /* btb sets */btb_config[0],
01026                           /* btb assoc */btb_config[1],
01027                           /* ret-addr stack size */ras_size);
01028     }
01029   else if (!mystricmp(pred_type, "comb"))
01030     {
01031       /* combining predictor, bpred_create() checks args */
01032       if (twolev_nelt != 4)
01033         fatal("bad 2-level pred config (<l1size> <l2size> <hist_size> <xor>)");
01034       if (bimod_nelt != 1)
01035         fatal("bad bimod predictor config (<table_size>)");
01036       if (comb_nelt != 1)
01037         fatal("bad combining predictor config (<meta_table_size>)");
01038       if (btb_nelt != 2)
01039         fatal("bad btb config (<num_sets> <associativity>)");
01040 
01041       pred = bpred_create(BPredComb,
01042                           /* bimod table size */bimod_config[0],
01043                           /* l1 size */twolev_config[0],
01044                           /* l2 size */twolev_config[1],
01045                           /* meta table size */comb_config[0],
01046                           /* history reg size */twolev_config[2],
01047                           /* history xor address */twolev_config[3],
01048                           /* btb sets */btb_config[0],
01049                           /* btb assoc */btb_config[1],
01050                           /* ret-addr stack size */ras_size);
01051     }
01052   else
01053     fatal("cannot parse predictor type `%s'", pred_type);
01054 
01055   if (!bpred_spec_opt)
01056     bpred_spec_update = spec_CT;
01057   else if (!mystricmp(bpred_spec_opt, "ID"))
01058     bpred_spec_update = spec_ID;
01059   else if (!mystricmp(bpred_spec_opt, "WB"))
01060     bpred_spec_update = spec_WB;
01061   else
01062     fatal("bad speculative update stage specifier, use {ID|WB}");
01063 
01064   if (ruu_decode_width < 1 || (ruu_decode_width & (ruu_decode_width-1)) != 0)
01065     fatal("decode width must be positive non-zero and a power of two");
01066 
01067   if (ruu_issue_width < 1 || (ruu_issue_width & (ruu_issue_width-1)) != 0)
01068     fatal("issue width must be positive non-zero and a power of two");
01069 
01070   if (ruu_commit_width < 1)
01071     fatal("commit width must be positive non-zero");
01072 
01073   if (RUU_size < 2 || (RUU_size & (RUU_size-1)) != 0)
01074     fatal("RUU size must be a positive number > 1 and a power of two");
01075 
01076   if (LSQ_size < 2 || (LSQ_size & (LSQ_size-1)) != 0)
01077     fatal("LSQ size must be a positive number > 1 and a power of two");
01078 
01079   /* use a level 1 D-cache? */
01080   if (!mystricmp(cache_dl1_opt, "none"))
01081     {
01082       cache_dl1 = NULL;
01083 
01084       /* the level 2 D-cache cannot be defined */
01085       if (strcmp(cache_dl2_opt, "none"))
01086         fatal("the l1 data cache must be defined if the l2 cache is defined");
01087       cache_dl2 = NULL;
01088     }
01089   else /* dl1 is defined */
01090     {
01091       if (sscanf(cache_dl1_opt, "%[^:]:%d:%d:%d:%c",
01092                  name, &nsets, &bsize, &assoc, &c) != 5)
01093         fatal("bad l1 D-cache parms: <name>:<nsets>:<bsize>:<assoc>:<repl>");
01094       cache_dl1 = cache_create(name, nsets, bsize, /* balloc */FALSE,
01095                                /* usize */0, assoc, cache_char2policy(c),
01096                                dl1_access_fn, /* hit lat */cache_dl1_lat);
01097 
01098       /* is the level 2 D-cache defined? */
01099       if (!mystricmp(cache_dl2_opt, "none"))
01100         cache_dl2 = NULL;
01101       else
01102         {
01103           if (sscanf(cache_dl2_opt, "%[^:]:%d:%d:%d:%c",
01104                      name, &nsets, &bsize, &assoc, &c) != 5)
01105             fatal("bad l2 D-cache parms: "
01106                   "<name>:<nsets>:<bsize>:<assoc>:<repl>");
01107           cache_dl2 = cache_create(name, nsets, bsize, /* balloc */FALSE,
01108                                    /* usize */0, assoc, cache_char2policy(c),
01109                                    dl2_access_fn, /* hit lat */cache_dl2_lat);
01110         }
01111     }
01112 
01113   /* use a level 1 I-cache? */
01114   if (!mystricmp(cache_il1_opt, "none"))
01115     {
01116       cache_il1 = NULL;
01117 
01118       /* the level 2 I-cache cannot be defined */
01119       if (strcmp(cache_il2_opt, "none"))
01120         fatal("the l1 inst cache must be defined if the l2 cache is defined");
01121       cache_il2 = NULL;
01122     }
01123   else if (!mystricmp(cache_il1_opt, "dl1"))
01124     {
01125       if (!cache_dl1)
01126         fatal("I-cache l1 cannot access D-cache l1 as it's undefined");
01127       cache_il1 = cache_dl1;
01128 
01129       /* the level 2 I-cache cannot be defined */
01130       if (strcmp(cache_il2_opt, "none"))
01131         fatal("the l2 inst cache cannot be defined when il1 is unified with dl1");
01132       cache_il2 = NULL;
01133     }
01134   else if (!mystricmp(cache_il1_opt, "dl2"))
01135     {
01136       if (!cache_dl2)
01137         fatal("I-cache l1 cannot access D-cache l2 as it's undefined");
01138       cache_il1 = cache_dl2;
01139 
01140       /* the level 2 I-cache cannot be defined */
01141       if (strcmp(cache_il2_opt, "none"))
01142         fatal("the l2 inst cache cannot be defined when il1 is unified with dl2");
01143       cache_il2 = NULL;
01144     }
01145   else /* il1 is defined */
01146     {
01147       if (sscanf(cache_il1_opt, "%[^:]:%d:%d:%d:%c",
01148                  name, &nsets, &bsize, &assoc, &c) != 5)
01149         fatal("bad l1 I-cache parms: <name>:<nsets>:<bsize>:<assoc>:<repl>");
01150       cache_il1 = cache_create(name, nsets, bsize, /* balloc */FALSE,
01151                                /* usize */0, assoc, cache_char2policy(c),
01152                                il1_access_fn, /* hit lat */cache_il1_lat);
01153 
01154       /* is the level 2 I-cache defined? */
01155       if (!mystricmp(cache_il2_opt, "none"))
01156         cache_il2 = NULL;
01157       else if (!mystricmp(cache_il2_opt, "dl2"))
01158         {
01159           if (!cache_dl2)
01160             fatal("I-cache l2 cannot access D-cache l2 as it's undefined");
01161           cache_il2 = cache_dl2;
01162         }
01163       else
01164         {
01165           if (sscanf(cache_il2_opt, "%[^:]:%d:%d:%d:%c",
01166                      name, &nsets, &bsize, &assoc, &c) != 5)
01167             fatal("bad l2 I-cache parms: "
01168                   "<name>:<nsets>:<bsize>:<assoc>:<repl>");
01169           cache_il2 = cache_create(name, nsets, bsize, /* balloc */FALSE,
01170                                    /* usize */0, assoc, cache_char2policy(c),
01171                                    il2_access_fn, /* hit lat */cache_il2_lat);
01172         }
01173     }
01174 
01175   /* use an I-TLB? */
01176   if (!mystricmp(itlb_opt, "none"))
01177     itlb = NULL;
01178   else
01179     {
01180       if (sscanf(itlb_opt, "%[^:]:%d:%d:%d:%c",
01181                  name, &nsets, &bsize, &assoc, &c) != 5)
01182         fatal("bad TLB parms: <name>:<nsets>:<page_size>:<assoc>:<repl>");
01183       itlb = cache_create(name, nsets, bsize, /* balloc */FALSE,
01184                           /* usize */sizeof(md_addr_t), assoc,
01185                           cache_char2policy(c), itlb_access_fn,
01186                           /* hit latency */1);
01187     }
01188 
01189   /* use a D-TLB? */
01190   if (!mystricmp(dtlb_opt, "none"))
01191     dtlb = NULL;
01192   else
01193     {
01194       if (sscanf(dtlb_opt, "%[^:]:%d:%d:%d:%c",
01195                  name, &nsets, &bsize, &assoc, &c) != 5)
01196         fatal("bad TLB parms: <name>:<nsets>:<page_size>:<assoc>:<repl>");
01197       dtlb = cache_create(name, nsets, bsize, /* balloc */FALSE,
01198                           /* usize */sizeof(md_addr_t), assoc,
01199                           cache_char2policy(c), dtlb_access_fn,
01200                           /* hit latency */1);
01201     }
01202 
01203   if (cache_dl1_lat < 1)
01204     fatal("l1 data cache latency must be greater than zero");
01205 
01206   if (cache_dl2_lat < 1)
01207     fatal("l2 data cache latency must be greater than zero");
01208 
01209   if (cache_il1_lat < 1)
01210     fatal("l1 instruction cache latency must be greater than zero");
01211 
01212   if (cache_il2_lat < 1)
01213     fatal("l2 instruction cache latency must be greater than zero");
01214 
01215   if (mem_nelt != 2)
01216     fatal("bad memory access latency (<first_chunk> <inter_chunk>)");
01217 
01218   if (mem_lat[0] < 1 || mem_lat[1] < 1)
01219     fatal("all memory access latencies must be greater than zero");
01220 
01221   if (mem_bus_width < 1 || (mem_bus_width & (mem_bus_width-1)) != 0)
01222     fatal("memory bus width must be positive non-zero and a power of two");
01223 
01224   if (tlb_miss_lat < 1)
01225     fatal("TLB miss latency must be greater than zero");
01226 
01227   if (res_ialu < 1)
01228     fatal("number of integer ALU's must be greater than zero");
01229   if (res_ialu > MAX_INSTS_PER_CLASS)
01230     fatal("number of integer ALU's must be <= MAX_INSTS_PER_CLASS");
01231   fu_config[FU_IALU_INDEX].quantity = res_ialu;
01232   
01233   if (res_imult < 1)
01234     fatal("number of integer multiplier/dividers must be greater than zero");
01235   if (res_imult > MAX_INSTS_PER_CLASS)
01236     fatal("number of integer mult/div's must be <= MAX_INSTS_PER_CLASS");
01237   fu_config[FU_IMULT_INDEX].quantity = res_imult;
01238   
01239   if (res_memport < 1)
01240     fatal("number of memory system ports must be greater than zero");
01241   if (res_memport > MAX_INSTS_PER_CLASS)
01242     fatal("number of memory system ports must be <= MAX_INSTS_PER_CLASS");
01243   fu_config[FU_MEMPORT_INDEX].quantity = res_memport;
01244   
01245   if (res_fpalu < 1)
01246     fatal("number of floating point ALU's must be greater than zero");
01247   if (res_fpalu > MAX_INSTS_PER_CLASS)
01248     fatal("number of floating point ALU's must be <= MAX_INSTS_PER_CLASS");
01249   fu_config[FU_FPALU_INDEX].quantity = res_fpalu;
01250   
01251   if (res_fpmult < 1)
01252     fatal("number of floating point multiplier/dividers must be > zero");
01253   if (res_fpmult > MAX_INSTS_PER_CLASS)
01254     fatal("number of FP mult/div's must be <= MAX_INSTS_PER_CLASS");
01255   fu_config[FU_FPMULT_INDEX].quantity = res_fpmult;
01256 }
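
Several of the checks above (fetch queue size, decode and issue width, RUU and LSQ size, memory bus width) rely on the same bit trick: a positive integer is a power of two exactly when it has a single bit set, so clearing its lowest set bit with `x & (x - 1)` leaves zero. A minimal sketch of that test as a helper follows; `is_pow2()` is a hypothetical name used only for illustration and does not appear in the simulator.

```c
/* Returns non-zero iff x is a positive power of two.
   x & (x - 1) clears the lowest set bit of x, so the result is zero
   only when x had exactly one bit set.  The x >= 1 guard rejects zero
   and negative values before the bit test is evaluated. */
static int
is_pow2(int x)
{
  return x >= 1 && (x & (x - 1)) == 0;
}
```

With such a helper, a check like the RUU size test could be read as "size is at least 2 and `is_pow2(size)`", which is equivalent to the inline expression used in `sim_check_options()`.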
01257 
01258 /* print simulator-specific configuration information */
01259 void
01260 sim_aux_config(FILE *stream)            /* output stream */
01261 {
01262   /* nada */
01263 }
01264 
01265 /* register simulator-specific statistics */
01266 void
01267 sim_reg_stats(struct stat_sdb_t *sdb)   /* stats database */
01268 {
01269   int i;
01270   stat_reg_counter(sdb, "sim_num_insn",
01271                    "total number of instructions committed",
01272                    &sim_num_insn, sim_num_insn, NULL);
01273   stat_reg_counter(sdb, "sim_num_refs",
01274                    "total number of loads and stores committed",
01275                    &sim_num_refs, 0, NULL);
01276   stat_reg_counter(sdb, "sim_num_loads",
01277                    "total number of loads committed",
01278                    &sim_num_loads, 0, NULL);
01279   stat_reg_formula(sdb, "sim_num_stores",
01280                    "total number of stores committed",
01281                    "sim_num_refs - sim_num_loads", NULL);
01282   stat_reg_counter(sdb, "sim_num_branches",
01283                    "total number of branches committed",
01284                    &sim_num_branches, /* initial value */0, /* format */NULL);
01285   stat_reg_int(sdb, "sim_elapsed_time",
01286                "total simulation time in seconds",
01287                &sim_elapsed_time, 0, NULL);
01288   stat_reg_formula(sdb, "sim_inst_rate",
01289                    "simulation speed (in insts/sec)",
01290                    "sim_num_insn / sim_elapsed_time", NULL);
01291 
01292   stat_reg_counter(sdb, "sim_total_insn",
01293                    "total number of instructions executed",
01294                    &sim_total_insn, 0, NULL);
01295   stat_reg_counter(sdb, "sim_total_refs",
01296                    "total number of loads and stores executed",
01297                    &sim_total_refs, 0, NULL);
01298   stat_reg_counter(sdb, "sim_total_loads",
01299                    "total number of loads executed",
01300                    &sim_total_loads, 0, NULL);
01301   stat_reg_formula(sdb, "sim_total_stores",
01302                    "total number of stores executed",
01303                    "sim_total_refs - sim_total_loads", NULL);
01304   stat_reg_counter(sdb, "sim_total_branches",
01305                    "total number of branches executed",
01306                    &sim_total_branches, /* initial value */0, /* format */NULL);
01307 
01308   /* register performance stats */
01309   stat_reg_counter(sdb, "sim_cycle",
01310                    "total simulation time in cycles",
01311                    &sim_cycle, /* initial value */0, /* format */NULL);
01312   stat_reg_formula(sdb, "sim_IPC",
01313                    "instructions per cycle",
01314                    "sim_num_insn / sim_cycle", /* format */NULL);
01315   stat_reg_formula(sdb, "sim_CPI",
01316                    "cycles per instruction",
01317                    "sim_cycle / sim_num_insn", /* format */NULL);
01318   stat_reg_formula(sdb, "sim_exec_BW",
01319                    "total instructions (mis-spec + committed) per cycle",
01320                    "sim_total_insn / sim_cycle", /* format */NULL);
01321   stat_reg_formula(sdb, "sim_IPB",
01322                    "instructions per branch",
01323                    "sim_num_insn / sim_num_branches", /* format */NULL);
01324 
01325   /* occupancy stats */
01326   stat_reg_counter(sdb, "IFQ_count", "cumulative IFQ occupancy",
01327                    &IFQ_count, /* initial value */0, /* format */NULL);
01328   stat_reg_counter(sdb, "IFQ_fcount", "cumulative IFQ full count",
01329                    &IFQ_fcount, /* initial value */0, /* format */NULL);
01330   stat_reg_formula(sdb, "ifq_occupancy", "avg IFQ occupancy (insn's)",
01331                    "IFQ_count / sim_cycle", /* format */NULL);
01332   stat_reg_formula(sdb, "ifq_rate", "avg IFQ dispatch rate (insn/cycle)",
01333                    "sim_total_insn / sim_cycle", /* format */NULL);
01334   stat_reg_formula(sdb, "ifq_latency", "avg IFQ occupant latency (cycle's)",
01335                    "ifq_occupancy / ifq_rate", /* format */NULL);
01336   stat_reg_formula(sdb, "ifq_full", "fraction of time (cycle's) IFQ was full",
01337                    "IFQ_fcount / sim_cycle", /* format */NULL);
01338 
01339   stat_reg_counter(sdb, "RUU_count", "cumulative RUU occupancy",
01340                    &RUU_count, /* initial value */0, /* format */NULL);
01341   stat_reg_counter(sdb, "RUU_fcount", "cumulative RUU full count",
01342                    &RUU_fcount, /* initial value */0, /* format */NULL);
01343   stat_reg_formula(sdb, "ruu_occupancy", "avg RUU occupancy (insn's)",
01344                    "RUU_count / sim_cycle", /* format */NULL);
01345   stat_reg_formula(sdb, "ruu_rate", "avg RUU dispatch rate (insn/cycle)",
01346                    "sim_total_insn / sim_cycle", /* format */NULL);
01347   stat_reg_formula(sdb, "ruu_latency", "avg RUU occupant latency (cycle's)",
01348                    "ruu_occupancy / ruu_rate", /* format */NULL);
01349   stat_reg_formula(sdb, "ruu_full", "fraction of time (cycle's) RUU was full",
01350                    "RUU_fcount / sim_cycle", /* format */NULL);
01351 
01352   stat_reg_counter(sdb, "LSQ_count", "cumulative LSQ occupancy",
01353                    &LSQ_count, /* initial value */0, /* format */NULL);
01354   stat_reg_counter(sdb, "LSQ_fcount", "cumulative LSQ full count",
01355                    &LSQ_fcount, /* initial value */0, /* format */NULL);
01356   stat_reg_formula(sdb, "lsq_occupancy", "avg LSQ occupancy (insn's)",
01357                    "LSQ_count / sim_cycle", /* format */NULL);
01358   stat_reg_formula(sdb, "lsq_rate", "avg LSQ dispatch rate (insn/cycle)",
01359                    "sim_total_insn / sim_cycle", /* format */NULL);
01360   stat_reg_formula(sdb, "lsq_latency", "avg LSQ occupant latency (cycle's)",
01361                    "lsq_occupancy / lsq_rate", /* format */NULL);
01362   stat_reg_formula(sdb, "lsq_full", "fraction of time (cycle's) LSQ was full",
01363                    "LSQ_fcount / sim_cycle", /* format */NULL);
01364 
01365   stat_reg_counter(sdb, "sim_slip",
01366                    "total number of slip cycles",
01367                    &sim_slip, 0, NULL);
01368   /* register baseline stats */
01369   stat_reg_formula(sdb, "avg_sim_slip",
01370                    "the average slip between issue and retirement",
01371                    "sim_slip / sim_num_insn", NULL);
01372 
01373   /* register predictor stats */
01374   if (pred)
01375     bpred_reg_stats(pred, sdb);
01376 
01377   /* register cache stats */
01378   if (cache_il1
01379       && (cache_il1 != cache_dl1 && cache_il1 != cache_dl2))
01380     cache_reg_stats(cache_il1, sdb);
01381   if (cache_il2
01382       && (cache_il2 != cache_dl1 && cache_il2 != cache_dl2))
01383     cache_reg_stats(cache_il2, sdb);
01384   if (cache_dl1)
01385     cache_reg_stats(cache_dl1, sdb);
01386   if (cache_dl2)
01387     cache_reg_stats(cache_dl2, sdb);
01388   if (itlb)
01389     cache_reg_stats(itlb, sdb);
01390   if (dtlb)
01391     cache_reg_stats(dtlb, sdb);
01392 
01393   /* debug variable(s) */
01394   stat_reg_counter(sdb, "sim_invalid_addrs",
01395                    "total non-speculative bogus addresses seen (debug var)",
01396                    &sim_invalid_addrs, /* initial value */0, /* format */NULL);
01397 
01398   for (i=0; i<pcstat_nelt; i++)
01399     {
01400       char buf[512], buf1[512];
01401       struct stat_stat_t *stat;
01402 
01403       /* track the named statistical variable by text address */
01404 
01405       /* find it... */
01406       stat = stat_find_stat(sdb, pcstat_vars[i]);
01407       if (!stat)
01408         fatal("cannot locate any statistic named `%s'", pcstat_vars[i]);
01409 
01410       /* stat must be an integral type */
01411       if (stat->sc != sc_int && stat->sc != sc_uint && stat->sc != sc_counter)
01412         fatal("`-pcstat' statistical variable `%s' is not an integral type",
01413               stat->name);
01414 
01415       /* register this stat */
01416       pcstat_stats[i] = stat;
01417       pcstat_lastvals[i] = STATVAL(stat);
01418 
01419       /* declare the sparse text distribution */
01420       sprintf(buf, "%s_by_pc", stat->name);
01421       sprintf(buf1, "%s (by text address)", stat->desc);
01422       pcstat_sdists[i] = stat_reg_sdist(sdb, buf, buf1,
01423                                         /* initial value */0,
01424                                         /* print format */(PF_COUNT|PF_PDF),
01425                                         /* format */"0x%lx %lu %.2f",
01426                                         /* print fn */NULL);
01427     }
01428   ld_reg_stats(sdb);
01429   mem_reg_stats(mem, sdb);
01430 }
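
The IFQ, RUU and LSQ statistics registered above each derive three formulas from two counters: average occupancy is the cumulative occupancy divided by cycles, the rate is instructions per cycle, and the latency is occupancy divided by rate. That last step has the shape of Little's law (average occupancy = arrival rate * average residence time). The sketch below restates the arithmetic in plain C so the relationship can be checked by hand; the function names are hypothetical and the stats package itself evaluates these as formula strings, not through functions like these.

```c
/* avg occupancy (insts) = cumulative per-cycle occupancy / cycles */
static double
avg_occupancy(double cum_count, double cycles)
{
  return cum_count / cycles;
}

/* avg dispatch rate (insts/cycle) = total instructions / cycles */
static double
avg_rate(double total_insn, double cycles)
{
  return total_insn / cycles;
}

/* avg occupant latency (cycles) = occupancy / rate (Little's law);
   algebraically this reduces to cum_count / total_insn */
static double
avg_latency(double occupancy, double rate)
{
  return occupancy / rate;
}
```

For instance, a queue whose cumulative occupancy is 400 over 100 cycles, with 200 instructions passing through, has an average occupancy of 4 entries, a rate of 2 instructions per cycle, and therefore an average latency of 2 cycles per occupant.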
01431 
01432 /* forward declarations */
01433 static void ruu_init(void);
01434 static void lsq_init(void);
01435 static void rslink_init(int nlinks);
01436 static void eventq_init(void);
01437 static void readyq_init(void);
01438 static void cv_init(void);
01439 static void tracer_init(void);
01440 static void fetch_init(void);
01441 
01442 /* initialize the simulator */
01443 void
01444 sim_init(void)
01445 {
01446   sim_num_refs = 0;
01447 
01448   /* allocate and initialize register file */
01449   regs_init(&regs);
01450 
01451   /* allocate and initialize memory space */
01452   mem = mem_create("mem");
01453   mem_init(mem);
01454 }
01455 
01456 /* default register state accessor, used by DLite */
01457 static char *                                   /* err str, NULL for no err */
01458 simoo_reg_obj(struct regs_t *regs,              /* registers to access */
01459               int is_write,                     /* access type */
01460               enum md_reg_type rt,              /* reg bank to probe */
01461               int reg,                          /* register number */
01462               struct eval_value_t *val);        /* input, output */
01463 
01464 /* default memory state accessor, used by DLite */
01465 static char *                                   /* err str, NULL for no err */
01466 simoo_mem_obj(struct mem_t *mem,                /* memory space to access */
01467               int is_write,                     /* access type */
01468               md_addr_t addr,                   /* address to access */
01469               char *p,                          /* input/output buffer */
01470               int nbytes);                      /* size of access */
01471 
01472 /* default machine state accessor, used by DLite */
01473 static char *                                   /* err str, NULL for no err */
01474 simoo_mstate_obj(FILE *stream,                  /* output stream */
01475                  char *cmd,                     /* optional command string */
01476                  struct regs_t *regs,           /* registers to access */
01477                  struct mem_t *mem);            /* memory space to access */
01478 
01479 /* total RS links allocated at program start */
01480 #define MAX_RS_LINKS                    4096
01481 
01482 /* load program into simulated state */
01483 void
01484 sim_load_prog(char *fname,              /* program to load */
01485               int argc, char **argv,    /* program arguments */
01486               char **envp)              /* program environment */
01487 {
01488   /* load program text and data, set up environment, memory, and regs */
01489   ld_load_prog(fname, argc, argv, envp, &regs, mem, TRUE);
01490 
01491   /* initialize here, so symbols can be loaded */
01492   if (ptrace_nelt == 2)
01493     {
01494       /* generate a pipeline trace */
01495       ptrace_open(/* fname */ptrace_opts[0], /* range */ptrace_opts[1]);
01496     }
01497   else if (ptrace_nelt == 0)
01498     {
01499       /* no pipetracing */;
01500     }
01501   else
01502     fatal("bad pipetrace args, use: <fname|stdout|stderr> <range>");
01503 
01504   /* finish initialization of the simulation engine */
01505   fu_pool = res_create_pool("fu-pool", fu_config, N_ELT(fu_config));
01506   rslink_init(MAX_RS_LINKS);
01507   tracer_init();
01508   fetch_init();
01509   cv_init();
01510   eventq_init();
01511   readyq_init();
01512   ruu_init();
01513   lsq_init();
01514 
01515   /* initialize the DLite debugger */
01516   dlite_init(simoo_reg_obj, simoo_mem_obj, simoo_mstate_obj);
01517 }
01518 
01519 /* dump simulator-specific auxiliary simulator statistics */
01520 void
01521 sim_aux_stats(FILE *stream)             /* output stream */
01522 {
01523   /* nada */
01524 }
01525 
01526 /* un-initialize the simulator */
01527 void
01528 sim_uninit(void)
01529 {
01530   if (ptrace_nelt > 0)
01531     ptrace_close();
01532 }
01533 
01534 
01535 /*
01536  * processor core definitions and declarations
01537  */
01538 
01539 /* inst tag type, used to tag an operation instance in the RUU */
01540 typedef unsigned int INST_TAG_TYPE;
01541 
01542 /* inst sequence type, used to order instructions in the ready list; if
01543    this rolls over, the ready list order will temporarily get messed up,
01544    but execution will continue and complete correctly */
01545 typedef unsigned int INST_SEQ_TYPE;
01546 
01547 
01548 /* total input dependencies possible */
01549 #define MAX_IDEPS               3
01550 
01551 /* total output dependencies possible */
01552 #define MAX_ODEPS               2
01553 
01554 /* a register update unit (RUU) station, this record is contained in the
01555    processor's RUU, which serves as a collection of ordered reservation
01556    stations.  The reservation stations capture register results and await
01557    the time when all operands are ready, at which time the instruction is
01558    issued to the functional units; the RUU is an ordered circular queue, in which
01559    instructions are inserted in fetch (program) order, results are stored in
01560    the RUU buffers, and later when an RUU entry is the oldest entry in the
01561    machine, it and its instruction's value are retired to the architectural
01562    register file in program order, NOTE: the RUU and LSQ share the same
01563    structure, this is useful because loads and stores are split into two
01564    operations: an effective address add and a load/store, the add is inserted
01565    into the RUU and the load/store inserted into the LSQ, allowing the add
01566    to wake up the load/store when effective address computation has finished */
01567 struct RUU_station {
01568   /* inst info */
01569   md_inst_t IR;                 /* instruction bits */
01570   enum md_opcode op;                    /* decoded instruction opcode */
01571   md_addr_t PC, next_PC, pred_PC;       /* inst PC, next PC, predicted PC */
01572   int in_LSQ;                           /* non-zero if op is in LSQ */
01573   int ea_comp;                          /* non-zero if op is an addr comp */
01574   int recover_inst;                     /* start of mis-speculation? */
01575   int stack_recover_idx;                /* non-speculative TOS for RSB pred */
01576   struct bpred_update_t dir_update;     /* bpred direction update info */
01577   int spec_mode;                        /* non-zero if issued in spec_mode */
01578   md_addr_t addr;                       /* effective address for ld/st's */
01579   INST_TAG_TYPE tag;                    /* RUU slot tag, increment to
01580                                            squash operation */
01581   INST_SEQ_TYPE seq;                    /* instruction sequence, used to
01582                                            sort the ready list and tag inst */
01583   unsigned int ptrace_seq;              /* pipetrace sequence number */
01584   int slip;                             /* cycle stamp; commit adds
01584                                            (sim_cycle - slip) to sim_slip */
01585   /* instruction status */
01586   int queued;                           /* operands ready and queued */
01587   int issued;                           /* operation is/was executing */
01588   int completed;                        /* operation has completed execution */
01589   /* output operand dependency list, these lists are used to
01590      limit the number of associative searches into the RUU when
01591      instructions complete and need to wake up dependent insts */
01592   int onames[MAX_ODEPS];                /* output logical names (NA=unused) */
01593   struct RS_link *odep_list[MAX_ODEPS]; /* chains to consuming operations */
01594 
01595   /* input dependent links, the output chains rooted above use these
01596      fields to mark input operands as ready, when all these fields have
01597      been set non-zero, the RUU operation has all of its register
01598      operands, it may commence execution as soon as all of its memory
01599      operands are known to be ready (see lsq_refresh() for details on
01600      enforcing memory dependencies) */
01601   int idep_ready[MAX_IDEPS];            /* input operand ready? */
01602 };
01603 
01604 /* non-zero if all register operands are ready, update with MAX_IDEPS */
01605 #define OPERANDS_READY(RS)                                              \
01606   ((RS)->idep_ready[0] && (RS)->idep_ready[1] && (RS)->idep_ready[2])
01607 
01608 /* register update unit, combination of reservation stations and reorder
01609    buffer device, organized as a circular queue */
01610 static struct RUU_station *RUU;         /* register update unit */
01611 static int RUU_head, RUU_tail;          /* RUU head and tail pointers */
01612 static int RUU_num;                     /* num entries currently in RUU */
01613 
01614 /* allocate and initialize register update unit (RUU) */
01615 static void
01616 ruu_init(void)
01617 {
01618   RUU = calloc(RUU_size, sizeof(struct RUU_station));
01619   if (!RUU)
01620     fatal("out of virtual memory");
01621 
01622   RUU_num = 0;
01623   RUU_head = RUU_tail = 0;
01624   RUU_count = 0;
01625   RUU_fcount = 0;
01626 }
01627 
01628 /* dump the contents of a single RUU (or LSQ) entry */
01629 static void
01630 ruu_dumpent(struct RUU_station *rs,             /* ptr to RUU station */
01631             int index,                          /* entry index */
01632             FILE *stream,                       /* output stream */
01633             int header)                         /* print header? */
01634 {
01635   if (!stream)
01636     stream = stderr;
01637 
01638   if (header)
01639     fprintf(stream, "idx: %2d: opcode: %s, inst: `",
01640             index, MD_OP_NAME(rs->op));
01641   else
01642     fprintf(stream, "       opcode: %s, inst: `",
01643             MD_OP_NAME(rs->op));
01644   md_print_insn(rs->IR, rs->PC, stream);
01645   fprintf(stream, "'\n");
01646   myfprintf(stream, "         PC: 0x%08p, NPC: 0x%08p (pred_PC: 0x%08p)\n",
01647             rs->PC, rs->next_PC, rs->pred_PC);
01648   fprintf(stream, "         in_LSQ: %s, ea_comp: %s, recover_inst: %s\n",
01649           rs->in_LSQ ? "t" : "f",
01650           rs->ea_comp ? "t" : "f",
01651           rs->recover_inst ? "t" : "f");
01652   myfprintf(stream, "         spec_mode: %s, addr: 0x%08p, tag: 0x%08x\n",
01653             rs->spec_mode ? "t" : "f", rs->addr, rs->tag);
01654   fprintf(stream, "         seq: 0x%08x, ptrace_seq: 0x%08x\n",
01655           rs->seq, rs->ptrace_seq);
01656   fprintf(stream, "         queued: %s, issued: %s, completed: %s\n",
01657           rs->queued ? "t" : "f",
01658           rs->issued ? "t" : "f",
01659           rs->completed ? "t" : "f");
01660   fprintf(stream, "         operands ready: %s\n",
01661           OPERANDS_READY(rs) ? "t" : "f");
01662 }
01663 
01664 /* dump the contents of the RUU */
01665 static void
01666 ruu_dump(FILE *stream)                          /* output stream */
01667 {
01668   int num, head;
01669   struct RUU_station *rs;
01670 
01671   if (!stream)
01672     stream = stderr;
01673 
01674   fprintf(stream, "** RUU state **\n");
01675   fprintf(stream, "RUU_head: %d, RUU_tail: %d\n", RUU_head, RUU_tail);
01676   fprintf(stream, "RUU_num: %d\n", RUU_num);
01677 
01678   num = RUU_num;
01679   head = RUU_head;
01680   while (num)
01681     {
01682       rs = &RUU[head];
01683       ruu_dumpent(rs, rs - RUU, stream, /* header */TRUE);
01684       head = (head + 1) % RUU_size;
01685       num--;
01686     }
01687 }
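
/* Illustrative sketch, not part of sim-outorder.c: the RUU and LSQ are both
   circular queues tracked by a head index, a tail index, and an entry count.
   The names below (demo_queue, demo_push, demo_pop, demo_last_index,
   DEMO_QSIZE) are hypothetical and only mirror the index arithmetic used by
   ruu_dump(), ruu_commit(), and ruu_recover(). */

```c
#include <assert.h>

#define DEMO_QSIZE 4            /* hypothetical size, stands in for RUU_size */

struct demo_queue {
  int slots[DEMO_QSIZE];        /* payload, stands in for RUU stations */
  int head, tail, num;          /* same roles as RUU_head/RUU_tail/RUU_num */
};

/* append at the tail in program order; returns 0 when the queue is full */
static int
demo_push(struct demo_queue *q, int v)
{
  if (q->num == DEMO_QSIZE)
    return 0;
  q->slots[q->tail] = v;
  q->tail = (q->tail + 1) % DEMO_QSIZE;
  q->num++;
  return 1;
}

/* retire the oldest entry from the head; returns 0 when the queue is empty */
static int
demo_pop(struct demo_queue *q, int *v)
{
  if (q->num == 0)
    return 0;
  *v = q->slots[q->head];
  q->head = (q->head + 1) % DEMO_QSIZE;
  q->num--;
  return 1;
}

/* index of the youngest entry: step back one slot from the tail, adding
   (size - 1) instead of subtracting 1 so the value never goes negative */
static int
demo_last_index(const struct demo_queue *q)
{
  assert(q->num > 0);
  return (q->tail + (DEMO_QSIZE - 1)) % DEMO_QSIZE;
}
```

/* Keeping an explicit count alongside head and tail lets a full queue
   (head == tail, num == size) be told apart from an empty one
   (head == tail, num == 0). */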
01688 
01689 /*
01690  * load/store queue (LSQ): holds loads and stores in program order, indicating
01691  * status of load/store access:
01692  *
01693  *   - issued: address computation complete, memory access in progress
01694  *   - completed: memory access has completed, stored value available
01695  *   - squashed: memory access was squashed, ignore this entry
01696  *
01697  * loads may execute when:
01698  *   1) register operands are ready, and
01699  *   2) memory operands are ready (no earlier unresolved store)
01700  *
01701  * loads are serviced by:
01702  *   1) previous store at same address in LSQ (hit latency), or
01703  *   2) data cache (hit latency + miss latency)
01704  *
01705  * stores may execute when:
01706  *   1) register operands are ready
01707  *
01708  * stores are serviced by:
01709  *   1) depositing store value into the load/store queue
01710  *   2) writing store value to the store buffer (plus tag check) at commit
01711  *   3) writing store buffer entry to data cache when cache is free
01712  *
01713  * NOTE: the load/store queue can bypass a store value to a load in the same
01714  *   cycle the store executes (using a bypass network), thus stores complete
01715  *   in effective zero time after their effective address is known
01716  */
01717 static struct RUU_station *LSQ;         /* load/store queue */
01718 static int LSQ_head, LSQ_tail;          /* LSQ head and tail pointers */
01719 static int LSQ_num;                     /* num entries currently in LSQ */
01720 
01721 /*
01722  * input dependencies for stores in the LSQ:
01723  *   idep #0 - operand input (value that is store'd)
01724  *   idep #1 - effective address input (address of store operation)
01725  */
01726 #define STORE_OP_INDEX                  0
01727 #define STORE_ADDR_INDEX                1
01728 
01729 #define STORE_OP_READY(RS)              ((RS)->idep_ready[STORE_OP_INDEX])
01730 #define STORE_ADDR_READY(RS)            ((RS)->idep_ready[STORE_ADDR_INDEX])
01731 
01732 /* allocate and initialize the load/store queue (LSQ) */
01733 static void
01734 lsq_init(void)
01735 {
01736   LSQ = calloc(LSQ_size, sizeof(struct RUU_station));
01737   if (!LSQ)
01738     fatal("out of virtual memory");
01739 
01740   LSQ_num = 0;
01741   LSQ_head = LSQ_tail = 0;
01742   LSQ_count = 0;
01743   LSQ_fcount = 0;
01744 }
01745 
01746 /* dump the contents of the LSQ */
01747 static void
01748 lsq_dump(FILE *stream)                          /* output stream */
01749 {
01750   int num, head;
01751   struct RUU_station *rs;
01752 
01753   if (!stream)
01754     stream = stderr;
01755 
01756   fprintf(stream, "** LSQ state **\n");
01757   fprintf(stream, "LSQ_head: %d, LSQ_tail: %d\n", LSQ_head, LSQ_tail);
01758   fprintf(stream, "LSQ_num: %d\n", LSQ_num);
01759 
01760   num = LSQ_num;
01761   head = LSQ_head;
01762   while (num)
01763     {
01764       rs = &LSQ[head];
01765       ruu_dumpent(rs, rs - LSQ, stream, /* header */TRUE);
01766       head = (head + 1) % LSQ_size;
01767       num--;
01768     }
01769 }
01770 
01771 
01772 /*
01773  * RS_LINK defs and decls
01774  */
01775 
01776 /* a reservation station link: this structure links elements of a RUU
01777    reservation station list; used for ready instruction queue, event queue, and
01778    output dependency lists; each RS_LINK node contains a pointer to the RUU
01779    entry it references along with an instance tag, the RS_LINK is only valid if
01780    the instruction instance tag matches the instruction RUU entry instance tag;
01781    this strategy allows entries in the RUU to be squashed and reused without
01782    updating the lists that point to them, which significantly improves the
01783    performance of (all too frequent) squash events */
01784 struct RS_link {
01785   struct RS_link *next;                 /* next entry in list */
01786   struct RUU_station *rs;               /* referenced RUU resv station */
01787   INST_TAG_TYPE tag;                    /* inst instance sequence number */
01788   union {
01789     tick_t when;                        /* time stamp of entry (for eventq) */
01790     INST_SEQ_TYPE seq;                  /* inst sequence */
01791     int opnum;                          /* input/output operand number */
01792   } x;
01793 };
01794 
01795 /* RS link free list, grab RS_LINKs from here, when needed */
01796 static struct RS_link *rslink_free_list;
01797 
01798 /* NULL value for an RS link */
01799 #define RSLINK_NULL_DATA                { NULL, NULL, 0 }
01800 static struct RS_link RSLINK_NULL = RSLINK_NULL_DATA;
01801 
01802 /* create and initialize an RS link */
01803 #define RSLINK_INIT(RSL, RS)                                            \
01804   ((RSL).next = NULL, (RSL).rs = (RS), (RSL).tag = (RS)->tag)
01805 
01806 /* non-zero if RS link is NULL */
01807 #define RSLINK_IS_NULL(LINK)            ((LINK)->rs == NULL)
01808 
01809 /* non-zero if RS link is to a valid (non-squashed) entry */
01810 #define RSLINK_VALID(LINK)              ((LINK)->tag == (LINK)->rs->tag)
01811 
01812 /* extract RUU reservation station pointer */
01813 #define RSLINK_RS(LINK)                 ((LINK)->rs)
01814 
01815 /* get a new RS link record */
01816 #define RSLINK_NEW(DST, RS)                                             \
01817   { struct RS_link *n_link;                                             \
01818     if (!rslink_free_list)                                              \
01819       panic("out of rs links");                                         \
01820     n_link = rslink_free_list;                                          \
01821     rslink_free_list = rslink_free_list->next;                          \
01822     n_link->next = NULL;                                                \
01823     n_link->rs = (RS); n_link->tag = n_link->rs->tag;                   \
01824     (DST) = n_link;                                                     \
01825   }
01826 
01827 /* free an RS link record */
01828 #define RSLINK_FREE(LINK)                                               \
01829   {  struct RS_link *f_link = (LINK);                                   \
01830      f_link->rs = NULL; f_link->tag = 0;                                \
01831      f_link->next = rslink_free_list;                                   \
01832      rslink_free_list = f_link;                                         \
01833   }
01834 
01835 /* FIXME: could this be faster!!! */
01836 /* free an RS link list */
01837 #define RSLINK_FREE_LIST(LINK)                                          \
01838   {  struct RS_link *fl_link, *fl_link_next;                            \
01839      for (fl_link=(LINK); fl_link; fl_link=fl_link_next)                \
01840        {                                                                \
01841          fl_link_next = fl_link->next;                                  \
01842          RSLINK_FREE(fl_link);                                          \
01843        }                                                                \
01844   }
01845 
01846 /* initialize the free RS_LINK pool */
01847 static void
01848 rslink_init(int nlinks)                 /* total number of RS_LINK available */
01849 {
01850   int i;
01851   struct RS_link *link;
01852 
01853   rslink_free_list = NULL;
01854   for (i=0; i<nlinks; i++)
01855     {
01856       link = calloc(1, sizeof(struct RS_link));
01857       if (!link)
01858         fatal("out of virtual memory");
01859       link->next = rslink_free_list;
01860       rslink_free_list = link;
01861     }
01862 }
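
/* Illustrative sketch, not part of sim-outorder.c: how an instance tag lets
   a link go stale without anyone touching the link itself.  All demo_* names
   are hypothetical.  One deliberate difference from rslink_init(): the pool
   here is a static array rather than individually calloc'ed records, purely
   to keep the sketch short. */

```c
#include <assert.h>
#include <stddef.h>

struct demo_station {
  unsigned int tag;             /* bumped whenever the slot is squashed/reused */
};

struct demo_link {
  struct demo_link *next;       /* next entry in whichever list holds us */
  struct demo_station *rs;      /* referenced station */
  unsigned int tag;             /* station tag captured at link creation */
};

#define DEMO_NLINKS 8           /* hypothetical pool size */
static struct demo_link demo_pool[DEMO_NLINKS];
static struct demo_link *demo_free_list;

/* thread every pool record onto the free list */
static void
demo_pool_init(void)
{
  int i;

  demo_free_list = NULL;
  for (i = 0; i < DEMO_NLINKS; i++)
    {
      demo_pool[i].next = demo_free_list;
      demo_free_list = &demo_pool[i];
    }
}

/* take a record off the free list and bind it to RS; NULL if exhausted */
static struct demo_link *
demo_link_new(struct demo_station *rs)
{
  struct demo_link *l = demo_free_list;

  if (!l)
    return NULL;
  demo_free_list = l->next;
  l->next = NULL;
  l->rs = rs;
  l->tag = rs->tag;
  return l;
}

/* return a record to the free list */
static void
demo_link_free(struct demo_link *l)
{
  l->rs = NULL;
  l->tag = 0;
  l->next = demo_free_list;
  demo_free_list = l;
}

/* non-zero while the station still holds the instance the link was made for */
static int
demo_link_valid(const struct demo_link *l)
{
  return l->tag == l->rs->tag;
}
```

/* Squashing a station is then a single increment of its tag; every link
   that still points at it fails the validity check when it is next
   examined and can be reclaimed lazily. */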
01863 
01864 /* service all functional unit release events, this function is called
01865    once per cycle, and is used to step the BUSY timers attached to each
01866    functional unit in the functional unit resource pool, as long as a functional
01867    unit's BUSY count is > 0, it cannot be issued an operation */
01868 static void
01869 ruu_release_fu(void)
01870 {
01871   int i;
01872 
01873   /* walk all resource units, decrement busy counts by one */
01874   for (i=0; i<fu_pool->num_resources; i++)
01875     {
01876       /* resource is released when BUSY hits zero */
01877       if (fu_pool->resources[i].busy > 0)
01878         fu_pool->resources[i].busy--;
01879     }
01880 }
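
/* Illustrative sketch, not part of sim-outorder.c: per-unit BUSY counters
   that are loaded with an issue latency when a unit is claimed and stepped
   down once per cycle.  demo_busy, demo_try_issue, demo_tick and
   DEMO_NUNITS are hypothetical names; the real pool is managed through
   res_create_pool() and res_get(). */

```c
#include <assert.h>

#define DEMO_NUNITS 2           /* hypothetical number of functional units */

static int demo_busy[DEMO_NUNITS];      /* cycles until each unit is free */

/* claim UNIT for ISSUELAT cycles; returns 0 if it is still busy */
static int
demo_try_issue(int unit, int issuelat)
{
  assert(unit >= 0 && unit < DEMO_NUNITS);
  if (demo_busy[unit] > 0)
    return 0;
  demo_busy[unit] = issuelat;
  return 1;
}

/* once per cycle: decrement every non-zero BUSY counter by one */
static void
demo_tick(void)
{
  int i;

  for (i = 0; i < DEMO_NUNITS; i++)
    {
      if (demo_busy[i] > 0)
        demo_busy[i]--;
    }
}
```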
01881 
01882 
01883 /*
01884  * the execution unit event queue implementation follows, the event queue
01885  * indicates which instruction will complete next, the writeback handler
01886  * drains this queue
01887  */
01888 
01889 /* pending event queue, sorted from soonest to latest event (in time), NOTE:
01890    RS_LINK nodes are used for the event queue list so that it need not be
01891    updated during squash events */
01892 static struct RS_link *event_queue;
01893 
01894 /* initialize the event queue structures */
01895 static void
01896 eventq_init(void)
01897 {
01898   event_queue = NULL;
01899 }
01900 
01901 /* dump the contents of the event queue */
01902 static void
01903 eventq_dump(FILE *stream)                       /* output stream */
01904 {
01905   struct RS_link *ev;
01906 
01907   if (!stream)
01908     stream = stderr;
01909 
01910   fprintf(stream, "** event queue state **\n");
01911 
01912   for (ev = event_queue; ev != NULL; ev = ev->next)
01913     {
01914       /* is event still valid? */
01915       if (RSLINK_VALID(ev))
01916         {
01917           struct RUU_station *rs = RSLINK_RS(ev);
01918 
01919           fprintf(stream, "idx: %2d: @ %.0f\n",
01920                   (int)(rs - (rs->in_LSQ ? LSQ : RUU)), (double)ev->x.when);
01921           ruu_dumpent(rs, rs - (rs->in_LSQ ? LSQ : RUU),
01922                       stream, /* !header */FALSE);
01923         }
01924     }
01925 }
01926 
01927 /* insert an event for RS into the event queue, event queue is sorted from
01928    earliest to latest event, event and associated side-effects will be
01929    apparent at the start of cycle WHEN */
01930 static void
01931 eventq_queue_event(struct RUU_station *rs, tick_t when)
01932 {
01933   struct RS_link *prev, *ev, *new_ev;
01934 
01935   if (rs->completed)
01936     panic("event completed");
01937 
01938   if (when <= sim_cycle)
01939     panic("event occurred in the past");
01940 
01941   /* get a free event record */
01942   RSLINK_NEW(new_ev, rs);
01943   new_ev->x.when = when;
01944 
01945   /* locate insertion point */
01946   for (prev=NULL, ev=event_queue;
01947        ev && ev->x.when < when;
01948        prev=ev, ev=ev->next);
01949 
01950   if (prev)
01951     {
01952       /* insert middle or end */
01953       new_ev->next = prev->next;
01954       prev->next = new_ev;
01955     }
01956   else
01957     {
01958       /* insert at beginning */
01959       new_ev->next = event_queue;
01960       event_queue = new_ev;
01961     }
01962 }
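
/* Illustrative sketch, not part of sim-outorder.c: a singly linked list kept
   sorted by timestamp, using the same "walk while strictly earlier"
   insertion loop as eventq_queue_event().  The demo_* names are
   hypothetical, and the NOW argument stands in for sim_cycle. */

```c
#include <assert.h>
#include <stddef.h>

struct demo_event {
  struct demo_event *next;      /* next (later or equal) event */
  long when;                    /* cycle at which the event fires */
  int id;                       /* payload, stands in for the RUU station */
};

/* insert EV into the list at HEAD, soonest first; returns the new head */
static struct demo_event *
demo_insert(struct demo_event *head, struct demo_event *ev)
{
  struct demo_event *prev = NULL, *cur = head;

  /* stop at the first event that is not strictly earlier than EV */
  while (cur && cur->when < ev->when)
    {
      prev = cur;
      cur = cur->next;
    }

  ev->next = cur;
  if (prev)
    {
      prev->next = ev;          /* insert middle or end */
      return head;
    }
  return ev;                    /* insert at beginning */
}

/* unlink and return the first event if it is due at NOW, else NULL */
static struct demo_event *
demo_pop_due(struct demo_event **head, long now)
{
  struct demo_event *ev = *head;

  if (!ev || ev->when > now)
    return NULL;
  *head = ev->next;
  ev->next = NULL;
  return ev;
}
```

/* Because the walk uses a strict comparison, an event queued with the same
   timestamp as an existing one lands in front of it, so ties come back out
   most-recently-queued first. */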
01963 
01964 /* return the next event that has already occurred, returns NULL when no
01965    remaining events or all remaining events are in the future */
01966 static struct RUU_station *
01967 eventq_next_event(void)
01968 {
01969   struct RS_link *ev;
01970 
01971   if (event_queue && event_queue->x.when <= sim_cycle)
01972     {
01973       /* unlink and return first event on priority list */
01974       ev = event_queue;
01975       event_queue = event_queue->next;
01976 
01977       /* event still valid? */
01978       if (RSLINK_VALID(ev))
01979         {
01980           struct RUU_station *rs = RSLINK_RS(ev);
01981 
01982           /* reclaim event record */
01983           RSLINK_FREE(ev);
01984 
01985           /* event is valid, return resv station */
01986           return rs;
01987         }
01988       else
01989         {
01990           /* reclaim event record */
01991           RSLINK_FREE(ev);
01992 
01993           /* receiving inst was squashed, return next event */
01994           return eventq_next_event();
01995         }
01996     }
01997   else
01998     {
01999       /* no event or no event is ready */
02000       return NULL;
02001     }
02002 }
02003 
02004 
02005 /*
02006  * the ready instruction queue implementation follows, the ready instruction
02007  * queue indicates which instructions have all of their *register* dependencies
02008  * satisfied, instruction will issue when 1) all memory dependencies for
02009  * the instruction have been satisfied (see lsq_refresh() for details on how
02010  * this is accomplished) and 2) resources are available; ready queue is fully
02011  * constructed each cycle before any operation is issued from it -- this
02012  * ensures that instruction issue priorities are properly observed; NOTE:
02013  * RS_LINK nodes are used for the ready queue list so that it need not be
02014  * updated during squash events
02015  */
02016 
02017 /* the ready instruction queue */
02018 static struct RS_link *ready_queue;
02019 
02020 /* initialize the ready queue structures */
02021 static void
02022 readyq_init(void)
02023 {
02024   ready_queue = NULL;
02025 }
02026 
02027 /* dump the contents of the ready queue */
02028 static void
02029 readyq_dump(FILE *stream)                       /* output stream */
02030 {
02031   struct RS_link *link;
02032 
02033   if (!stream)
02034     stream = stderr;
02035 
02036   fprintf(stream, "** ready queue state **\n");
02037 
02038   for (link = ready_queue; link != NULL; link = link->next)
02039     {
02040       /* is entry still valid? */
02041       if (RSLINK_VALID(link))
02042         {
02043           struct RUU_station *rs = RSLINK_RS(link);
02044 
02045           ruu_dumpent(rs, rs - (rs->in_LSQ ? LSQ : RUU),
02046                       stream, /* header */TRUE);
02047         }
02048     }
02049 }
02050 
02051 /* insert ready node into the ready list using ready instruction scheduling
02052    policy; currently the following scheduling policy is enforced:
02053 
02054      memory and long latency operations, and branch instructions first
02055 
02056    then
02057 
02058      all other instructions, oldest instructions first
02059 
02060   this policy works well because branches pass through the machine quicker
02061   which works to reduce branch misprediction latencies, and very long latency
02062   instructions (such as loads and multiplies) get priority since they are very
02063   likely on the program's critical path */
02064 static void
02065 readyq_enqueue(struct RUU_station *rs)          /* RS to enqueue */
02066 {
02067   struct RS_link *prev, *node, *new_node;
02068 
02069   /* node is now queued */
02070   if (rs->queued)
02071     panic("node is already queued");
02072   rs->queued = TRUE;
02073 
02074   /* get a free ready list node */
02075   RSLINK_NEW(new_node, rs);
02076   new_node->x.seq = rs->seq;
02077 
02078   /* locate insertion point */
02079   if (rs->in_LSQ || MD_OP_FLAGS(rs->op) & (F_LONGLAT|F_CTRL))
02080     {
02081       /* insert loads/stores, long latency ops, and branches at queue head */
02082       prev = NULL;
02083       node = ready_queue;
02084     }
02085   else
02086     {
02087       /* otherwise insert in program order (earliest seq first) */
02088       for (prev=NULL, node=ready_queue;
02089            node && node->x.seq < rs->seq;
02090            prev=node, node=node->next);
02091     }
02092 
02093   if (prev)
02094     {
02095       /* insert middle or end */
02096       new_node->next = prev->next;
02097       prev->next = new_node;
02098     }
02099   else
02100     {
02101       /* insert at beginning */
02102       new_node->next = ready_queue;
02103       ready_queue = new_node;
02104     }
02105 }
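
/* Illustrative sketch, not part of sim-outorder.c: the two-tier insertion
   used by readyq_enqueue().  The demo_* names are hypothetical, and the
   `priority' field is an assumption standing in for the test
   rs->in_LSQ || MD_OP_FLAGS(rs->op) & (F_LONGLAT|F_CTRL). */

```c
#include <assert.h>
#include <stddef.h>

struct demo_node {
  struct demo_node *next;       /* next entry in the ready list */
  unsigned int seq;             /* program-order sequence number */
  int priority;                 /* non-zero: memory, long latency, or branch */
};

/* insert N into the ready list at HEAD; returns the new head */
static struct demo_node *
demo_enqueue(struct demo_node *head, struct demo_node *n)
{
  struct demo_node *prev = NULL, *cur = head;

  /* priority nodes skip the walk and go straight to the front; all others
     walk past every node with a smaller sequence number */
  if (!n->priority)
    {
      while (cur && cur->seq < n->seq)
        {
          prev = cur;
          cur = cur->next;
        }
    }

  n->next = cur;
  if (prev)
    {
      prev->next = n;           /* insert middle or end */
      return head;
    }
  return n;                     /* insert at beginning */
}
```

/* Note that the ordinary walk compares sequence numbers against whatever
   nodes are already queued, priority ones included, so the list is only
   sorted by age among the nodes that were inserted by walking. */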
02106 
02107 
02108 /*
02109  * the create vector maps a logical register to a creator in the RUU (and
02110  * specific output operand) or the architected register file (if RS_link
02111  * is NULL)
02112  */
02113 
02114 /* an entry in the create vector */
02115 struct CV_link {
02116   struct RUU_station *rs;               /* creator's reservation station */
02117   int odep_num;                         /* specific output operand */
02118 };
02119 
02120 /* a NULL create vector entry */
02121 static struct CV_link CVLINK_NULL = { NULL, 0 };
02122 
02123 /* get a new create vector link */
02124 #define CVLINK_INIT(CV, RS,ONUM)        ((CV).rs = (RS), (CV).odep_num = (ONUM))
02125 
02126 /* size of the create vector (one entry per architected register) */
02127 #define CV_BMAP_SZ              (BITMAP_SIZE(MD_TOTAL_REGS))
02128 
02129 /* the create vector, NOTE: speculative copy on write storage provided
02130    for fast recovery during wrong path execute (see tracer_recover() for
02131    details on this process) */
02132 static BITMAP_TYPE(MD_TOTAL_REGS, use_spec_cv);
02133 static struct CV_link create_vector[MD_TOTAL_REGS];
02134 static struct CV_link spec_create_vector[MD_TOTAL_REGS];
02135 
02136 /* these arrays shadow the create vector and indicate when a register was
02137    last created */
02138 static tick_t create_vector_rt[MD_TOTAL_REGS];
02139 static tick_t spec_create_vector_rt[MD_TOTAL_REGS];
02140 
02141 /* read a create vector entry */
02142 #define CREATE_VECTOR(N)        (BITMAP_SET_P(use_spec_cv, CV_BMAP_SZ, (N))\
02143                                  ? spec_create_vector[N]                \
02144                                  : create_vector[N])
02145 
02146 /* read a create vector timestamp entry */
02147 #define CREATE_VECTOR_RT(N)     (BITMAP_SET_P(use_spec_cv, CV_BMAP_SZ, (N))\
02148                                  ? spec_create_vector_rt[N]             \
02149                                  : create_vector_rt[N])
02150 
02151 /* set a create vector entry */
02152 #define SET_CREATE_VECTOR(N, L) (spec_mode                              \
02153                                  ? (BITMAP_SET(use_spec_cv, CV_BMAP_SZ, (N)),\
02154                                     spec_create_vector[N] = (L))        \
02155                                  : (create_vector[N] = (L)))
02156 
02157 /* initialize the create vector */
02158 static void
02159 cv_init(void)
02160 {
02161   int i;
02162 
02163   /* initially all registers are valid in the architected register file,
02164      i.e., the create vector entry is CVLINK_NULL */
02165   for (i=0; i < MD_TOTAL_REGS; i++)
02166     {
02167       create_vector[i] = CVLINK_NULL;
02168       create_vector_rt[i] = 0;
02169       spec_create_vector[i] = CVLINK_NULL;
02170       spec_create_vector_rt[i] = 0;
02171     }
02172 
02173   /* all create vector entries are non-speculative */
02174   BITMAP_CLEAR_MAP(use_spec_cv, CV_BMAP_SZ);
02175 }
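
/* Illustrative sketch, not part of sim-outorder.c: the copy-on-write idea
   behind CREATE_VECTOR() and SET_CREATE_VECTOR().  Everything named demo_*
   is hypothetical.  Two simplifications are assumptions of this sketch: a
   byte per register replaces the use_spec_cv bitmap, and demo_cv_recover()
   is a guess at what recovery needs to do (tracer_recover() is not shown in
   this part of the listing). */

```c
#include <assert.h>
#include <string.h>

#define DEMO_NREGS 8            /* hypothetical, stands in for MD_TOTAL_REGS */

static int demo_cv[DEMO_NREGS];                 /* non-speculative entries */
static int demo_spec_cv[DEMO_NREGS];            /* speculative overlay */
static unsigned char demo_use_spec[DEMO_NREGS]; /* 1: read the overlay */
static int demo_spec_mode;                      /* non-zero on a wrong path */

/* read entry N, preferring the speculative copy if one was written */
static int
demo_cv_read(int n)
{
  assert(n >= 0 && n < DEMO_NREGS);
  return demo_use_spec[n] ? demo_spec_cv[n] : demo_cv[n];
}

/* write entry N; speculative writes never touch the committed copy */
static void
demo_cv_write(int n, int v)
{
  assert(n >= 0 && n < DEMO_NREGS);
  if (demo_spec_mode)
    {
      demo_use_spec[n] = 1;
      demo_spec_cv[n] = v;
    }
  else
    demo_cv[n] = v;
}

/* discard all speculative entries by clearing the selector flags */
static void
demo_cv_recover(void)
{
  memset(demo_use_spec, 0, sizeof(demo_use_spec));
  demo_spec_mode = 0;
}
```

/* The point of the overlay is that recovery costs one pass over the
   selector flags; the committed entries were never modified, so nothing
   has to be copied back. */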
02176 
02177 /* dump the contents of the create vector */
02178 static void
02179 cv_dump(FILE *stream)                           /* output stream */
02180 {
02181   int i;
02182   struct CV_link ent;
02183 
02184   if (!stream)
02185     stream = stderr;
02186 
02187   fprintf(stream, "** create vector state **\n");
02188 
02189   for (i=0; i < MD_TOTAL_REGS; i++)
02190     {
02191       ent = CREATE_VECTOR(i);
02192       if (!ent.rs)
02193         fprintf(stream, "[cv%02d]: from architected reg file\n", i);
02194       else
02195         fprintf(stream, "[cv%02d]: from %s, idx: %d\n",
02196                 i, (ent.rs->in_LSQ ? "LSQ" : "RUU"),
02197                 (int)(ent.rs - (ent.rs->in_LSQ ? LSQ : RUU)));
02198     }
02199 }
02200 
02201 
02202 /*
02203  *  RUU_COMMIT() - instruction retirement pipeline stage
02204  */
02205 
02206 /* this function commits the results of the oldest completed entries from the
02207    RUU and LSQ to the architected reg file, stores in the LSQ will commit
02208    their store data to the data cache at this point as well */
02209 static void
02210 ruu_commit(void)
02211 {
02212   int i, lat, events, committed = 0;
02213   static counter_t sim_ret_insn = 0;
02214 
02215   /* all values must be retired to the architected reg file in program order */
02216   while (RUU_num > 0 && committed < ruu_commit_width)
02217     {
02218       struct RUU_station *rs = &(RUU[RUU_head]);
02219 
02220       if (!rs->completed)
02221         {
02222           /* at least the RUU entry must be complete */
02223           break;
02224         }
02225 
02226       /* default commit events */
02227       events = 0;
02228 
02229       /* load/stores must retire load/store queue entry as well */
02230       if (RUU[RUU_head].ea_comp)
02231         {
02232           /* load/store, retire head of LSQ as well */
02233           if (LSQ_num <= 0 || !LSQ[LSQ_head].in_LSQ)
02234             panic("RUU out of sync with LSQ");
02235 
02236           /* load/store operation must be complete */
02237           if (!LSQ[LSQ_head].completed)
02238             {
02239               /* load/store operation is not yet complete */
02240               break;
02241             }
02242 
02243           if ((MD_OP_FLAGS(LSQ[LSQ_head].op) & (F_MEM|F_STORE))
02244               == (F_MEM|F_STORE))
02245             {
02246               struct res_template *fu;
02247 
02248 
02249               /* stores must retire their store value to the cache at commit,
02250                  try to get a store port (functional unit allocation) */
02251               fu = res_get(fu_pool, MD_OP_FUCLASS(LSQ[LSQ_head].op));
02252               if (fu)
02253                 {
02254                   /* reserve the functional unit */
02255                   if (fu->master->busy)
02256                     panic("functional unit already in use");
02257 
02258                   /* schedule functional unit release event */
02259                   fu->master->busy = fu->issuelat;
02260 
02261                   /* go to the data cache */
02262                   if (cache_dl1)
02263                     {
02264                       /* commit store value to D-cache */
02265                       lat =
02266                         cache_access(cache_dl1, Write, (LSQ[LSQ_head].addr&~3),
02267                                      NULL, 4, sim_cycle, NULL, NULL);
02268                       if (lat > cache_dl1_lat)
02269                         events |= PEV_CACHEMISS;
02270                     }
02271 
02272                   /* all loads and stores must access the D-TLB */
02273                   if (dtlb)
02274                     {
02275                       /* access the D-TLB */
02276                       lat =
02277                         cache_access(dtlb, Read, (LSQ[LSQ_head].addr & ~3),
02278                                      NULL, 4, sim_cycle, NULL, NULL);
02279                       if (lat > 1)
02280                         events |= PEV_TLBMISS;
02281                     }
02282                 }
02283               else
02284                 {
02285                   /* no store ports left, cannot continue to commit insts */
02286                   break;
02287                 }
02288             }
02289 
02290           /* invalidate load/store operation instance */
02291           LSQ[LSQ_head].tag++;
02292           sim_slip += (sim_cycle - LSQ[LSQ_head].slip);
02293    
02294           /* indicate to pipeline trace that this instruction retired */
02295           ptrace_newstage(LSQ[LSQ_head].ptrace_seq, PST_COMMIT, events);
02296           ptrace_endinst(LSQ[LSQ_head].ptrace_seq);
02297 
02298           /* commit head of LSQ as well */
02299           LSQ_head = (LSQ_head + 1) % LSQ_size;
02300           LSQ_num--;
02301         }
02302 
02303       if (pred
02304           && bpred_spec_update == spec_CT
02305           && (MD_OP_FLAGS(rs->op) & F_CTRL))
02306         {
02307           bpred_update(pred,
02308                        /* branch address */rs->PC,
02309                        /* actual target address */rs->next_PC,
02310                        /* taken? */rs->next_PC != (rs->PC +
02311                                                    sizeof(md_inst_t)),
02312                        /* pred taken? */rs->pred_PC != (rs->PC +
02313                                                         sizeof(md_inst_t)),
02314                        /* correct pred? */rs->pred_PC == rs->next_PC,
02315                        /* opcode */rs->op,
02316                        /* dir predictor update pointer */&rs->dir_update);
02317         }
02318 
02319       /* invalidate RUU operation instance */
02320       RUU[RUU_head].tag++;
02321       sim_slip += (sim_cycle - RUU[RUU_head].slip);
02322       /* print retirement trace if in verbose mode */
02323       if (verbose)
02324         {
02325           sim_ret_insn++;
02326           myfprintf(stderr, "%10n @ 0x%08p: ", sim_ret_insn, RUU[RUU_head].PC);
02327           md_print_insn(RUU[RUU_head].IR, RUU[RUU_head].PC, stderr);
02328           if (MD_OP_FLAGS(RUU[RUU_head].op) & F_MEM)
02329             myfprintf(stderr, "  mem: 0x%08p", RUU[RUU_head].addr);
02330           fprintf(stderr, "\n");
02331           /* fflush(stderr); */
02332         }
02333 
02334       /* indicate to pipeline trace that this instruction retired */
02335       ptrace_newstage(RUU[RUU_head].ptrace_seq, PST_COMMIT, events);
02336       ptrace_endinst(RUU[RUU_head].ptrace_seq);
02337 
02338       /* commit head entry of RUU */
02339       RUU_head = (RUU_head + 1) % RUU_size;
02340       RUU_num--;
02341 
02342       /* one more instruction committed to architected state */
02343       committed++;
02344 
02345       for (i=0; i<MAX_ODEPS; i++)
02346         {
02347           if (rs->odep_list[i])
02348             panic ("retired instruction has odeps\n");
02349         }
02350     }
02351 }
02352 
02353 
02354 /*
02355  *  RUU_RECOVER() - squash mispredicted microarchitecture state
02356  */
02357 
02358 /* recover processor microarchitecture state back to point of the
02359    mis-predicted branch at RUU[BRANCH_INDEX] */
02360 static void
02361 ruu_recover(int branch_index)                   /* index of mis-pred branch */
02362 {
02363   int i, RUU_index = RUU_tail, LSQ_index = LSQ_tail;
02364   int RUU_prev_tail = RUU_tail, LSQ_prev_tail = LSQ_tail;
02365 
02366   /* recover from the tail of the RUU towards the head until the branch index
02367      is reached, this direction ensures that the LSQ can be synchronized with
02368      the RUU */
02369 
02370   /* go to first element to squash */
02371   RUU_index = (RUU_index + (RUU_size-1)) % RUU_size;
02372   LSQ_index = (LSQ_index + (LSQ_size-1)) % LSQ_size;
02373 
02374   /* traverse to older insts until the mispredicted branch is encountered */
02375   while (RUU_index != branch_index)
02376     {
02377       /* the RUU should not drain since the mispredicted branch will remain */
02378       if (!RUU_num)
02379         panic("empty RUU");
02380 
02381       /* should meet up with the branch before reaching the head */
02382       if (RUU_index == RUU_head)
02383         panic("RUU head and tail broken");
02384 
02385       /* is this operation an effective addr calc for a load or store? */
02386       if (RUU[RUU_index].ea_comp)
02387         {
02388           /* should be at least one load or store in the LSQ */
02389           if (!LSQ_num)
02390             panic("RUU and LSQ out of sync");
02391 
02392           /* recover any resources consumed by the load or store operation */
02393           for (i=0; i<MAX_ODEPS; i++)
02394             {
02395               RSLINK_FREE_LIST(LSQ[LSQ_index].odep_list[i]);
02396               /* blow away the consuming op list */
02397               LSQ[LSQ_index].odep_list[i] = NULL;
02398             }
02399       
02400           /* squash this LSQ entry */
02401           LSQ[LSQ_index].tag++;
02402 
02403           /* indicate in pipetrace that this instruction was squashed */
02404           ptrace_endinst(LSQ[LSQ_index].ptrace_seq);
02405 
02406           /* go to next earlier LSQ slot */
02407           LSQ_prev_tail = LSQ_index;
02408           LSQ_index = (LSQ_index + (LSQ_size-1)) % LSQ_size;
02409           LSQ_num--;
02410         }
02411 
02412       /* recover any resources used by this RUU operation */
02413       for (i=0; i<MAX_ODEPS; i++)
02414         {
02415           RSLINK_FREE_LIST(RUU[RUU_index].odep_list[i]);
02416           /* blow away the consuming op list */
02417           RUU[RUU_index].odep_list[i] = NULL;
02418         }
02419       
02420       /* squash this RUU entry */
02421       RUU[RUU_index].tag++;
02422 
02423       /* indicate in pipetrace that this instruction was squashed */
02424       ptrace_endinst(RUU[RUU_index].ptrace_seq);
02425 
02426       /* go to next earlier slot in the RUU */
02427       RUU_prev_tail = RUU_index;
02428       RUU_index = (RUU_index + (RUU_size-1)) % RUU_size;
02429       RUU_num--;
02430     }
02431 
02432   /* reset tail pointers to the slots just after the mis-predicted branch */
02433   RUU_tail = RUU_prev_tail;
02434   LSQ_tail = LSQ_prev_tail;
02435 
02436   /* revert create vector back to last precise create vector state, NOTE:
02437      this is accomplished by resetting all the copied-on-write bits in the
02438      USE_SPEC_CV bit vector */
02439   BITMAP_CLEAR_MAP(use_spec_cv, CV_BMAP_SZ);
02440 
02441   /* FIXME: could reset functional units at squash time */
02442 }
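
The backward walk in ruu_recover() can be illustrated on a single circular queue. The following is a minimal sketch, not the simulator's code: `struct queue`, `Q_SIZE`, and `squash_younger()` are hypothetical names, and the LSQ synchronization and dependency-list cleanup are left out.

```c
#include <assert.h>

#define Q_SIZE 8                    /* hypothetical queue capacity */

struct entry { int tag; };          /* stand-in for an RUU station */

struct queue {
  struct entry slot[Q_SIZE];
  int head, tail, num;              /* tail is one past the youngest entry */
};

/* squash every entry younger than slot[branch]; returns the number squashed */
static int
squash_younger(struct queue *q, int branch)
{
  int squashed = 0;
  int prev_tail = q->tail;
  int idx = (q->tail + (Q_SIZE - 1)) % Q_SIZE;   /* youngest entry */

  assert(q->num > 0);
  while (idx != branch)
    {
      assert(idx != q->head);       /* branch must be found before the head */
      q->slot[idx].tag++;           /* stale links to this slot stop matching */
      prev_tail = idx;
      idx = (idx + (Q_SIZE - 1)) % Q_SIZE;       /* step to next older slot */
      q->num--;
      squashed++;
    }
  q->tail = prev_tail;              /* new tail is the slot after the branch */
  return squashed;
}
```

Adding `Q_SIZE - 1` before the modulo steps one slot backward without producing a negative index, which is the same trick the listing uses for RUU_index and LSQ_index.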
02443 
02444 
02445 /*
02446  *  RUU_WRITEBACK() - instruction result writeback pipeline stage
02447  */
02448 
02449 /* forward declarations */
02450 static void tracer_recover(void);
02451 
02452 /* writeback completed operation results from the functional units to RUU,
02453    at this point, the output dependency chains of completing instructions
02454    are also walked to determine if any dependent instruction now has all
02455    of its register operands, if so the (nearly) ready instruction is inserted
02456    into the ready instruction queue */
02457 static void
02458 ruu_writeback(void)
02459 {
02460   int i;
02461   struct RUU_station *rs;
02462 
02463   /* service all completed events */
02464   while ((rs = eventq_next_event()))
02465     {
02466       /* RS has completed execution and (possibly) produced a result */
02467       if (!OPERANDS_READY(rs) || rs->queued || !rs->issued || rs->completed)
02468         panic("inst completed and !ready, !issued, or completed");
02469 
02470       /* operation has completed */
02471       rs->completed = TRUE;
02472 
02473       /* does this operation reveal a mis-predicted branch? */
02474       if (rs->recover_inst)
02475         {
02476           if (rs->in_LSQ)
02477             panic("mis-predicted load or store?!?!?");
02478 
02479           /* recover processor state and reinit fetch to correct path */
02480           ruu_recover(rs - RUU);
02481           tracer_recover();
02482           bpred_recover(pred, rs->PC, rs->stack_recover_idx);
02483 
02484           /* stall fetch until I-fetch and I-decode recover */
02485           ruu_fetch_issue_delay = ruu_branch_penalty;
02486 
02487           /* continue writeback of the branch/control instruction */
02488         }
02489 
02490       /* if we speculatively update branch-predictor, do it here */
02491       if (pred
02492           && bpred_spec_update == spec_WB
02493           && !rs->in_LSQ
02494           && (MD_OP_FLAGS(rs->op) & F_CTRL))
02495         {
02496           bpred_update(pred,
02497                        /* branch address */rs->PC,
02498                        /* actual target address */rs->next_PC,
02499                        /* taken? */rs->next_PC != (rs->PC +
02500                                                    sizeof(md_inst_t)),
02501                        /* pred taken? */rs->pred_PC != (rs->PC +
02502                                                         sizeof(md_inst_t)),
02503                        /* correct pred? */rs->pred_PC == rs->next_PC,
02504                        /* opcode */rs->op,
02505                        /* dir predictor update pointer */&rs->dir_update);
02506         }
02507 
02508       /* entered writeback stage, indicate in pipe trace */
02509       ptrace_newstage(rs->ptrace_seq, PST_WRITEBACK,
02510                       rs->recover_inst ? PEV_MPDETECT : 0);
02511 
02512       /* broadcast results to consuming operations, this is more efficiently
02513          accomplished by walking the output dependency chains of the
02514          completed instruction */
02515       for (i=0; i<MAX_ODEPS; i++)
02516         {
02517           if (rs->onames[i] != NA)
02518             {
02519               struct CV_link link;
02520               struct RS_link *olink, *olink_next;
02521 
02522               if (rs->spec_mode)
02523                 {
02524                   /* update the speculative create vector, future operations
02525                      get value from later creator or architected reg file */
02526                   link = spec_create_vector[rs->onames[i]];
02527                   if (/* !NULL */link.rs
02528                       && /* refs RS */(link.rs == rs && link.odep_num == i))
02529                     {
02530                       /* the result can now be read from a physical register,
02531                          indicate this as so */
02532                       spec_create_vector[rs->onames[i]] = CVLINK_NULL;
02533                       spec_create_vector_rt[rs->onames[i]] = sim_cycle;
02534                     }
02535                   /* else, creator invalidated or there is another creator */
02536                 }
02537               else
02538                 {
02539                   /* update the non-speculative create vector, future
02540                      operations get value from later creator or architected
02541                      reg file */
02542                   link = create_vector[rs->onames[i]];
02543                   if (/* !NULL */link.rs
02544                       && /* refs RS */(link.rs == rs && link.odep_num == i))
02545                     {
02546                       /* the result can now be read from a physical register,
02547                          indicate this as so */
02548                       create_vector[rs->onames[i]] = CVLINK_NULL;
02549                       create_vector_rt[rs->onames[i]] = sim_cycle;
02550                     }
02551                   /* else, creator invalidated or there is another creator */
02552                 }
02553 
02554               /* walk output list, queue up ready operations */
02555               for (olink=rs->odep_list[i]; olink; olink=olink_next)
02556                 {
02557                   if (RSLINK_VALID(olink))
02558                     {
02559                       if (olink->rs->idep_ready[olink->x.opnum])
02560                         panic("output dependence already satisfied");
02561 
02562                       /* input is now ready */
02563                       olink->rs->idep_ready[olink->x.opnum] = TRUE;
02564 
02565                       /* are all the register operands of target ready? */
02566                       if (OPERANDS_READY(olink->rs))
02567                         {
02568                           /* yes! enqueue instruction as ready, NOTE: loads
02569                              in the LSQ are not enqueued here, lsq_refresh()
02570                              enqueues them once memory deps are resolved */
02571                           if (!olink->rs->in_LSQ
02572                               || ((MD_OP_FLAGS(olink->rs->op)&(F_MEM|F_STORE))
02573                                   == (F_MEM|F_STORE)))
02574                             readyq_enqueue(olink->rs);
02575                           /* else, ld op, issued when no mem conflict */
02576                         }
02577                     }
02578 
02579                   /* grab link to next element prior to free */
02580                   olink_next = olink->next;
02581 
02582                   /* free dependence link element */
02583                   RSLINK_FREE(olink);
02584                 }
02585               /* blow away the consuming op list */
02586               rs->odep_list[i] = NULL;
02587 
02588             } /* if not NA output */
02589 
02590         } /* for all outputs */
02591 
02592    } /* for all writeback events */
02593 
02594 }
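
The dependency-chain walk in ruu_writeback() can be sketched in isolation. This is an illustrative reduction under stated assumptions: `struct station`, `struct link`, `link_new()`, and `wake_consumers()` are hypothetical names, each consumer has exactly two inputs, and link validity is assumed to be a tag comparison against the consumer station (the listing increments a station's tag to invalidate it).

```c
#include <assert.h>
#include <stdlib.h>

struct station {
  int tag;                 /* bumped when the station is squashed */
  int idep_ready[2];       /* one flag per input operand */
};

struct link {
  struct link *next;
  struct station *rs;      /* consuming station */
  int tag;                 /* consumer's tag when the link was created */
  int opnum;               /* which input of the consumer this result feeds */
};

static struct link *
link_new(struct link *next, struct station *rs, int opnum)
{
  struct link *l = malloc(sizeof *l);
  assert(l != NULL);
  l->next = next; l->rs = rs; l->tag = rs->tag; l->opnum = opnum;
  return l;
}

/* mark inputs ready along the chain and free it;
   returns the number of consumers that now have all inputs ready */
static int
wake_consumers(struct link *chain)
{
  int n_ready = 0;
  struct link *l, *l_next;

  for (l = chain; l; l = l_next)
    {
      if (l->tag == l->rs->tag)        /* skip links to squashed consumers */
        {
          l->rs->idep_ready[l->opnum] = 1;
          if (l->rs->idep_ready[0] && l->rs->idep_ready[1])
            n_ready++;
        }
      l_next = l->next;                /* grab next pointer before freeing */
      free(l);
    }
  return n_ready;
}
```

Reading `next` before releasing the node mirrors the olink_next handling in the listing, since the node must not be touched once it has been freed.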
02595 
02596 
02597 /*
02598  *  LSQ_REFRESH() - memory access dependence checker/scheduler
02599  */
02600 
02601 /* this function locates ready instructions whose memory dependencies have
02602    been satisfied, this is accomplished by walking the LSQ for loads, looking
02603    for a blocking memory dependency condition (e.g., an earlier store with an
02604    unknown address) */
02605 #define MAX_STD_UNKNOWNS                64
02606 static void
02607 lsq_refresh(void)
02608 {
02609   int i, j, index, n_std_unknowns;
02610   md_addr_t std_unknowns[MAX_STD_UNKNOWNS];
02611 
02612   /* scan entire queue for ready loads: scan from oldest instruction
02613      (head) until we reach the tail or an unresolved store, after which no
02614      other instruction will become ready */
02615   for (i=0, index=LSQ_head, n_std_unknowns=0;
02616        i < LSQ_num;
02617        i++, index=(index + 1) % LSQ_size)
02618     {
02619       /* terminate search for ready loads after first unresolved store,
02620          as no later load could be resolved in its presence */
02621       if (/* store? */
02622           (MD_OP_FLAGS(LSQ[index].op) & (F_MEM|F_STORE)) == (F_MEM|F_STORE))
02623         {
02624           if (!STORE_ADDR_READY(&LSQ[index]))
02625             {
02626               /* FIXME: a later STD + STD known could hide the STA unknown */
02627               /* sta unknown, blocks all later loads, stop search */
02628               break;
02629             }
02630           else if (!OPERANDS_READY(&LSQ[index]))
02631             {
02632               /* sta known, but std unknown, may block a later load, record
02633                  this address for later referral, we use an array here because
02634                  for most simulations the number of entries to search will be
02635                  very small */
02636               if (n_std_unknowns == MAX_STD_UNKNOWNS)
02637                 fatal("STD unknown array overflow, increase MAX_STD_UNKNOWNS");
02638               std_unknowns[n_std_unknowns++] = LSQ[index].addr;
02639             }
02640           else /* STORE_ADDR_READY() && OPERANDS_READY() */
02641             {
02642               /* a later STD known hides an earlier STD unknown */
02643               for (j=0; j<n_std_unknowns; j++)
02644                 {
02645                   if (std_unknowns[j] == /* STA/STD known */LSQ[index].addr)
02646                     std_unknowns[j] = /* bogus addr */0;
02647                 }
02648             }
02649         }
02650 
02651       if (/* load? */
02652           ((MD_OP_FLAGS(LSQ[index].op) & (F_MEM|F_LOAD)) == (F_MEM|F_LOAD))
02653           && /* queued? */!LSQ[index].queued
02654           && /* waiting? */!LSQ[index].issued
02655           && /* completed? */!LSQ[index].completed
02656           && /* regs ready? */OPERANDS_READY(&LSQ[index]))
02657         {
02658           /* no STA unknown conflict (because we got to this check), check for
02659              a STD unknown conflict */
02660           for (j=0; j<n_std_unknowns; j++)
02661             {
02662               /* found a relevant STD unknown? */
02663               if (std_unknowns[j] == LSQ[index].addr)
02664                 break;
02665             }
02666           if (j == n_std_unknowns)
02667             {
02668               /* no STA or STD unknown conflicts, put load on ready queue */
02669               readyq_enqueue(&LSQ[index]);
02670             }
02671         }
02672     }
02673 }
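
The scheduling rule in lsq_refresh() can be sketched on a plain array ordered oldest first. This is a simplified illustration, not the simulator's code: `struct mem_op`, `MAX_PENDING`, and `find_ready_loads()` are hypothetical names, the circular indexing is dropped, and the queued/issued/completed checks are omitted.

```c
#include <assert.h>

enum kind { LOAD, STORE };

struct mem_op {
  enum kind kind;
  unsigned addr;
  int addr_ready;     /* store address (STA) known? */
  int data_ready;     /* store data (STD) known, or load operands ready? */
};

#define MAX_PENDING 16   /* hypothetical bound on data-unknown stores */

/* scan ops[0..n) oldest first; set ready[i] for loads that may issue;
   returns the number of such loads */
static int
find_ready_loads(const struct mem_op *ops, int n, int *ready)
{
  unsigned pending[MAX_PENDING];
  int n_pending = 0, n_ready = 0, i, j;

  for (i = 0; i < n; i++)
    ready[i] = 0;

  for (i = 0; i < n; i++)
    {
      if (ops[i].kind == STORE)
        {
          if (!ops[i].addr_ready)
            break;                       /* unknown address blocks all later loads */
          if (!ops[i].data_ready)
            {
              assert(n_pending < MAX_PENDING);
              pending[n_pending++] = ops[i].addr;  /* blocks loads of this address */
            }
          else
            {
              for (j = 0; j < n_pending; j++)      /* later complete store hides it */
                if (pending[j] == ops[i].addr)
                  pending[j] = 0;                  /* 0 is treated as a bogus address */
            }
        }
      else if (ops[i].data_ready)
        {
          for (j = 0; j < n_pending; j++)
            if (pending[j] == ops[i].addr)
              break;                               /* conflict with a pending store */
          if (j == n_pending)
            {
              ready[i] = 1;
              n_ready++;
            }
        }
    }
  return n_ready;
}
```

As in the listing, overwriting a matched slot with 0 assumes that no real load targets address 0.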
02674 
02675 
02676 /*
02677  *  RUU_ISSUE() - issue instructions to functional units
02678  */
02679 
02680 /* attempt to issue all operations in the ready queue; insts in the ready
02681    instruction queue have all register dependencies satisfied, this function
02682    must then 1) ensure each instruction's memory dependencies have been satisfied
02683    (see lsq_refresh() for details on this process) and 2) a function unit
02684    is available in this cycle to commence execution of the operation; if all
02685    goes well, the function unit is allocated, a writeback event is scheduled,
02686    and the instruction begins execution */
02687 static void
02688 ruu_issue(void)
02689 {
02690   int i, load_lat, tlb_lat, n_issued;
02691   struct RS_link *node, *next_node;
02692   struct res_template *fu;
02693 
02694   /* FIXME: could be a little more efficient when scanning the ready queue */
02695 
02696   /* copy and then blow away the ready list, NOTE: the ready list is
02697      always totally reclaimed each cycle, and instructions that are not
02698      issued are explicitly reinserted into the ready instruction queue,
02699      this management strategy ensures that the ready instruction queue
02700      is always properly sorted */
02701   node = ready_queue;
02702   ready_queue = NULL;
02703 
02704   /* visit all ready instructions (i.e., insts whose register input
02705      dependencies have been satisfied), stop issue when no more instructions
02706      are available or issue bandwidth is exhausted */
02707   for (n_issued=0;
02708        node && n_issued < ruu_issue_width;
02709        node = next_node)
02710     {
02711       next_node = node->next;
02712 
02713       /* still valid? */
02714       if (RSLINK_VALID(node))
02715         {
02716           struct RUU_station *rs = RSLINK_RS(node);
02717 
02718           /* issue operation, both reg and mem deps have been satisfied */
02719           if (!OPERANDS_READY(rs) || !rs->queued
02720               || rs->issued || rs->completed)
02721             panic("issued inst !ready, issued, or completed");
02722 
02723           /* node is now un-queued */
02724           rs->queued = FALSE;
02725 
02726           if (rs->in_LSQ
02727               && ((MD_OP_FLAGS(rs->op) & (F_MEM|F_STORE)) == (F_MEM|F_STORE)))
02728             {
02729               /* stores complete in effectively zero time, result is
02730                  written into the load/store queue, the actual store into
02731                  the memory system occurs when the instruction is retired
02732                  (see ruu_commit()) */
02733               rs->issued = TRUE;
02734               rs->completed = TRUE;
02735               if (rs->onames[0] || rs->onames[1])
02736                 panic("store creates result");
02737 
02738               if (rs->recover_inst)
02739                 panic("mis-predicted store");
02740 
02741               /* stores skip execute and enter writeback, indicate in pipe trace */
02742               ptrace_newstage(rs->ptrace_seq, PST_WRITEBACK, 0);
02743 
02744               /* one more inst issued */
02745               n_issued++;
02746             }
02747           else
02748             {
02749               /* issue the instruction to a functional unit */
02750               if (MD_OP_FUCLASS(rs->op) != NA)
02751                 {
02752                   fu = res_get(fu_pool, MD_OP_FUCLASS(rs->op));
02753                   if (fu)
02754                     {
02755                       /* got one! issue inst to functional unit */
02756                       rs->issued = TRUE;
02757                       /* reserve the functional unit */
02758                       if (fu->master->busy)
02759                         panic("functional unit already in use");
02760 
02761                       /* schedule functional unit release event */
02762                       fu->master->busy = fu->issuelat;
02763 
02764                       /* schedule a result writeback event */
02765                       if (rs->in_LSQ
02766                           && ((MD_OP_FLAGS(rs->op) & (F_MEM|F_LOAD))
02767                               == (F_MEM|F_LOAD)))
02768                         {
02769                           int events = 0;
02770 
02771                           /* for loads, determine cache access latency:
02772                              first scan LSQ to see if a store forward is
02773                              possible, if not, access the data cache */
02774                           load_lat = 0;
02775                           i = (rs - LSQ);
02776                           if (i != LSQ_head)
02777                             {
02778                               for (;;)
02779                                 {
02780                                   /* go to next earlier LSQ entry */
02781                                   i = (i + (LSQ_size-1)) % LSQ_size;
02782 
02783                                   /* FIXME: not dealing with partials! */
02784                                   if ((MD_OP_FLAGS(LSQ[i].op) & F_STORE)
02785                                       && (LSQ[i].addr == rs->addr))
02786                                     {
02787                                       /* hit in the LSQ */
02788                                       load_lat = 1;
02789                                       break;
02790                                     }
02791 
02792                                   /* scan finished? */
02793                                   if (i == LSQ_head)
02794                                     break;
02795                                 }
02796                             }
02797 
02798                           /* was the value store forwarded from the LSQ? */
02799                           if (!load_lat)
02800                             {
02801                               int valid_addr = MD_VALID_ADDR(rs->addr);
02802 
02803                               if (!spec_mode && !valid_addr)
02804                                 sim_invalid_addrs++;
02805 
02806                               /* no! go to the data cache if addr is valid */
02807                               if (cache_dl1 && valid_addr)
02808                                 {
02809                                   /* access the cache if non-faulting */
02810                                   load_lat =
02811                                     cache_access(cache_dl1, Read,
02812                                                  (rs->addr & ~3), NULL, 4,
02813                                                  sim_cycle, NULL, NULL);
02814                                   if (load_lat > cache_dl1_lat)
02815                                     events |= PEV_CACHEMISS;
02816                                 }
02817                               else
02818                                 {
02819                                   /* no caches defined, just use op latency */
02820                                   load_lat = fu->oplat;
02821                                 }
02822                             }
02823 
02824                           /* all loads and stores must access D-TLB */
02825                           if (dtlb && MD_VALID_ADDR(rs->addr))
02826                             {
02827                               /* access the D-TLB, NOTE: this code will
02828                                  initiate speculative TLB misses */
02829                               tlb_lat =
02830                                 cache_access(dtlb, Read, (rs->addr & ~3),
02831                                              NULL, 4, sim_cycle, NULL, NULL);
02832                               if (tlb_lat > 1)
02833                                 events |= PEV_TLBMISS;
02834 
02835                               /* D-cache/D-TLB accesses occur in parallel */
02836                               load_lat = MAX(tlb_lat, load_lat);
02837                             }
02838 
02839                           /* use computed cache access latency */
02840                           eventq_queue_event(rs, sim_cycle + load_lat);
02841 
02842                           /* entered execute stage, indicate in pipe trace */
02843                           ptrace_newstage(rs->ptrace_seq, PST_EXECUTE,
02844                                           ((rs->ea_comp ? PEV_AGEN : 0)
02845                                            | events));
02846                         }
02847                       else /* !load && !store */
02848                         {
02849                           /* use deterministic functional unit latency */
02850                           eventq_queue_event(rs, sim_cycle + fu->oplat);
02851 
02852                           /* entered execute stage, indicate in pipe trace */
02853                           ptrace_newstage(rs->ptrace_seq, PST_EXECUTE, 
02854                                           rs->ea_comp ? PEV_AGEN : 0);
02855                         }
02856 
02857                       /* one more inst issued */
02858                       n_issued++;
02859                     }
02860                   else /* no functional unit */
02861                     {
02862                       /* insufficient functional unit resources, put operation
02863                          back onto the ready list, we'll try to issue it
02864                          again next cycle */
02865                       readyq_enqueue(rs);
02866                     }
02867                 }
02868               else /* does not require a functional unit! */
02869                 {
02870                   /* FIXME: need better solution for these */
02871                   /* the instruction does not need a functional unit */
02872                   rs->issued = TRUE;
02873 
02874                   /* schedule a result event */
02875                   eventq_queue_event(rs, sim_cycle + 1);
02876 
02877                   /* entered execute stage, indicate in pipe trace */
02878                   ptrace_newstage(rs->ptrace_seq, PST_EXECUTE,
02879                                   rs->ea_comp ? PEV_AGEN : 0);
02880 
02881                   /* one more inst issued */
02882                   n_issued++;
02883                 }
02884             } /* !store */
02885 
02886         }
02887       /* else, RUU entry was squashed */
02888 
02889       /* reclaim ready list entry, NOTE: this is done whether or not the
02890          instruction issued, since the instruction was once again reinserted
02891          into the ready queue if it did not issue, this ensures that the ready
02892          queue is always properly sorted */
02893       RSLINK_FREE(node);
02894     }
02895 
02896   /* put any instruction not issued back into the ready queue, go through
02897      normal channels to ensure instructions stay ordered correctly */
02898   for (; node; node = next_node)
02899     {
02900       next_node = node->next;
02901 
02902       /* still valid? */
02903       if (RSLINK_VALID(node))
02904         {
02905           struct RUU_station *rs = RSLINK_RS(node);
02906 
02907           /* node is now un-queued */
02908           rs->queued = FALSE;
02909 
02910           /* not issued, put operation back onto the ready list, we'll try to
02911              issue it again next cycle */
02912           readyq_enqueue(rs);
02913         }
02914       /* else, RUU entry was squashed */
02915 
02916       /* reclaim ready list entry, NOTE: this is done whether or not the
02917          instruction issued, since the instruction was once again reinserted
02918          into the ready queue if it did not issue, this ensures that the ready
02919          queue is always properly sorted */
02920       RSLINK_FREE(node);
02921     }
02922 }
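
The load latency computation inside ruu_issue() can be sketched as a standalone function. This is a hedged illustration: `struct lsq_op`, `max_int()`, and `load_latency()` are hypothetical names, and the cache and TLB latencies are passed in as plain integers rather than obtained from cache_access().

```c
#include <assert.h>

struct lsq_op { int is_store; unsigned addr; };

static int max_int(int a, int b) { return a > b ? a : b; }

/* latency of the load in slot `load` of a circular queue of `size` slots
   whose oldest entry is at `head` */
static int
load_latency(const struct lsq_op *lsq, int size, int head, int load,
             int cache_lat, int tlb_lat)
{
  int lat = 0, i = load;

  assert(size > 0 && !lsq[load].is_store);
  while (i != head)
    {
      i = (i + (size - 1)) % size;            /* next older entry */
      if (lsq[i].is_store && lsq[i].addr == lsq[load].addr)
        {
          lat = 1;                            /* value forwarded from the queue */
          break;
        }
    }
  if (!lat)
    lat = cache_lat;                          /* no forwarding, use the cache */
  return max_int(tlb_lat, lat);               /* cache and TLB probed in parallel */
}
```

Like the listing, the sketch compares full addresses only, so a store that partially overlaps the load is not detected.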
02923 
02924 
02925 /*
02926  * routines for generating on-the-fly instruction traces with support
02927  * for control and data misspeculation modeling
02928  */
02929 
02930 /* integer register file */
02931 #define R_BMAP_SZ       (BITMAP_SIZE(MD_NUM_IREGS))
02932 static BITMAP_TYPE(MD_NUM_IREGS, use_spec_R);
02933 static md_gpr_t spec_regs_R;
02934 
02935 /* floating point register file */
02936 #define F_BMAP_SZ       (BITMAP_SIZE(MD_NUM_FREGS))
02937 static BITMAP_TYPE(MD_NUM_FREGS, use_spec_F);
02938 static md_fpr_t spec_regs_F;
02939 
02940 /* miscellaneous registers */
02941 #define C_BMAP_SZ       (BITMAP_SIZE(MD_NUM_CREGS))
02942 static BITMAP_TYPE(MD_NUM_CREGS, use_spec_C);
02943 static md_ctrl_t spec_regs_C;
02944 
02945 /* dump speculative register state */
02946 static void
02947 rspec_dump(FILE *stream)                        /* output stream */
02948 {
02949   int i;
02950 
02951   if (!stream)
02952     stream = stderr;
02953 
02954   fprintf(stream, "** speculative register contents **\n");
02955 
02956   fprintf(stream, "spec_mode: %s\n", spec_mode ? "t" : "f");
02957 
02958   /* dump speculative integer regs */
02959   for (i=0; i < MD_NUM_IREGS; i++)
02960     {
02961       if (BITMAP_SET_P(use_spec_R, R_BMAP_SZ, i))
02962         {
02963           md_print_ireg(spec_regs_R, i, stream);
02964           fprintf(stream, "\n");
02965         }
02966     }
02967 
02968   /* dump speculative FP regs */
02969   for (i=0; i < MD_NUM_FREGS; i++)
02970     {
02971       if (BITMAP_SET_P(use_spec_F, F_BMAP_SZ, i))
02972         {
02973           md_print_fpreg(spec_regs_F, i, stream);
02974           fprintf(stream, "\n");
02975         }
02976     }
02977 
02978   /* dump speculative CTRL regs */
02979   for (i=0; i < MD_NUM_CREGS; i++)
02980     {
02981       if (BITMAP_SET_P(use_spec_C, C_BMAP_SZ, i))
02982         {
02983           md_print_creg(spec_regs_C, i, stream);
02984           fprintf(stream, "\n");
02985         }
02986     }
02987 }
02988 
02989 
02990 /* speculative memory hash table size, NOTE: this must be a power-of-two */
02991 #define STORE_HASH_SIZE         32
02992 
02993 /* speculative memory hash table definition, accesses go through this hash
02994    table when accessing memory in speculative mode, the table is flushed
02995    when recovering from mispredicted branches */
02996 struct spec_mem_ent {
02997   struct spec_mem_ent *next;            /* ptr to next hash table bucket */
02998   md_addr_t addr;                       /* virtual address of spec state */
02999   unsigned int data[2];                 /* spec buffer, up to 8 bytes */
03000 };
03001 
03002 /* speculative memory hash table */
03003 static struct spec_mem_ent *store_htable[STORE_HASH_SIZE];
03004 
03005 /* speculative memory hash table bucket free list */
03006 static struct spec_mem_ent *bucket_free_list = NULL;
03007 
03008 
03009 /* program counter */
03010 static md_addr_t pred_PC;
03011 static md_addr_t recover_PC;
03012 
03013 /* fetch unit next fetch address */
03014 static md_addr_t fetch_regs_PC;
03015 static md_addr_t fetch_pred_PC;
03016 
03017 /* IFETCH -> DISPATCH instruction queue definition */
03018 struct fetch_rec {
03019   md_inst_t IR;                         /* inst register */
03020   md_addr_t regs_PC, pred_PC;           /* current PC, predicted next PC */
03021   struct bpred_update_t dir_update;     /* bpred direction update info */
03022   int stack_recover_idx;                /* branch predictor RSB index */
03023   unsigned int ptrace_seq;              /* print trace sequence id */
03024 };
03025 static struct fetch_rec *fetch_data;    /* IFETCH -> DISPATCH inst queue */
03026 static int fetch_num;                   /* num entries in IF -> DIS queue */
03027 static int fetch_tail, fetch_head;      /* head and tail pointers of queue */
03028 
03029 /* recover instruction trace generator state to the precise state immediately
03030    before the first mis-predicted branch; this is accomplished by resetting
03031    all register value copied-on-write bitmasks and clearing the speculative
03032    memory hash table */
03033 static void
03034 tracer_recover(void)
03035 {
03036   int i;
03037   struct spec_mem_ent *ent, *ent_next;
03038 
03039   /* better be in mis-speculative trace generation mode */
03040   if (!spec_mode)
03041     panic("cannot recover unless in speculative mode");
03042 
03043   /* reset to non-speculative trace generation mode */
03044   spec_mode = FALSE;
03045 
03046   /* reset copied-on-write register bitmasks back to non-speculative state */
03047   BITMAP_CLEAR_MAP(use_spec_R, R_BMAP_SZ);
03048   BITMAP_CLEAR_MAP(use_spec_F, F_BMAP_SZ);
03049   BITMAP_CLEAR_MAP(use_spec_C, C_BMAP_SZ);
03050 
03051   /* reset memory state back to non-speculative state */
03052   /* FIXME: could version stamps be used here?!?!? */
03053   for (i=0; i<STORE_HASH_SIZE; i++)
03054     {
03055       /* release all hash table buckets */
03056       for (ent=store_htable[i]; ent; ent=ent_next)
03057         {
03058           ent_next = ent->next;
03059           ent->next = bucket_free_list;
03060           bucket_free_list = ent;
03061         }
03062       store_htable[i] = NULL;
03063     }
03064 
03065   /* if pipetracing, indicate squash of instructions in the inst fetch queue */
03066   if (ptrace_active)
03067     {
03068       while (fetch_num != 0)
03069         {
03070           /* squash the next instruction from the IFETCH -> DISPATCH queue */
03071           ptrace_endinst(fetch_data[fetch_head].ptrace_seq);
03072 
03073           /* consume instruction from IFETCH -> DISPATCH queue */
03074           fetch_head = (fetch_head+1) & (ruu_ifq_size - 1);
03075           fetch_num--;
03076         }
03077     }
03078 
03079   /* reset IFETCH state */
03080   fetch_num = 0;
03081   fetch_tail = fetch_head = 0;
03082   fetch_pred_PC = fetch_regs_PC = recover_PC;
03083 }
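
The fetch queue handling in tracer_recover() relies on a power-of-two ring buffer: indices wrap with a bit mask rather than a modulo, and a squash drains the queue before resetting the indices. The following is a minimal sketch of that idea only. The names q_push, q_pop, q_squash and the size Q_SIZE are hypothetical and do not appear in sim-outorder.c; the sketch assumes the queue size is a power of two, as the masking in the listing implies.

```c
#include <assert.h>

#define Q_SIZE 4                /* assumed size; must be a power of two */

static int q_data[Q_SIZE];
static int q_head, q_tail, q_num;

/* append one entry at the tail; returns 0 if the queue is full */
static int q_push(int v)
{
  if (q_num == Q_SIZE)
    return 0;
  q_data[q_tail] = v;
  q_tail = (q_tail + 1) & (Q_SIZE - 1);   /* wrap with a mask */
  q_num++;
  return 1;
}

/* consume one entry from the head; caller must ensure q_num != 0 */
static int q_pop(void)
{
  int v = q_data[q_head];
  q_head = (q_head + 1) & (Q_SIZE - 1);
  q_num--;
  return v;
}

/* drain every pending entry, then reset the indices; returns the
   number of entries that were squashed */
static int q_squash(void)
{
  int n = 0;
  while (q_num != 0)
    {
      (void)q_pop();
      n++;
    }
  q_head = q_tail = 0;
  return n;
}
```

With a size that is not a power of two, the mask would skip or repeat slots, so a modulo would be needed instead.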
03084 
03085 /* initialize the speculative instruction trace generator state */
03086 static void
03087 tracer_init(void)
03088 {
03089   int i;
03090 
03091   /* initially in non-speculative mode */
03092   spec_mode = FALSE;
03093 
03094   /* register state is from non-speculative state buffers */
03095   BITMAP_CLEAR_MAP(use_spec_R, R_BMAP_SZ);
03096   BITMAP_CLEAR_MAP(use_spec_F, F_BMAP_SZ);
03097   BITMAP_CLEAR_MAP(use_spec_C, C_BMAP_SZ);
03098 
03099   /* memory state is from non-speculative memory pages */
03100   for (i=0; i<STORE_HASH_SIZE; i++)
03101     store_htable[i] = NULL;
03102 }
03103 
03104 
03105 /* speculative memory hash table address hash function */
03106 #define HASH_ADDR(ADDR)                                                 \
03107   ((((ADDR) >> 24)^((ADDR) >> 16)^((ADDR) >> 8)^(ADDR)) & (STORE_HASH_SIZE-1))
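
As a rough illustration of what HASH_ADDR computes, the sketch below XOR-folds the four bytes of a 32-bit address and masks the result down to a bucket index. The function name hash_addr_demo and the table size of 32 are assumptions made for this example; the value of STORE_HASH_SIZE is not shown in this part of the listing, and the mask only works if it is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* assumed bucket count for illustration; must be a power of two so
   that (size - 1) can serve as a bit mask */
#define DEMO_HASH_SIZE 32u

/* XOR the address with itself shifted right by 8, 16 and 24 bits, then
   keep only the low bits that select a bucket */
static unsigned hash_addr_demo(uint32_t addr)
{
  return ((addr >> 24) ^ (addr >> 16) ^ (addr >> 8) ^ addr)
         & (DEMO_HASH_SIZE - 1u);
}
```

For example, under these assumptions hash_addr_demo(0x104) yields bucket 5, since 0x104 ^ 0x1 is 0x105 and the low five bits are 00101.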
03108 
03109 /* this function provides a layer of mis-speculated state over the
03110    non-speculative memory state; when in mis-speculation trace generation mode,
03111    the simulator will call this function to access memory, instead of the
03112    non-speculative memory access interfaces defined in memory.h; when storage
03113    is written, an entry is allocated in the speculative memory hash table;
03114    future reads and writes while in mis-speculative trace generation mode will
03115    access this buffer instead of non-speculative memory state; when the trace
03116    generator transitions back to non-speculative trace generation mode,
03117    tracer_recover() clears this table; returns any access fault */
03118 static enum md_fault_type
03119 spec_mem_access(struct mem_t *mem,              /* memory space to access */
03120                 enum mem_cmd cmd,               /* Read or Write access cmd */
03121                 md_addr_t addr,                 /* virtual address of access */
03122                 void *p,                        /* input/output buffer */
03123                 int nbytes)                     /* number of bytes to access */
03124 {
03125   int i, index;
03126   struct spec_mem_ent *ent, *prev;
03127 
03128   /* FIXME: partially overlapping writes are not combined... */
03129   /* FIXME: partially overlapping reads are not handled correctly... */
03130 
03131   /* check alignments; even when speculative, this test should always pass */
03132   if ((nbytes & (nbytes-1)) != 0 || (addr & (nbytes-1)) != 0)
03133     {
03134       /* no can do, return zero result */
03135       for (i=0; i < nbytes; i++)
03136         ((char *)p)[i] = 0;
03137 
03138       return md_fault_none;
03139     }
03140 
03141   /* check permissions */
03142   if (!((addr >= ld_text_base && addr < (ld_text_base+ld_text_size)
03143          && cmd == Read)
03144         || MD_VALID_ADDR(addr)))
03145     {
03146       /* no can do, return zero result */
03147       for (i=0; i < nbytes; i++)
03148         ((char *)p)[i] = 0;
03149 
03150       return md_fault_none;
03151     }
03152 
03153   /* has this memory state been copied on mis-speculative write? */
03154   index = HASH_ADDR(addr);
03155   for (prev=NULL,ent=store_htable[index]; ent; prev=ent,ent=ent->next)
03156     {
03157       if (ent->addr == addr)
03158         {
03159           /* reorder chains to speed access into hash table */
03160           if (prev != NULL)
03161             {
03162               /* not at head of list, relink the hash table entry at front */
03163               prev->next = ent->next;
03164               ent->next = store_htable[index];
03165               store_htable[index] = ent;
03166             }
03167           break;
03168         }
03169     }
03170 
03171   /* no, if it is a write, allocate a hash table entry to hold the data */
03172   if (!ent && cmd == Write)
03173     {
03174       /* try to get an entry from the free list, if available */
03175       if (!bucket_free_list)
03176         {
03177           /* otherwise, call calloc() to get the needed storage */
03178           bucket_free_list = calloc(1, sizeof(struct spec_mem_ent));
03179           if (!bucket_free_list)
03180             fatal("out of virtual memory");
03181         }
03182       ent = bucket_free_list;
03183       bucket_free_list = bucket_free_list->next;
03184 
03185       if (!bugcompat_mode)
03186         {
03187           /* insert into hash table */
03188           ent->next = store_htable[index];
03189           store_htable[index] = ent;
03190           ent->addr = addr;
03191           ent->data[0] = 0; ent->data[1] = 0;
03192         }
03193     }
03194 
03195   /* handle the read or write to speculative or non-speculative storage */
03196   switch (nbytes)
03197     {
03198     case 1:
03199       if (cmd == Read)
03200         {
03201           if (ent)
03202             {
03203               /* read from mis-speculated state buffer */
03204               *((byte_t *)p) = *((byte_t *)(&ent->data[0]));
03205             }
03206           else
03207             {
03208               /* read from non-speculative memory state, don't allocate
03209                  memory pages with speculative loads */
03210               *((byte_t *)p) = MEM_READ_BYTE(mem, addr);
03211             }
03212         }
03213       else
03214         {
03215           /* always write into mis-speculated state buffer */
03216           *((byte_t *)(&ent->data[0])) = *((byte_t *)p);
03217         }
03218       break;
03219     case 2:
03220       if (cmd == Read)
03221         {
03222           if (ent)
03223             {
03224               /* read from mis-speculated state buffer */
03225               *((half_t *)p) = *((half_t *)(&ent->data[0]));
03226             }
03227           else
03228             {
03229               /* read from non-speculative memory state, don't allocate
03230                  memory pages with speculative loads */
03231               *((half_t *)p) = MEM_READ_HALF(mem, addr);
03232             }
03233         }
03234       else
03235         {
03236           /* always write into mis-speculated state buffer */
03237           *((half_t *)&ent->data[0]) = *((half_t *)p);
03238         }
03239       break;
03240     case 4:
03241       if (cmd == Read)
03242         {
03243           if (ent)
03244             {
03245               /* read from mis-speculated state buffer */
03246               *((word_t *)p) = *((word_t *)&ent->data[0]);
03247             }
03248           else
03249             {
03250               /* read from non-speculative memory state, don't allocate
03251                  memory pages with speculative loads */
03252               *((word_t *)p) = MEM_READ_WORD(mem, addr);
03253             }
03254         }
03255       else
03256         {
03257           /* always write into mis-speculated state buffer */
03258           *((word_t *)&ent->data[0]) = *((word_t *)p);
03259         }
03260       break;
03261     case 8:
03262       if (cmd == Read)
03263         {
03264           if (ent)
03265             {
03266               /* read from mis-speculated state buffer */
03267               *((word_t *)p) = *((word_t *)&ent->data[0]);
03268               *(((word_t *)p)+1) = *((word_t *)&ent->data[1]);
03269             }
03270           else
03271             {
03272               /* read from non-speculative memory state, don't allocate
03273                  memory pages with speculative loads */
03274               *((word_t *)p) = MEM_READ_WORD(mem, addr);
03275               *(((word_t *)p)+1) =
03276                 MEM_READ_WORD(mem, addr + sizeof(word_t));
03277             }
03278         }
03279       else
03280         {
03281           /* always write into mis-speculated state buffer */
03282           *((word_t *)&ent->data[0]) = *((word_t *)p);
03283           *((word_t *)&ent->data[1]) = *(((word_t *)p)+1);
03284         }
03285       break;
03286     default:
03287       panic("access size not supported in mis-speculative mode");
03288     }
03289 
03290   return md_fault_none;
03291 }
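
The overlay idea behind spec_mem_access() and tracer_recover() can be condensed into a few lines: speculative writes land in a hash table, reads consult the table before falling back to real memory, and recovery recycles every entry onto a free list. The sketch below is a simplified model under stated assumptions, not the simulator's code. All names (ov_write, ov_read, ov_recover, base_mem) are hypothetical, it handles only aligned 32-bit words, and it uses a small array as a stand-in for non-speculative memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define OV_BUCKETS 32u                  /* assumed power-of-two bucket count */
#define BASE_WORDS 64u                  /* size of the stand-in memory */

struct ov_ent {
  struct ov_ent *next;
  uint32_t addr;
  uint32_t data;
};

static struct ov_ent *ov_table[OV_BUCKETS];
static struct ov_ent *ov_free;          /* recycled entries */
static uint32_t base_mem[BASE_WORDS];   /* stand-in for non-speculative memory */

static unsigned ov_hash(uint32_t addr)
{
  return (addr ^ (addr >> 8)) & (OV_BUCKETS - 1u);
}

static struct ov_ent *ov_find(uint32_t addr)
{
  struct ov_ent *e;
  for (e = ov_table[ov_hash(addr)]; e; e = e->next)
    if (e->addr == addr)
      return e;
  return NULL;
}

/* speculative write: always goes to the overlay, never to base_mem */
static void ov_write(uint32_t addr, uint32_t val)
{
  struct ov_ent *e = ov_find(addr);
  if (!e)
    {
      if (ov_free)
        { e = ov_free; ov_free = e->next; }
      else if (!(e = calloc(1, sizeof *e)))
        abort();
      e->addr = addr;
      e->next = ov_table[ov_hash(addr)];
      ov_table[ov_hash(addr)] = e;
    }
  e->data = val;
}

/* speculative read: overlay first, then the untouched base memory */
static uint32_t ov_read(uint32_t addr)
{
  struct ov_ent *e = ov_find(addr);
  return e ? e->data : base_mem[(addr / 4u) % BASE_WORDS];
}

/* discard all speculative state by moving every entry to the free list */
static void ov_recover(void)
{
  unsigned i;
  struct ov_ent *e, *n;
  for (i = 0; i < OV_BUCKETS; i++)
    {
      for (e = ov_table[i]; e; e = n)
        { n = e->next; e->next = ov_free; ov_free = e; }
      ov_table[i] = NULL;
    }
}
```

Keeping squashed entries on a free list, as the listing does with bucket_free_list, avoids a calloc() and free() pair on every mis-speculated store.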
03292 
03293 /* dump speculative memory state */
03294 static void
03295 mspec_dump(FILE *stream)                        /* output stream */
03296 {
03297   int i;
03298   struct spec_mem_ent *ent;
03299 
03300   if (!stream)
03301     stream = stderr;
03302 
03303   fprintf(stream, "** speculative memory contents **\n");
03304 
03305   fprintf(stream, "spec_mode: %s\n", spec_mode ? "t" : "f");
03306 
03307   for (i=0; i<STORE_HASH_SIZE; i++)
03308     {
03309       /* dump contents of all hash table buckets */
03310       for (ent=store_htable[i]; ent; ent=ent->next)
03311         {
03312           myfprintf(stream, "[0x%08p]: %12.0f/0x%08x:%08x\n",
03313                     ent->addr, (double)(*((double *)ent->data)),
03314                     *((unsigned int *)&ent->data[0]),
03315                     *(((unsigned int *)&ent->data[0]) + 1));
03316         }
03317     }
03318 }
03319 
03320 /* default memory state accessor, used by DLite */
03321 static char *                                   /* err str, NULL for no err */
03322 simoo_mem_obj(struct mem_t *mem,                /* memory space to access */
03323               int is_write,                     /* access type */
03324               md_addr_t addr,                   /* address to access */
03325               char *p,                          /* input/output buffer */
03326               int nbytes)                       /* size of access */
03327 {
03328   enum mem_cmd cmd;
03329 
03330   if (!is_write)
03331     cmd = Read;
03332   else
03333     cmd = Write;
03334 
03335 #if 0
03336   char *errstr;
03337 
03338   errstr = mem_valid(cmd, addr, nbytes, /* declare */FALSE);
03339   if (errstr)
03340     return errstr;
03341 #endif
03342 
03343   /* else, no error, access memory */
03344   if (spec_mode)
03345     spec_mem_access(mem, cmd, addr, p, nbytes);
03346   else
03347     mem_access(mem, cmd, addr, p, nbytes);
03348 
03349   /* no error */
03350   return NULL;
03351 }
03352 
03353 
03354 /*
03355  *  RUU_DISPATCH() - decode instructions and allocate RUU and LSQ resources
03356  */
03357 
03358 /* link RS onto the output chain number of whichever operation will next
03359    create the architected register value IDEP_NAME */
03360 static INLINE void
03361 ruu_link_idep(struct RUU_station *rs,           /* rs station to link */
03362               int idep_num,                     /* input dependence number */
03363               int idep_name)                    /* input register name */
03364 {
03365   struct CV_link head;
03366   struct RS_link *link;
03367 
03368   /* any dependence? */
03369   if (idep_name == NA)
03370     {
03371       /* no input dependence for this input slot, mark operand as ready */
03372       rs->idep_ready[idep_num] = TRUE;
03373       return;
03374     }
03375 
03376   /* locate creator of operand */
03377   head = CREATE_VECTOR(idep_name);
03378 
03379   /* any creator? */
03380   if (!head.rs)
03381     {
03382       /* no active creator, use value available in architected reg file,
03383          indicate the operand is ready for use */
03384       rs->idep_ready[idep_num] = TRUE;
03385       return;
03386     }
03387   /* else, creator operation will make this value sometime in the future */
03388 
03389   /* indicate value will be created sometime in the future, i.e., operand
03390      is not yet ready for use */
03391   rs->idep_ready[idep_num] = FALSE;
03392 
03393   /* link onto creator's output list of dependent operands */
03394   RSLINK_NEW(link, rs); link->x.opnum = idep_num;
03395   link->next = head.rs->odep_list[head.odep_num];
03396   head.rs->odep_list[head.odep_num] = link;
03397 }
03398 
03399 /* make RS the creator of architected register ODEP_NAME */
03400 static INLINE void
03401 ruu_install_odep(struct RUU_station *rs,        /* creating RUU station */
03402                  int odep_num,                  /* output operand number */
03403                  int odep_name)                 /* output register name */
03404 {
03405   struct CV_link cv;
03406 
03407   /* any dependence? */
03408   if (odep_name == NA)
03409     {
03410       /* no value created */
03411       rs->onames[odep_num] = NA;
03412       return;
03413     }
03414   /* else, create a RS_NULL terminated output chain in create vector */
03415 
03416   /* record output name, used to update create vector at completion */
03417   rs->onames[odep_num] = odep_name;
03418 
03419   /* initialize output chain to empty list */
03420   rs->odep_list[odep_num] = NULL;
03421 
03422   /* indicate this operation is latest creator of ODEP_NAME */
03423   CVLINK_INIT(cv, rs, odep_num);
03424   SET_CREATE_VECTOR(odep_name, cv);
03425 }
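
The two helpers above implement a producer/consumer bookkeeping scheme: a table records the latest in-flight producer of each register, consumers either find their operand ready or chain themselves onto the producer, and the producer wakes them later. The sketch below models that flow in a simplified form. It is an illustration under assumptions: the names cv_link_input, cv_install_output and cv_complete are hypothetical, a fixed array replaces the RS_link list, and the completion step is inferred from the comment on onames rather than taken from code shown in this part of the listing.

```c
#include <assert.h>
#include <stddef.h>

#define CV_NREGS    8           /* assumed register count */
#define CV_MAXWAIT  4           /* assumed cap on waiting consumers */

struct cv_op {
  int ready[2];                         /* readiness of each input slot */
  struct cv_op *waiters[CV_MAXWAIT];    /* consumers of this op's output */
  int waiter_slot[CV_MAXWAIT];          /* which input slot each one waits on */
  int nwaiters;
};

/* latest in-flight producer of each register, NULL if the value is
   already available in the register file */
static struct cv_op *cv_creator[CV_NREGS];

/* reg < 0 means "no dependence", mirroring the NA check in the listing */
static void cv_link_input(struct cv_op *op, int slot, int reg)
{
  struct cv_op *prod = (reg < 0) ? NULL : cv_creator[reg];
  if (!prod)
    {
      op->ready[slot] = 1;              /* nothing to wait for */
      return;
    }
  assert(prod->nwaiters < CV_MAXWAIT);
  op->ready[slot] = 0;
  prod->waiters[prod->nwaiters] = op;
  prod->waiter_slot[prod->nwaiters] = slot;
  prod->nwaiters++;
}

/* make op the newest producer of reg, starting with no consumers */
static void cv_install_output(struct cv_op *op, int reg)
{
  op->nwaiters = 0;
  if (reg >= 0)
    cv_creator[reg] = op;
}

/* on completion, wake all consumers and retire the producer entry if
   no younger op has replaced it */
static void cv_complete(struct cv_op *op, int reg)
{
  int i;
  for (i = 0; i < op->nwaiters; i++)
    op->waiters[i]->ready[op->waiter_slot[i]] = 1;
  if (reg >= 0 && cv_creator[reg] == op)
    cv_creator[reg] = NULL;
}
```

A linked list, as in the listing, avoids the fixed cap on consumers at the cost of allocating a link per dependence.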
03426 
03427 
03428 /*
03429  * configure the instruction decode engine
03430  */
03431 
03432 #define DNA                     (0)
03433 
03434 #if defined(TARGET_PISA)
03435 
03436 /* general register dependence decoders */
03437 #define DGPR(N)                 (N)
03438 #define DGPR_D(N)               ((N) &~1)
03439 
03440 /* floating point register dependence decoders */
03441 #define DFPR_L(N)               (((N)+32)&~1)
03442 #define DFPR_F(N)               (((N)+32)&~1)
03443 #define DFPR_D(N)               (((N)+32)&~1)
03444 
03445 /* miscellaneous register dependence decoders */
03446 #define DHI                     (0+32+32)
03447 #define DLO                     (1+32+32)
03448 #define DFCC                    (2+32+32)
03449 #define DTMP                    (3+32+32)
03450 
03451 #elif defined(TARGET_ALPHA)
03452 
03453 /* general register dependence decoders, $r31 maps to DNA (0) */
03454 #define DGPR(N)                 (31 - (N)) /* was: (((N) == 31) ? DNA : (N)) */
03455 
03456 /* floating point register dependence decoders */
03457 #define DFPR(N)                 (((N) == 31) ? DNA : ((N)+32))
03458 
03459 /* miscellaneous register dependence decoders */
03460 #define DFPCR                   (0+32+32)
03461 #define DUNIQ                   (1+32+32)
03462 #define DTMP                    (2+32+32)
03463 
03464 #else
03465 #error No ISA target defined...
03466 #endif
03467 
03468 
03469 /*
03470  * configure the execution engine
03471  */
03472 
03473 /* next program counter */
03474 #define SET_NPC(EXPR)           (regs.regs_NPC = (EXPR))
03475 
03476 /* target program counter */
03477 #undef  SET_TPC
03478 #define SET_TPC(EXPR)           (target_PC = (EXPR))
03479 
03480 /* current program counter */
03481 #define CPC                     (regs.regs_PC)
03482 #define SET_CPC(EXPR)           (regs.regs_PC = (EXPR))
03483 
03484 /* general purpose register accessors, NOTE: speculative copy on write storage
03485    provided for fast recovery during wrong path execute (see tracer_recover()
03486    for details on this process) */
03487 #define GPR(N)                  (BITMAP_SET_P(use_spec_R, R_BMAP_SZ, (N))\
03488                                  ? spec_regs_R[N]                       \
03489                                  : regs.regs_R[N])
03490 #define SET_GPR(N,EXPR)         (spec_mode                              \
03491                                  ? ((spec_regs_R[N] = (EXPR)),          \
03492                                     BITMAP_SET(use_spec_R, R_BMAP_SZ, (N)),\
03493                                     spec_regs_R[N])                     \
03494                                  : (regs.regs_R[N] = (EXPR)))
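
The GPR/SET_GPR pair is a copy-on-write scheme keyed by a bitmap: a wrong-path write goes to a shadow register and sets a bit, reads pick the shadow copy when the bit is set, and recovery only has to clear the bitmap. The sketch below shows the same pattern with plain functions. The names cow_get, cow_set and cow_recover are hypothetical, and a single 32-bit word stands in for the BITMAP_* macros, whose definitions are not part of this section.

```c
#include <assert.h>
#include <stdint.h>

#define COW_NREGS 32

static int      cow_spec_mode;              /* nonzero while on a wrong path */
static uint32_t cow_regs[COW_NREGS];        /* architected values */
static uint32_t cow_spec_regs[COW_NREGS];   /* speculative copies */
static uint32_t cow_use_spec;               /* bit n set: read the spec copy */

/* read register n, preferring the speculative copy if one was written */
static uint32_t cow_get(int n)
{
  return ((cow_use_spec >> n) & 1u) ? cow_spec_regs[n] : cow_regs[n];
}

/* write register n; wrong-path writes never touch architected state */
static void cow_set(int n, uint32_t v)
{
  if (cow_spec_mode)
    {
      cow_spec_regs[n] = v;
      cow_use_spec |= (uint32_t)1 << n;
    }
  else
    cow_regs[n] = v;
}

/* leave speculative mode; clearing the bitmap discards every wrong-path
   write without copying any register values back */
static void cow_recover(void)
{
  cow_spec_mode = 0;
  cow_use_spec = 0;
}
```

Recovery cost is proportional to the bitmap size, not to the number of registers written on the wrong path.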
03495 
03496 #if defined(TARGET_PISA)
03497 
03498 /* floating point register accessors, NOTE: speculative copy on write storage
03499    provided for fast recovery during wrong path execute (see tracer_recover()
03500    for details on this process) */
03501 #define FPR_L(N)                (BITMAP_SET_P(use_spec_F, F_BMAP_SZ, ((N)&~1))\
03502                                  ? spec_regs_F.l[(N)]                   \
03503                                  : regs.regs_F.l[(N)])
03504 #define SET_FPR_L(N,EXPR)       (spec_mode                              \
03505                                  ? ((spec_regs_F.l[(N)] = (EXPR)),      \
03506                                     BITMAP_SET(use_spec_F,F_BMAP_SZ,((N)&~1)),\
03507                                     spec_regs_F.l[(N)])                 \
03508                                  : (regs.regs_F.l[(N)] = (EXPR)))
03509 #define FPR_F(N)                (BITMAP_SET_P(use_spec_F, F_BMAP_SZ, ((N)&~1))\
03510                                  ? spec_regs_F.f[(N)]                   \
03511                                  : regs.regs_F.f[(N)])
03512 #define SET_FPR_F(N,EXPR)       (spec_mode                              \
03513                                  ? ((spec_regs_F.f[(N)] = (EXPR)),      \
03514                                     BITMAP_SET(use_spec_F,F_BMAP_SZ,((N)&~1)),\
03515                                     spec_regs_F.f[(N)])                 \
03516                                  : (regs.regs_F.f[(N)] = (EXPR)))
03517 #define FPR_D(N)                (BITMAP_SET_P(use_spec_F, F_BMAP_SZ, ((N)&~1))\
03518                                  ? spec_regs_F.d[(N) >> 1]              \
03519                                  : regs.regs_F.d[(N) >> 1])
03520 #define SET_FPR_D(N,EXPR)       (spec_mode                              \
03521                                  ? ((spec_regs_F.d[(N) >> 1] = (EXPR)), \
03522                                     BITMAP_SET(use_spec_F,F_BMAP_SZ,((N)&~1)),\
03523                                     spec_regs_F.d[(N) >> 1])            \
03524                                  : (regs.regs_F.d[(N) >> 1] = (EXPR)))
03525 
03526 /* miscellaneous register accessors, NOTE: speculative copy on write storage
03527    provided for fast recovery during wrong path execute (see tracer_recover()
03528    for details on this process) */
03529 #define HI                      (BITMAP_SET_P(use_spec_C, C_BMAP_SZ, /*hi*/0)\
03530                                  ? spec_regs_C.hi                       \
03531                                  : regs.regs_C.hi)
03532 #define SET_HI(EXPR)            (spec_mode                              \
03533                                  ? ((spec_regs_C.hi = (EXPR)),          \
03534                                     BITMAP_SET(use_spec_C, C_BMAP_SZ,/*hi*/0),\
03535                                     spec_regs_C.hi)                     \
03536                                  : (regs.regs_C.hi = (EXPR)))
03537 #define LO                      (BITMAP_SET_P(use_spec_C, C_BMAP_SZ, /*lo*/1)\
03538                                  ? spec_regs_C.lo                       \
03539                                  : regs.regs_C.lo)
03540 #define SET_LO(EXPR)            (spec_mode                              \
03541                                  ? ((spec_regs_C.lo = (EXPR)),          \
03542                                     BITMAP_SET(use_spec_C, C_BMAP_SZ,/*lo*/1),\
03543                                     spec_regs_C.lo)                     \
03544                                  : (regs.regs_C.lo = (EXPR)))
03545 #define FCC                     (BITMAP_SET_P(use_spec_C, C_BMAP_SZ,/*fcc*/2)\
03546                                  ? spec_regs_C.fcc                      \
03547                                  : regs.regs_C.fcc)
03548 #define SET_FCC(EXPR)           (spec_mode                              \
03549                                  ? ((spec_regs_C.fcc = (EXPR)),         \
03550                                     BITMAP_SET(use_spec_C,C_BMAP_SZ,/*fcc*/2),\
03551                                     spec_regs_C.fcc)                    \
03552                                  : (regs.regs_C.fcc = (EXPR)))
03553 
03554 #elif defined(TARGET_ALPHA)
03555 
03556 /* floating point register accessors, NOTE: speculative copy on write storage
03557    provided for fast recovery during wrong path execute (see tracer_recover()
03558    for details on this process) */
03559 #define FPR_Q(N)                (BITMAP_SET_P(use_spec_F, F_BMAP_SZ, (N))\
03560                                  ? spec_regs_F.q[(N)]                   \
03561                                  : regs.regs_F.q[(N)])
03562 #define SET_FPR_Q(N,EXPR)       (spec_mode                              \
03563                                  ? ((spec_regs_F.q[(N)] = (EXPR)),      \
03564                                     BITMAP_SET(use_spec_F,F_BMAP_SZ, (N)),\
03565                                     spec_regs_F.q[(N)])                 \
03566                                  : (regs.regs_F.q[(N)] = (EXPR)))
03567 #define FPR(N)                  (BITMAP_SET_P(use_spec_F, F_BMAP_SZ, (N))\
03568                                  ? spec_regs_F.d[(N)]                   \
03569                                  : regs.regs_F.d[(N)])
03570 #define SET_FPR(N,EXPR)         (spec_mode                              \
03571                                  ? ((spec_regs_F.d[(N)] = (EXPR)),      \
03572                                     BITMAP_SET(use_spec_F,F_BMAP_SZ, (N)),\
03573                                     spec_regs_F.d[(N)])                 \
03574                                  : (regs.regs_F.d[(N)] = (EXPR)))
03575 
03576 /* miscellaneous register accessors, NOTE: speculative copy on write storage
03577    provided for fast recovery during wrong path execute (see tracer_recover()
03578    for details on this process) */
03579 #define FPCR                    (BITMAP_SET_P(use_spec_C, C_BMAP_SZ,/*fpcr*/0)\
03580                                  ? spec_regs_C.fpcr                     \
03581                                  : regs.regs_C.fpcr)
03582 #define SET_FPCR(EXPR)          (spec_mode                              \
03583                                  ? ((spec_regs_C.fpcr = (EXPR)),        \
03584                                    BITMAP_SET(use_spec_C,C_BMAP_SZ,/*fpcr*/0),\
03585                                     spec_regs_C.fpcr)                   \
03586                                  : (regs.regs_C.fpcr = (EXPR)))
03587 #define UNIQ                    (BITMAP_SET_P(use_spec_C, C_BMAP_SZ,/*uniq*/1)\
03588                                  ? spec_regs_C.uniq                     \
03589                                  : regs.regs_C.uniq)
03590 #define SET_UNIQ(EXPR)          (spec_mode                              \
03591                                  ? ((spec_regs_C.uniq = (EXPR)),        \
03592                                    BITMAP_SET(use_spec_C,C_BMAP_SZ,/*uniq*/1),\
03593                                     spec_regs_C.uniq)                   \
03594                                  : (regs.regs_C.uniq = (EXPR)))
03595 #define FCC                     (BITMAP_SET_P(use_spec_C, C_BMAP_SZ,/*fcc*/2)\
03596                                  ? spec_regs_C.fcc                      \
03597                                  : regs.regs_C.fcc)
03598 #define SET_FCC(EXPR)           (spec_mode                              \
03599                                  ? ((spec_regs_C.fcc = (EXPR)),         \
03600                                     BITMAP_SET(use_spec_C,C_BMAP_SZ,/*fcc*/2),\
03601                                     spec_regs_C.fcc)                    \
03602                                  : (regs.regs_C.fcc = (EXPR)))
03603 
03604 #else
03605 #error No ISA target defined...
03606 #endif
03607 
03608 /* precise architected memory state accessor macros, NOTE: speculative copy on
03609    write storage provided for fast recovery during wrong path execute (see
03610    tracer_recover() for details on this process) */
03611 #define __READ_SPECMEM(SRC, SRC_V, FAULT)                               \
03612   (addr = (SRC),                                                        \
03613    (spec_mode                                                           \
03614     ? ((FAULT) = spec_mem_access(mem, Read, addr, &SRC_V, sizeof(SRC_V)))\
03615     : ((FAULT) = mem_access(mem, Read, addr, &SRC_V, sizeof(SRC_V)))),  \
03616    SRC_V)
03617 
03618 #define READ_BYTE(SRC, FAULT)                                           \
03619   __READ_SPECMEM((SRC), temp_byte, (FAULT))
03620 #define READ_HALF(SRC, FAULT)                                           \
03621   MD_SWAPH(__READ_SPECMEM((SRC), temp_half, (FAULT)))
03622 #define READ_WORD(SRC, FAULT)                                           \
03623   MD_SWAPW(__READ_SPECMEM((SRC), temp_word, (FAULT)))
03624 #ifdef HOST_HAS_QWORD
03625 #define READ_QWORD(SRC, FAULT)                                          \
03626   MD_SWAPQ(__READ_SPECMEM((SRC), temp_qword, (FAULT)))
03627 #endif /* HOST_HAS_QWORD */
03628 
03629 
03630 #define __WRITE_SPECMEM(SRC, DST, DST_V, FAULT)                         \
03631   (DST_V = (SRC), addr = (DST),                                         \
03632    (spec_mode                                                           \
03633     ? ((FAULT) = spec_mem_access(mem, Write, addr, &DST_V, sizeof(DST_V)))\
03634     : ((FAULT) = mem_access(mem, Write, addr, &DST_V, sizeof(DST_V)))))
03635 
03636 #define WRITE_BYTE(SRC, DST, FAULT)                                     \
03637   __WRITE_SPECMEM((SRC), (DST), temp_byte, (FAULT))
03638 #define WRITE_HALF(SRC, DST, FAULT)                                     \
03639   __WRITE_SPECMEM(MD_SWAPH(SRC), (DST), temp_half, (FAULT))
03640 #define WRITE_WORD(SRC, DST, FAULT)                                     \
03641   __WRITE_SPECMEM(MD_SWAPW(SRC), (DST), temp_word, (FAULT))
03642 #ifdef HOST_HAS_QWORD
03643 #define WRITE_QWORD(SRC, DST, FAULT)                                    \
03644   __WRITE_SPECMEM(MD_SWAPQ(SRC), (DST), temp_qword, (FAULT))
03645 #endif /* HOST_HAS_QWORD */
03646 
03647 /* system call handler macro */
03648 #define SYSCALL(INST)                                                   \
03649   (/* only execute system calls in non-speculative mode */              \
03650    (spec_mode ? panic("speculative syscall") : (void) 0),               \
03651    sys_syscall(&regs, mem_access, mem, INST, TRUE))
03652 
03653 /* default register state accessor, used by DLite */
03654 static char *                                   /* err str, NULL for no err */
03655 simoo_reg_obj(struct regs_t *xregs,             /* registers to access */
03656               int is_write,                     /* access type */
03657               enum md_reg_type rt,              /* reg bank to probe */
03658               int reg,                          /* register number */
03659               struct eval_value_t *val)         /* input, output */
03660 {
03661   switch (rt)
03662     {
03663     case rt_gpr:
03664       if (reg < 0 || reg >= MD_NUM_IREGS)
03665         return "register number out of range";
03666 
03667       if (!is_write)
03668         {
03669           val->type = et_uint;
03670           val->value.as_uint = GPR(reg);
03671         }
03672       else
03673         SET_GPR(reg, eval_as_uint(*val));
03674       break;
03675 
03676     case rt_lpr:
03677       if (reg < 0 || reg >= MD_NUM_FREGS)
03678         return "register number out of range";
03679 
03680       /* FIXME: this is not portable... */
03681       abort();
03682 #if 0
03683       if (!is_write)
03684         {
03685           val->type = et_uint;
03686           val->value.as_uint = FPR_L(reg);
03687         }
03688       else
03689         SET_FPR_L(reg, eval_as_uint(*val));
03690 #endif
03691       break;
03692 
03693     case rt_fpr:
03694       /* FIXME: this is not portable... */
03695       abort();
03696 #if 0
03697       if (!is_write)
03698         val->value.as_float = FPR_F(reg);
03699       else
03700         SET_FPR_F(reg, val->value.as_float);
03701 #endif
03702       break;
03703 
03704     case rt_dpr:
03705       /* FIXME: this is not portable... */
03706       abort();
03707 #if 0
03708       /* 1/2 as many regs in this mode */
03709       if (reg < 0 || reg >= MD_NUM_REGS/2)
03710         return "register number out of range";
03711 
03712       if (at == at_read)
03713         val->as_double = FPR_D(reg * 2);
03714       else
03715         SET_FPR_D(reg * 2, val->as_double);
03716 #endif
03717       break;
03718 
03719       /* FIXME: this is not portable... */
03720 #if 0
03721       abort();
03722     case rt_hi:
03723       if (at == at_read)
03724         val->as_word = HI;
03725       else
03726         SET_HI(val->as_word);
03727       break;
03728     case rt_lo:
03729       if (at == at_read)
03730         val->as_word = LO;
03731       else
03732         SET_LO(val->as_word);
03733       break;
03734     case rt_FCC:
03735       if (at == at_read)
03736         val->as_condition = FCC;
03737       else
03738         SET_FCC(val->as_condition);
03739       break;
03740 #endif
03741 
03742     case rt_PC:
03743       if (!is_write)
03744         {
03745           val->type = et_addr;
03746           val->value.as_addr = regs.regs_PC;
03747         }
03748       else
03749         regs.regs_PC = eval_as_addr(*val);
03750       break;
03751 
03752     case rt_NPC:
03753       if (!is_write)
03754         {
03755           val->type = et_addr;
03756           val->value.as_addr = regs.regs_NPC;
03757         }
03758       else
03759         regs.regs_NPC = eval_as_addr(*val);
03760       break;
03761 
03762     default:
03763       panic("bogus register bank");
03764     }
03765 
03766   /* no error */
03767   return NULL;
03768 }
03769 
03770 /* the last operation that ruu_dispatch() attempted to dispatch, for
03771    implementing in-order issue */
03772 static struct RS_link last_op = RSLINK_NULL_DATA;
03773 
03774 /* dispatch instructions from the IFETCH -> DISPATCH queue: instructions are
03775    first decoded, then they are allocated RUU (and LSQ for loads/stores)
03776    resources, and input and output dependence chains are updated accordingly */
03777 static void
03778 ruu_dispatch(void)
03779 {
03780   int i;
03781   int n_dispatched;                     /* total insts dispatched */
03782   md_inst_t inst;                       /* actual instruction bits */
03783   enum md_opcode op;                    /* decoded opcode enum */
03784   int out1, out2, in1, in2, in3;        /* output/input register names */
03785   md_addr_t target_PC;                  /* actual next/target PC address */
03786   md_addr_t addr;                       /* effective address, if load/store */
03787   struct RUU_station *rs;               /* RUU station being allocated */
03788   struct RUU_station *lsq;              /* LSQ station for ld/st's */
03789   struct bpred_update_t *dir_update_ptr;/* branch predictor dir update ptr */
03790   int stack_recover_idx;                /* bpred retstack recovery index */
03791   unsigned int pseq;                    /* pipetrace sequence number */
03792   int is_write;                         /* store? */
03793   int made_check;                       /* used to ensure DLite entry */
03794   int br_taken, br_pred_taken;          /* if br, taken?  predicted taken? */
03795   int fetch_redirected = FALSE;
03796   byte_t temp_byte = 0;                 /* temp variable for spec mem access */
03797   half_t temp_half = 0;                 /* " ditto " */
03798   word_t temp_word = 0;                 /* " ditto " */
03799 #ifdef HOST_HAS_QWORD
03800   qword_t temp_qword = 0;               /* " ditto " */
03801 #endif /* HOST_HAS_QWORD */
03802   enum md_fault_type fault;
03803 
03804   made_check = FALSE;
03805   n_dispatched = 0;
03806   while (/* instruction decode B/W left? */
03807          n_dispatched < (ruu_decode_width * fetch_speed)
03808          /* RUU and LSQ not full? */
03809          && RUU_num < RUU_size && LSQ_num < LSQ_size
03810          /* insts still available from fetch unit? */
03811          && fetch_num != 0
03812          /* on an acceptable trace path */
03813          && (ruu_include_spec || !spec_mode))
03814     {
03815       /* if issuing in-order, block until the last op issues */
03816       if (ruu_inorder_issue
03817           && (last_op.rs && RSLINK_VALID(&last_op)
03818               && !OPERANDS_READY(last_op.rs)))
03819         {
03820           /* stall until last operation is ready to issue */
03821           break;
03822         }
03823 
03824       /* get the next instruction from the IFETCH -> DISPATCH queue */
03825       inst = fetch_data[fetch_head].IR;
03826       regs.regs_PC = fetch_data[fetch_head].regs_PC;
03827       pred_PC = fetch_data[fetch_head].pred_PC;
03828       dir_update_ptr = &(fetch_data[fetch_head].dir_update);
03829       stack_recover_idx = fetch_data[fetch_head].stack_recover_idx;
03830       pseq = fetch_data[fetch_head].ptrace_seq;
03831 
03832       /* decode the inst */
03833       MD_SET_OPCODE(op, inst);
03834 
03835       /* compute default next PC */
03836       regs.regs_NPC = regs.regs_PC + sizeof(md_inst_t);
03837 
03838       /* drain RUU for TRAPs and system calls */
03839       if (MD_OP_FLAGS(op) & F_TRAP)
03840         {
03841           if (RUU_num != 0)
03842             break;
03843 
03844           /* else, syscall is only instruction in the machine, at this
03845              point we should not be in (mis-)speculative mode */
03846           if (spec_mode)
03847             panic("drained and speculative");
03848         }
03849 
03850       /* maintain $r0 semantics (in spec and non-spec space) */
03851       regs.regs_R[MD_REG_ZERO] = 0; spec_regs_R[MD_REG_ZERO] = 0;
03852 #ifdef TARGET_ALPHA
03853       regs.regs_F.d[MD_REG_ZERO] = 0.0; spec_regs_F.d[MD_REG_ZERO] = 0.0;
03854 #endif /* TARGET_ALPHA */
03855 
03856       if (!spec_mode)
03857         {
03858           /* one more non-speculative instruction executed */
03859           sim_num_insn++;
03860         }
03861 
03862       /* default effective address (none) and access */
03863       addr = 0; is_write = FALSE;
03864 
03865       /* set default fault - none */
03866       fault = md_fault_none;
03867 
03868       /* more decoding and execution */
03869       switch (op)
03870         {
03871 #define DEFINST(OP,MSK,NAME,OPFORM,RES,CLASS,O1,O2,I1,I2,I3)            \
03872         case OP:                                                        \
03873           /* compute output/input dependencies to out1-2 and in1-3 */   \
03874           out1 = O1; out2 = O2;                                         \
03875           in1 = I1; in2 = I2; in3 = I3;                                 \
03876           /* execute the instruction */                                 \
03877           SYMCAT(OP,_IMPL);                                             \
03878           break;
03879 #define DEFLINK(OP,MSK,NAME,MASK,SHIFT)                                 \
03880         case OP:                                                        \
03881           /* could speculatively decode a bogus inst, convert to NOP */ \
03882           op = MD_NOP_OP;                                               \
03883           /* compute output/input dependencies to out1-2 and in1-3 */   \
03884           out1 = NA; out2 = NA;                                         \
03885           in1 = NA; in2 = NA; in3 = NA;                                 \
03886           /* no EXPR */                                                 \
03887           break;
03888 #define CONNECT(OP)     /* nada... */
03889           /* the following macro wraps the instruction fault declaration macro
03890              with a test to see if the trace generator is in non-speculative
03891              mode, if so the instruction fault is declared, otherwise, the
03892              error is shunted because instruction faults need to be masked on
03893              the mis-speculated instruction paths */
03894 #define DECLARE_FAULT(FAULT)                                            \
03895           {                                                             \
03896             if (!spec_mode)                                             \
03897               fault = (FAULT);                                          \
03898             /* else, spec fault, ignore it, always terminate exec... */ \
03899             break;                                                      \
03900           }
03901 #include "machine.def"
03902         default:
03903           /* can speculatively decode a bogus inst, convert to a NOP */
03904           op = MD_NOP_OP;
03905           /* compute output/input dependencies to out1-2 and in1-3 */
03906           out1 = NA; out2 = NA;
03907           in1 = NA; in2 = NA; in3 = NA;
03908           /* no EXPR */
03909         }
03910       /* operation sets next PC */
03911 
03912       /* print retirement trace if in verbose mode */
03913       if (!spec_mode && verbose)
03914         {
03915           myfprintf(stderr, "++ %10n [xor: 0x%08x] {%d} @ 0x%08p: ",
03916                     sim_num_insn, md_xor_regs(&regs),
03917                     inst_seq+1, regs.regs_PC);
03918           md_print_insn(inst, regs.regs_PC, stderr);
03919           fprintf(stderr, "\n");
03920           /* fflush(stderr); */
03921         }
03922 
03923       if (fault != md_fault_none)
03924         fatal("non-speculative fault (%d) detected @ 0x%08p",
03925               fault, regs.regs_PC);
03926 
03927       /* update memory access stats */
03928       if (MD_OP_FLAGS(op) & F_MEM)
03929         {
03930           sim_total_refs++;
03931           if (!spec_mode)
03932             sim_num_refs++;
03933 
03934           if (MD_OP_FLAGS(op) & F_STORE)
03935             is_write = TRUE;
03936           else
03937             {
03938               sim_total_loads++;
03939               if (!spec_mode)
03940                 sim_num_loads++;
03941             }
03942         }
03943 
03944       br_taken = (regs.regs_NPC != (regs.regs_PC + sizeof(md_inst_t)));
03945       br_pred_taken = (pred_PC != (regs.regs_PC + sizeof(md_inst_t)));
03946 
03947       if ((pred_PC != regs.regs_NPC && pred_perfect)
03948           || ((MD_OP_FLAGS(op) & (F_CTRL|F_DIRJMP)) == (F_CTRL|F_DIRJMP)
03949               && target_PC != pred_PC && br_pred_taken))
03950         {
03951           /* Either 1) we're simulating perfect prediction and are in a
03952              mis-predict state and need to patch up, or 2) We're not simulating
03953              perfect prediction, we've predicted the branch taken, but our
03954              predicted target doesn't match the computed target (i.e.,
03955              mis-fetch).  Just update the PC values and do a fetch squash.
03956              This is just like calling fetch_squash() except we pre-anticipate
03957              the updates to the fetch values at the end of this function.  If
03958              case #2, also charge a mispredict penalty for redirecting fetch */
03959           fetch_pred_PC = fetch_regs_PC = regs.regs_NPC;
03960           /* was: if (pred_perfect) */
03961           if (pred_perfect)
03962             pred_PC = regs.regs_NPC;
03963 
03964           fetch_head = (ruu_ifq_size-1);
03965           fetch_num = 1;
03966           fetch_tail = 0;
03967 
03968           if (!pred_perfect)
03969             ruu_fetch_issue_delay = ruu_branch_penalty;
03970 
03971           fetch_redirected = TRUE;
03972         }
03973 
03974       /* is this a NOP */
03975       if (op != MD_NOP_OP)
03976         {
03977           /* for load/stores:
03978                idep #0     - store operand (value that is store'ed)
03979                idep #1, #2 - eff addr computation inputs (addr of access)
03980 
03981              resulting RUU/LSQ operation pair:
03982                RUU (effective address computation operation):
03983                  idep #0, #1 - eff addr computation inputs (addr of access)
03984                LSQ (memory access operation):
03985                  idep #0     - operand input (value that is store'd)
03986                  idep #1     - eff addr computation result (from RUU op)
03987 
03988              effective address computation is transferred via the reserved
03989              name DTMP
03990            */
03991 
03992           /* fill in RUU reservation station */
03993           rs = &RUU[RUU_tail];
03994           rs->slip = sim_cycle - 1;
03995           rs->IR = inst;
03996           rs->op = op;
03997           rs->PC = regs.regs_PC;
03998           rs->next_PC = regs.regs_NPC; rs->pred_PC = pred_PC;
03999           rs->in_LSQ = FALSE;
04000           rs->ea_comp = FALSE;
04001           rs->recover_inst = FALSE;
04002           rs->dir_update = *dir_update_ptr;
04003           rs->stack_recover_idx = stack_recover_idx;
04004           rs->spec_mode = spec_mode;
04005           rs->addr = 0;
04006           /* rs->tag is already set */
04007           rs->seq = ++inst_seq;
04008           rs->queued = rs->issued = rs->completed = FALSE;
04009           rs->ptrace_seq = pseq;
04010 
04011           /* split ld/st's into two operations: eff addr comp + mem access */
04012           if (MD_OP_FLAGS(op) & F_MEM)
04013             {
04014               /* convert RUU operation from ld/st to an add (eff addr comp) */
04015               rs->op = MD_AGEN_OP;
04016               rs->ea_comp = TRUE;
04017 
04018               /* fill in LSQ reservation station */
04019               lsq = &LSQ[LSQ_tail];
04020               lsq->slip = sim_cycle - 1;
04021               lsq->IR = inst;
04022               lsq->op = op;
04023               lsq->PC = regs.regs_PC;
04024               lsq->next_PC = regs.regs_NPC; lsq->pred_PC = pred_PC;
04025               lsq->in_LSQ = TRUE;
04026               lsq->ea_comp = FALSE;
04027               lsq->recover_inst = FALSE;
04028               lsq->dir_update.pdir1 = lsq->dir_update.pdir2 = NULL;
04029               lsq->dir_update.pmeta = NULL;
04030               lsq->stack_recover_idx = 0;
04031               lsq->spec_mode = spec_mode;
04032               lsq->addr = addr;
04033               /* lsq->tag is already set */
04034               lsq->seq = ++inst_seq;
04035               lsq->queued = lsq->issued = lsq->completed = FALSE;
04036               lsq->ptrace_seq = ptrace_seq++;
04037 
04038               /* pipetrace this uop */
04039               ptrace_newuop(lsq->ptrace_seq, "internal ld/st", lsq->PC, 0);
04040               ptrace_newstage(lsq->ptrace_seq, PST_DISPATCH, 0);
04041 
04042               /* link eff addr computation onto operand's output chains */
04043               ruu_link_idep(rs, /* idep_ready[] index */0, NA);
04044               ruu_link_idep(rs, /* idep_ready[] index */1, in2);
04045               ruu_link_idep(rs, /* idep_ready[] index */2, in3);
04046 
04047               /* install output after inputs to prevent self reference */
04048               ruu_install_odep(rs, /* odep_list[] index */0, DTMP);
04049               ruu_install_odep(rs, /* odep_list[] index */1, NA);
04050 
04051               /* link memory access onto output chain of eff addr operation */
04052               ruu_link_idep(lsq,
04053                             /* idep_ready[] index */STORE_OP_INDEX/* 0 */,
04054                             in1);
04055               ruu_link_idep(lsq,
04056                             /* idep_ready[] index */STORE_ADDR_INDEX/* 1 */,
04057                             DTMP);
04058               ruu_link_idep(lsq, /* idep_ready[] index */2, NA);
04059 
04060               /* install output after inputs to prevent self reference */
04061               ruu_install_odep(lsq, /* odep_list[] index */0, out1);
04062               ruu_install_odep(lsq, /* odep_list[] index */1, out2);
04063 
04064               /* install operation in the RUU and LSQ */
04065               n_dispatched++;
04066               RUU_tail = (RUU_tail + 1) % RUU_size;
04067               RUU_num++;
04068               LSQ_tail = (LSQ_tail + 1) % LSQ_size;
04069               LSQ_num++;
04070 
04071               if (OPERANDS_READY(rs))
04072                 {
04073                   /* eff addr computation ready, queue it on ready list */
04074                   readyq_enqueue(rs);
04075                 }
04076               /* issue may continue when the load/store is issued */
04077               RSLINK_INIT(last_op, lsq);
04078 
04079               /* issue stores only, loads are issued by lsq_refresh() */
04080               if (((MD_OP_FLAGS(op) & (F_MEM|F_STORE)) == (F_MEM|F_STORE))
04081                   && OPERANDS_READY(lsq))
04082                 {
04083                   /* panic("store immediately ready"); */
04084                   /* put operation on ready list, ruu_issue() will issue it later */
04085                   readyq_enqueue(lsq);
04086                 }
04087             }
04088           else /* !(MD_OP_FLAGS(op) & F_MEM) */
04089             {
04090               /* link onto producing operation */
04091               ruu_link_idep(rs, /* idep_ready[] index */0, in1);
04092               ruu_link_idep(rs, /* idep_ready[] index */1, in2);
04093               ruu_link_idep(rs, /* idep_ready[] index */2, in3);
04094 
04095               /* install output after inputs to prevent self reference */
04096               ruu_install_odep(rs, /* odep_list[] index */0, out1);
04097               ruu_install_odep(rs, /* odep_list[] index */1, out2);
04098 
04099               /* install operation in the RUU */
04100               n_dispatched++;
04101               RUU_tail = (RUU_tail + 1) % RUU_size;
04102               RUU_num++;
04103 
04104               /* issue op if all its reg operands are ready (no mem input) */
04105               if (OPERANDS_READY(rs))
04106                 {
04107                   /* put operation on ready list, ruu_issue() will issue it later */
04108                   readyq_enqueue(rs);
04109                   /* issue may continue */
04110                   last_op = RSLINK_NULL;
04111                 }
04112               else
04113                 {
04114                   /* could not issue this inst, stall issue until we can */
04115                   RSLINK_INIT(last_op, rs);
04116                 }
04117             }
04118         }
04119       else
04120         {
04121           /* this is a NOP, no need to update RUU/LSQ state */
04122           rs = NULL;
04123         }
04124 
04125       /* one more instruction executed, speculative or otherwise */
04126       sim_total_insn++;
04127       if (MD_OP_FLAGS(op) & F_CTRL)
04128         sim_total_branches++;
04129 
04130       if (!spec_mode)
04131         {
04132 #if 0 /* moved above for EIO trace file support */
04133           /* one more non-speculative instruction executed */
04134           sim_num_insn++;
04135 #endif
04136 
04137           /* if this is a branching instruction update BTB, i.e., only
04138              non-speculative state is committed into the BTB */
04139           if (MD_OP_FLAGS(op) & F_CTRL)
04140             {
04141               sim_num_branches++;
04142               if (pred && bpred_spec_update == spec_ID)
04143                 {
04144                   bpred_update(pred,
04145                                /* branch address */regs.regs_PC,
04146                                /* actual target address */regs.regs_NPC,
04147                                /* taken? */regs.regs_NPC != (regs.regs_PC +
04148                                                        sizeof(md_inst_t)),
04149                                /* pred taken? */pred_PC != (regs.regs_PC +
04150                                                         sizeof(md_inst_t)),
04151                                /* correct pred? */pred_PC == regs.regs_NPC,
04152                                /* opcode */op,
04153                                /* predictor update ptr */&rs->dir_update);
04154                 }
04155             }
04156 
04157           /* is the trace generator transitioning into mis-speculation mode? */
04158           if (pred_PC != regs.regs_NPC && !fetch_redirected)
04159             {
04160               /* entering mis-speculation mode, indicate this and save PC */
04161               spec_mode = TRUE;
04162               rs->recover_inst = TRUE;
04163               recover_PC = regs.regs_NPC;
04164             }
04165         }
04166 
04167       /* entered decode/allocate stage, indicate in pipe trace */
04168       ptrace_newstage(pseq, PST_DISPATCH,
04169                       (pred_PC != regs.regs_NPC) ? PEV_MPOCCURED : 0);
04170       if (op == MD_NOP_OP)
04171         {
04172           /* end of the line */
04173           ptrace_endinst(pseq);
04174         }
04175 
04176       /* update any stats tracked by PC */
04177       for (i=0; i<pcstat_nelt; i++)
04178         {
04179           counter_t newval;
04180           int delta;
04181 
04182           /* check if any tracked stats changed */
04183           newval = STATVAL(pcstat_stats[i]);
04184           delta = newval - pcstat_lastvals[i];
04185           if (delta != 0)
04186             {
04187               stat_add_samples(pcstat_sdists[i], regs.regs_PC, delta);
04188               pcstat_lastvals[i] = newval;
04189             }
04190         }
04191 
04192       /* consume instruction from IFETCH -> DISPATCH queue */
04193       fetch_head = (fetch_head+1) & (ruu_ifq_size - 1);
04194       fetch_num--;
04195 
04196       /* check for DLite debugger entry condition */
04197       made_check = TRUE;
04198       if (dlite_check_break(pred_PC,
04199                             is_write ? ACCESS_WRITE : ACCESS_READ,
04200                             addr, sim_num_insn, sim_cycle))
04201         dlite_main(regs.regs_PC, pred_PC, sim_cycle, &regs, mem);
04202     }
04203 
04204   /* need to enter DLite at least once per cycle */
04205   if (!made_check)
04206     {
04207       if (dlite_check_break(/* no next PC */0,
04208                             is_write ? ACCESS_WRITE : ACCESS_READ,
04209                             addr, sim_num_insn, sim_cycle))
04210         dlite_main(regs.regs_PC, /* no next PC */0, sim_cycle, &regs, mem);
04211     }
04212 }
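
Note: the branch handling in ruu_dispatch() separates a mis-fetch (a direct jump predicted taken with the wrong target, where fetch is redirected at once) from a mis-prediction (the trace generator enters speculative mode). The sketch below illustrates that classification under simplifying assumptions: it ignores the pred_perfect option, and the names addr_t, INST_SIZE, enum branch_outcome, and classify_branch() are hypothetical and not part of sim-outorder.c.

```c
#include <stdint.h>

typedef uint32_t addr_t;   /* hypothetical stand-in for md_addr_t */
enum { INST_SIZE = 8 };    /* assumed fixed instruction size, in bytes */

enum branch_outcome {
  BR_CORRECT,              /* predicted next PC equals the actual next PC */
  BR_MISFETCH,             /* direct jump predicted taken to a wrong target */
  BR_MISPREDICT            /* any other mismatch: speculative mode begins */
};

/* classify one dispatched instruction from its PC values */
enum branch_outcome
classify_branch(addr_t pc, addr_t next_pc, addr_t pred_pc,
                addr_t target_pc, int is_direct_jump)
{
  /* a prediction is "taken" when it differs from the fall-through PC */
  int pred_taken = (pred_pc != pc + INST_SIZE);

  /* the target of a direct jump is known at decode, so a wrong predicted
     target can be repaired by redirecting fetch immediately */
  if (is_direct_jump && pred_taken && target_pc != pred_pc)
    return BR_MISFETCH;

  /* otherwise a mismatch is only discovered when the branch resolves */
  if (pred_pc != next_pc)
    return BR_MISPREDICT;

  return BR_CORRECT;
}
```

The mis-fetch test comes first because, in the listing above, a redirected fetch suppresses the transition into speculative mode.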
04213 
04214 
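
Note: the switch in ruu_dispatch() is generated by redefining DEFINST and DEFLINK and then including machine.def, a technique often called an X-macro. The following self-contained sketch shows the same idea with an inline table in place of an included file. The table contents and the names OPCODE_TABLE, AS_ENUM, AS_CASE, and opcode_num_inputs() are made up for illustration.

```c
/* hypothetical opcode table; plays the role of an included .def file,
   one X(...) entry per opcode: enum name, mnemonic, number of inputs */
#define OPCODE_TABLE(X)      \
  X(OP_ADD,  "add",  2)      \
  X(OP_LOAD, "load", 1)      \
  X(OP_NOP,  "nop",  0)

/* expansion 1: build the opcode enum from the table */
#define AS_ENUM(OP, NAME, NIN) OP,
enum opcode { OPCODE_TABLE(AS_ENUM) OP_COUNT };
#undef AS_ENUM

/* expansion 2: build one switch case per table entry */
int
opcode_num_inputs(enum opcode op)
{
  switch (op)
    {
#define AS_CASE(OP, NAME, NIN) case OP: return NIN;
      OPCODE_TABLE(AS_CASE)
#undef AS_CASE
    default:
      /* not in the table, e.g. a bogus decoded value */
      return -1;
    }
}
```

Each expansion sees the same table, so adding an opcode in one place updates both the enum and the switch.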
04215 /*
04216  *  RUU_FETCH() - instruction fetch pipeline stage(s)
04217  */
04218 
04219 /* initialize the instruction fetch pipeline stage */
04220 static void
04221 fetch_init(void)
04222 {
04223   /* allocate the IFETCH -> DISPATCH instruction queue */
04224   fetch_data =
04225     (struct fetch_rec *)calloc(ruu_ifq_size, sizeof(struct fetch_rec));
04226   if (!fetch_data)
04227     fatal("out of virtual memory");
04228 
04229   fetch_num = 0;
04230   fetch_tail = fetch_head = 0;
04231   IFQ_count = 0;
04232   IFQ_fcount = 0;
04233 }
04234 
04235 /* dump contents of fetch stage registers and fetch queue */
04236 void
04237 fetch_dump(FILE *stream)                        /* output stream */
04238 {
04239   int num, head;
04240 
04241   if (!stream)
04242     stream = stderr;
04243 
04244   fprintf(stream, "** fetch stage state **\n");
04245 
04246   fprintf(stream, "spec_mode: %s\n", spec_mode ? "t" : "f");
04247   myfprintf(stream, "pred_PC: 0x%08p, recover_PC: 0x%08p\n",
04248             pred_PC, recover_PC);
04249   myfprintf(stream, "fetch_regs_PC: 0x%08p, fetch_pred_PC: 0x%08p\n",
04250             fetch_regs_PC, fetch_pred_PC);
04251   fprintf(stream, "\n");
04252 
04253   fprintf(stream, "** fetch queue contents **\n");
04254   fprintf(stream, "fetch_num: %d\n", fetch_num);
04255   fprintf(stream, "fetch_head: %d, fetch_tail: %d\n",
04256           fetch_head, fetch_tail);
04257 
04258   num = fetch_num;
04259   head = fetch_head;
04260   while (num)
04261     {
04262       fprintf(stream, "idx: %2d: inst: `", head);
04263       md_print_insn(fetch_data[head].IR, fetch_data[head].regs_PC, stream);
04264       fprintf(stream, "'\n");
04265       myfprintf(stream, "         regs_PC: 0x%08p, pred_PC: 0x%08p\n",
04266                 fetch_data[head].regs_PC, fetch_data[head].pred_PC);
04267       head = (head + 1) & (ruu_ifq_size - 1);
04268       num--;
04269     }
04270 }
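
Note: the fetch queue indices above wrap with `& (ruu_ifq_size - 1)`, which is only correct when the queue size is a power of two. A minimal sketch of such a ring buffer follows, including a non-consuming walk from the head in the manner of fetch_dump(). QSIZE, struct ring, and the ring_* functions are hypothetical names used only for this illustration.

```c
#include <assert.h>

#define QSIZE 8   /* hypothetical queue size; must be a power of two */

struct ring {
  int data[QSIZE];
  int head, tail, num;    /* oldest entry, next free slot, entry count */
};

void
ring_init(struct ring *q)
{
  /* the mask trick below relies on this property */
  assert((QSIZE & (QSIZE - 1)) == 0);
  q->head = q->tail = q->num = 0;
}

/* append at the tail; returns 0 if the queue is full */
int
ring_push(struct ring *q, int v)
{
  if (q->num == QSIZE)
    return 0;
  q->data[q->tail] = v;
  q->tail = (q->tail + 1) & (QSIZE - 1);
  q->num++;
  return 1;
}

/* remove from the head; returns 0 if the queue is empty */
int
ring_pop(struct ring *q, int *v)
{
  if (q->num == 0)
    return 0;
  *v = q->data[q->head];
  q->head = (q->head + 1) & (QSIZE - 1);
  q->num--;
  return 1;
}

/* walk all entries from head without consuming them */
int
ring_sum(const struct ring *q)
{
  int sum = 0, num = q->num, head = q->head;

  while (num)
    {
      sum += q->data[head];
      head = (head + 1) & (QSIZE - 1);
      num--;
    }
  return sum;
}
```

For sizes that are not a power of two, the wrap would need the modulo form that the listing uses for RUU_tail and LSQ_tail.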
04271 
04272 static int last_inst_missed = FALSE;
04273 static int last_inst_tmissed = FALSE;
04274 
04275 /* fetch as many instructions as one branch prediction and one cache line
04276    access will support without overflowing the IFETCH -> DISPATCH QUEUE */
04277 static void
04278 ruu_fetch(void)
04279 {
04280   int i, lat, tlb_lat, done = FALSE;
04281   md_inst_t inst;
04282   int stack_recover_idx;
04283   int branch_cnt;
04284 
04285   for (i=0, branch_cnt=0;
04286        /* fetch up to as many instructions as the DISPATCH stage can decode */
04287        i < (ruu_decode_width * fetch_speed)
04288        /* fetch until IFETCH -> DISPATCH queue fills */
04289        && fetch_num < ruu_ifq_size
04290        /* and no IFETCH blocking condition encountered */
04291        && !done;
04292        i++)
04293     {
04294       /* fetch an instruction at the next predicted fetch address */
04295       fetch_regs_PC = fetch_pred_PC;
04296 
04297       /* is this a bogus text address? (can happen on mis-spec path) */
04298       if (ld_text_base <= fetch_regs_PC
04299           && fetch_regs_PC < (ld_text_base+ld_text_size)
04300           && !(fetch_regs_PC & (sizeof(md_inst_t)-1)))
04301         {
04302           /* read instruction from memory */
04303           MD_FETCH_INST(inst, mem, fetch_regs_PC);
04304 
04305           /* address is within program text, start from the I-cache hit latency */
04306           lat = cache_il1_lat;
04307           if (cache_il1)
04308             {
04309               /* access the I-cache */
04310               lat =
04311                 cache_access(cache_il1, Read, IACOMPRESS(fetch_regs_PC),
04312                              NULL, ISCOMPRESS(sizeof(md_inst_t)), sim_cycle,
04313                              NULL, NULL);
04314               if (lat > cache_il1_lat)
04315                 last_inst_missed = TRUE;
04316             }
04317 
04318           if (itlb)
04319             {
04320               /* access the I-TLB, NOTE: this code will initiate
04321                  speculative TLB misses */
04322               tlb_lat =
04323                 cache_access(itlb, Read, IACOMPRESS(fetch_regs_PC),
04324                              NULL, ISCOMPRESS(sizeof(md_inst_t)), sim_cycle,
04325                              NULL, NULL);
04326               if (tlb_lat > 1)
04327                 last_inst_tmissed = TRUE;
04328 
04329               /* I-cache/I-TLB accesses occur in parallel */
04330               lat = MAX(tlb_lat, lat);
04331             }
04332 
04333           /* I-cache/I-TLB miss? assumes I-cache hit >= I-TLB hit */
04334           if (lat != cache_il1_lat)
04335             {
04336               /* I-cache miss, block fetch until it is resolved */
04337               ruu_fetch_issue_delay += lat - 1;
04338               break;
04339             }
04340           /* else, I-cache/I-TLB hit */
04341         }
04342       else
04343         {
04344           /* fetch PC is bogus, send a NOP down the pipeline */
04345           inst = MD_NOP_INST;
04346         }
04347 
04348       /* have a valid inst, here */
04349 
04350       /* possibly use the BTB target */
04351       if (pred)
04352         {
04353           enum md_opcode op;
04354 
04355           /* pre-decode instruction, used for bpred stats recording */
04356           MD_SET_OPCODE(op, inst);
04357           
04358           /* get the next predicted fetch address; only use branch predictor
04359              result for branches (assumes pre-decode bits); NOTE: returned
04360              value may be 1 if bpred can only predict a direction */
04361           if (MD_OP_FLAGS(op) & F_CTRL)
04362             fetch_pred_PC =
04363               bpred_lookup(pred,
04364                            /* branch address */fetch_regs_PC,
04365                            /* target address *//* FIXME: not computed */0,
04366                            /* opcode */op,
04367                            /* call? */MD_IS_CALL(op),
04368                            /* return? */MD_IS_RETURN(op),
04369                            /* updt */&(fetch_data[fetch_tail].dir_update),
04370                            /* RSB index */&stack_recover_idx);
04371           else
04372             fetch_pred_PC = 0;
04373 
04374           /* valid address returned from branch predictor? */
04375           if (!fetch_pred_PC)
04376             {
04377               /* no predicted taken target, attempt not taken target */
04378               fetch_pred_PC = fetch_regs_PC + sizeof(md_inst_t);
04379             }
04380           else
04381             {
04382               /* go with target, NOTE: discontinuous fetch, so terminate */
04383               branch_cnt++;
04384               if (branch_cnt >= fetch_speed)
04385                 done = TRUE;
04386             }
04387         }
04388       else
04389         {
04390           /* no predictor, just default to predict not taken, and
04391              continue fetching instructions linearly */
04392           fetch_pred_PC = fetch_regs_PC + sizeof(md_inst_t);
04393         }
04394 
04395       /* commit this instruction to the IFETCH -> DISPATCH queue */
04396       fetch_data[fetch_tail].IR = inst;
04397       fetch_data[fetch_tail].regs_PC = fetch_regs_PC;
04398       fetch_data[fetch_tail].pred_PC = fetch_pred_PC;
04399       fetch_data[fetch_tail].stack_recover_idx = stack_recover_idx;
04400       fetch_data[fetch_tail].ptrace_seq = ptrace_seq++;
04401 
04402       /* for pipe trace */
04403       ptrace_newinst(fetch_data[fetch_tail].ptrace_seq,
04404                      inst, fetch_data[fetch_tail].regs_PC,
04405                      0);
04406       ptrace_newstage(fetch_data[fetch_tail].ptrace_seq,
04407                       PST_IFETCH,
04408                       ((last_inst_missed ? PEV_CACHEMISS : 0)
04409                        | (last_inst_tmissed ? PEV_TLBMISS : 0)));
04410       last_inst_missed = FALSE;
04411       last_inst_tmissed = FALSE;
04412 
04413       /* adjust instruction fetch queue */
04414       fetch_tail = (fetch_tail + 1) & (ruu_ifq_size - 1);
04415       fetch_num++;
04416     }
04417 }
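
Note: two small decisions in ruu_fetch() can be isolated as pure functions. The next fetch address is the predictor's target when one is returned and the fall-through address otherwise. The fetch stall is derived from the larger of the I-cache and I-TLB latencies, since the two accesses happen in parallel. The sketch below assumes a fixed instruction size; addr_t, INST_SIZE, next_fetch_pc(), and fetch_stall_cycles() are hypothetical names.

```c
#include <stdint.h>

typedef uint32_t addr_t;   /* hypothetical stand-in for md_addr_t */
enum { INST_SIZE = 8 };    /* assumed fixed instruction size, in bytes */

/* a predicted target of 0 means "no taken prediction": fall through */
addr_t
next_fetch_pc(addr_t pc, addr_t predicted_target)
{
  return predicted_target ? predicted_target : pc + INST_SIZE;
}

/* extra cycles to block fetch, given parallel I-cache and I-TLB accesses;
   hit_lat is the latency of an I-cache hit */
int
fetch_stall_cycles(int icache_lat, int itlb_lat, int hit_lat)
{
  int lat = (icache_lat > itlb_lat) ? icache_lat : itlb_lat;

  /* a plain hit costs nothing extra; a miss blocks for lat - 1 cycles */
  return (lat != hit_lat) ? lat - 1 : 0;
}
```

With a one-cycle hit latency, a six-cycle I-cache miss blocks fetch for five further cycles, matching the `lat - 1` adjustment in the listing.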
04418 
04419 /* default machine state accessor, used by DLite */
04420 static char *                                   /* err str, NULL for no err */
04421 simoo_mstate_obj(FILE *stream,                  /* output stream */
04422                  char *cmd,                     /* optional command string */
04423                  struct regs_t *regs,           /* registers to access */
04424                  struct mem_t *mem)             /* memory space to access */
04425 {
04426   if (!cmd || !strcmp(cmd, "help"))
04427     fprintf(stream,
04428 "mstate commands:\n"
04429 "\n"
04430 "    mstate help   - show all machine-specific commands (this list)\n"
04431 "    mstate stats  - dump all statistical variables\n"
04432 "    mstate res    - dump current functional unit resource states\n"
04433 "    mstate ruu    - dump contents of the register update unit\n"
04434 "    mstate lsq    - dump contents of the load/store queue\n"
04435 "    mstate eventq - dump contents of event queue\n"
04436 "    mstate readyq - dump contents of ready instruction queue\n"
04437 "    mstate cv     - dump contents of the register create vector\n"
04438 "    mstate rspec  - dump contents of speculative regs\n"
04439 "    mstate mspec  - dump contents of speculative memory\n"
04440 "    mstate fetch  - dump contents of fetch stage registers and fetch queue\n"
04441 "\n"
04442             );
04443   else if (!strcmp(cmd, "stats"))
04444     {
04445       /* just dump intermediate stats */
04446       sim_print_stats(stream);
04447     }
04448   else if (!strcmp(cmd, "res"))
04449     {
04450       /* dump resource state */
04451       res_dump(fu_pool, stream);
04452     }
04453   else if (!strcmp(cmd, "ruu"))
04454     {
04455       /* dump RUU contents */
04456       ruu_dump(stream);
04457     }
04458   else if (!strcmp(cmd, "lsq"))
04459     {
04460       /* dump LSQ contents */
04461       lsq_dump(stream);
04462     }
04463   else if (!strcmp(cmd, "eventq"))
04464     {
04465       /* dump event queue contents */
04466       eventq_dump(stream);
04467     }
04468   else if (!strcmp(cmd, "readyq"))
04469     {
04470       /* dump ready queue contents */
04471       readyq_dump(stream);
04472     }
04473   else if (!strcmp(cmd, "cv"))
04474     {
04475       /* dump register create vector contents */
04476       cv_dump(stream);
04477     }
04478   else if (!strcmp(cmd, "rspec"))
04479     {
04480       /* dump speculative register contents */
04481       rspec_dump(stream);
04482     }
04483   else if (!strcmp(cmd, "mspec"))
04484     {
04485       /* dump speculative memory contents */
04486       mspec_dump(stream);
04487     }
04488   else if (!strcmp(cmd, "fetch"))
04489     {
04490       /* dump fetch stage registers and fetch queue */
04491       fetch_dump(stream);
04492     }
04493   else
04494     return "unknown mstate command";
04495 
04496   /* no error */
04497   return NULL;
04498 }
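
Note: simoo_mstate_obj() selects a dump routine with a chain of strcmp() tests. One alternative, shown here only as a sketch and not as how the simulator is written, is a lookup table that pairs each command name with a handler. The handlers dump_alpha() and dump_beta(), the table cmd_table, and dispatch_cmd() are hypothetical.

```c
#include <stdio.h>
#include <string.h>

typedef void (*dump_fn)(FILE *stream);

/* hypothetical handlers standing in for ruu_dump(), lsq_dump(), ... */
static void dump_alpha(FILE *stream) { fprintf(stream, "alpha state\n"); }
static void dump_beta(FILE *stream)  { fprintf(stream, "beta state\n"); }

/* one row per command: name typed by the user, routine to run */
static const struct {
  const char *name;
  dump_fn fn;
} cmd_table[] = {
  { "alpha", dump_alpha },
  { "beta",  dump_beta },
};

/* returns NULL on success, or an error string for an unknown command */
const char *
dispatch_cmd(FILE *stream, const char *cmd)
{
  size_t i;

  if (!cmd)
    return "unknown command";

  for (i = 0; i < sizeof(cmd_table) / sizeof(cmd_table[0]); i++)
    {
      if (!strcmp(cmd, cmd_table[i].name))
        {
          cmd_table[i].fn(stream);
          return NULL;
        }
    }
  return "unknown command";
}
```

The table form keeps each command name next to its handler, which makes a mismatch between a command and its description easier to spot.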
04499 
04500 
04501 /* start simulation, program loaded, processor precise state initialized */
04502 void
04503 sim_main(void)
04504 {
04505   /* ignore any floating point exceptions, they may occur on mis-speculated
04506      execution paths */
04507   signal(SIGFPE, SIG_IGN);
04508 
04509   /* set up program entry state */
04510   regs.regs_PC = ld_prog_entry;
04511   regs.regs_NPC = regs.regs_PC + sizeof(md_inst_t);
04512 
04513   /* check for DLite debugger entry condition */
04514   if (dlite_check_break(regs.regs_PC, /* no access */0, /* addr */0, 0, 0))
04515     dlite_main(regs.regs_PC, regs.regs_PC + sizeof(md_inst_t),
04516                sim_cycle, &regs, mem);
04517 
04518   /* fast forward simulator loop, performs functional simulation for
04519      FASTFWD_COUNT insts, then turns on performance (timing) simulation */
04520   if (fastfwd_count > 0)
04521     {
04522       int icount;
04523       md_inst_t inst;                   /* actual instruction bits */
04524       enum md_opcode op;                /* decoded opcode enum */
04525       md_addr_t target_PC;              /* actual next/target PC address */
04526       md_addr_t addr;                   /* effective address, if load/store */
04527       int is_write;                     /* store? */
04528       byte_t temp_byte = 0;             /* temp variable for spec mem access */
04529       half_t temp_half = 0;             /* " ditto " */
04530       word_t temp_word = 0;             /* " ditto " */
04531 #ifdef HOST_HAS_QWORD
04532       qword_t temp_qword = 0;           /* " ditto " */
04533 #endif /* HOST_HAS_QWORD */
04534       enum md_fault_type fault;
04535 
04536       fprintf(stderr, "sim: ** fast forwarding %d insts **\n", fastfwd_count);
04537 
04538       for (icount=0; icount < fastfwd_count; icount++)
04539         {
04540           /* maintain $r0 semantics */
04541           regs.regs_R[MD_REG_ZERO] = 0;
04542 #ifdef TARGET_ALPHA
04543           regs.regs_F.d[MD_REG_ZERO] = 0.0;
04544 #endif /* TARGET_ALPHA */
04545 
04546           /* get the next instruction to execute */
04547           MD_FETCH_INST(inst, mem, regs.regs_PC);
04548 
04549           /* set default reference address */
04550           addr = 0; is_write = FALSE;
04551 
04552           /* set default fault - none */
04553           fault = md_fault_none;
04554 
04555           /* decode the instruction */
04556           MD_SET_OPCODE(op, inst);
04557 
04558           /* execute the instruction */
04559           switch (op)
04560             {
04561 #define DEFINST(OP,MSK,NAME,OPFORM,RES,FLAGS,O1,O2,I1,I2,I3)            \
04562             case OP:                                                    \
04563               SYMCAT(OP,_IMPL);                                         \
04564               break;
04565 #define DEFLINK(OP,MSK,NAME,MASK,SHIFT)                                 \
04566             case OP:                                                    \
04567               panic("attempted to execute a linking opcode");
04568 #define CONNECT(OP)
04569 #undef DECLARE_FAULT
04570 #define DECLARE_FAULT(FAULT)                                            \
04571               { fault = (FAULT); break; }
04572 #include "machine.def"
04573             default:
04574               panic("attempted to execute a bogus opcode");
04575             }
04576 
04577           if (fault != md_fault_none)
04578             fatal("fault (%d) detected @ 0x%08p", fault, regs.regs_PC);
04579 
04580           /* note whether the access is a store, for the DLite check below */
04581           if (MD_OP_FLAGS(op) & F_MEM)
04582             {
04583               if (MD_OP_FLAGS(op) & F_STORE)
04584                 is_write = TRUE;
04585             }
04586 
04587           /* check for DLite debugger entry condition */
04588           if (dlite_check_break(regs.regs_NPC,
04589                                 is_write ? ACCESS_WRITE : ACCESS_READ,
04590                                 addr, sim_num_insn, sim_num_insn))
04591             dlite_main(regs.regs_PC, regs.regs_NPC, sim_num_insn, &regs, mem);
04592 
04593           /* go to the next instruction */
04594           regs.regs_PC = regs.regs_NPC;
04595           regs.regs_NPC += sizeof(md_inst_t);
04596         }
04597     }
04598 
04599   fprintf(stderr, "sim: ** starting performance simulation **\n");
04600 
04601   /* set up timing simulation entry state */
04602   fetch_regs_PC = regs.regs_PC - sizeof(md_inst_t);
04603   fetch_pred_PC = regs.regs_PC;
04604   regs.regs_PC = regs.regs_PC - sizeof(md_inst_t);
04605 
04606   /* main simulator loop, NOTE: the pipe stages are traversed in reverse order
04607      to eliminate this/next state synchronization and relaxation problems */
04608   for (;;)
04609     {
04610       /* RUU/LSQ sanity checks */
04611       if (RUU_num < LSQ_num)
04612         panic("RUU_num < LSQ_num");
04613       if (((RUU_head + RUU_num) % RUU_size) != RUU_tail)
04614         panic("RUU_head/RUU_tail wedged");
04615       if (((LSQ_head + LSQ_num) % LSQ_size) != LSQ_tail)
04616         panic("LSQ_head/LSQ_tail wedged");
04617 
04618       /* check if pipetracing is still active */
04619       ptrace_check_active(regs.regs_PC, sim_num_insn, sim_cycle);
04620 
04621       /* indicate new cycle in pipetrace */
04622       ptrace_newcycle(sim_cycle);
04623 
04624       /* commit entries from RUU/LSQ to architected register file */
04625       ruu_commit();
04626 
04627       /* service function unit release events */
04628       ruu_release_fu();
04629 
04630       /* ==> may have ready queue entries carried over from previous cycles */
04631 
04632       /* service result completions, also readies dependent operations */
04633       /* ==> inserts operations into ready queue --> register deps resolved */
04634       ruu_writeback();
04635 
04636       if (!bugcompat_mode)
04637         {
04638           /* try to locate memory operations that are ready to execute */
04639           /* ==> inserts operations into ready queue --> mem deps resolved */
04640           lsq_refresh();
04641 
04642           /* issue operations ready to execute from a previous cycle */
04643           /* <== drains ready queue <-- ready operations commence execution */
04644           ruu_issue();
04645         }
04646 
04647       /* decode and dispatch new operations */
04648       /* ==> insert ops w/ no deps or all regs ready --> reg deps resolved */
04649       ruu_dispatch();
04650 
04651       if (bugcompat_mode)
04652         {
04653           /* try to locate memory operations that are ready to execute */
04654           /* ==> inserts operations into ready queue --> mem deps resolved */
04655           lsq_refresh();
04656 
04657           /* issue operations ready to execute from a previous cycle */
04658           /* <== drains ready queue <-- ready operations commence execution */
04659           ruu_issue();
04660         }
04661 
04662       /* call instruction fetch unit if it is not blocked */
04663       if (!ruu_fetch_issue_delay)
04664         ruu_fetch();
04665       else
04666         ruu_fetch_issue_delay--;
04667 
04668       /* update buffer occupancy stats */
04669       IFQ_count += fetch_num;
04670       IFQ_fcount += ((fetch_num == ruu_ifq_size) ? 1 : 0);
04671       RUU_count += RUU_num;
04672       RUU_fcount += ((RUU_num == RUU_size) ? 1 : 0);
04673       LSQ_count += LSQ_num;
04674       LSQ_fcount += ((LSQ_num == LSQ_size) ? 1 : 0);
04675 
04676       /* go to next cycle */
04677       sim_cycle++;
04678 
04679       /* finish early? */
04680       if (max_insts && sim_num_insn >= max_insts)
04681         return;
04682     }
04683 }

