<!-- this file was generated automatically by noweave; better not edit it-->
Our compiler will maintain a simple symbol table, adequate for the
purposses of our toy C language. For our VPOi version of the compiler,
some symbol table activities can be handled by the VPOi-ASM symbol
table functionality. However, we will still utilize our symbol tables
in tandem, to minimize the effective changes to our compiler code in porting
our output from assembly language to RTLs.
The identifier (symbol) concept is defined in this module, and so we discuss identifiers as defined by our toy language within this document.
This module contains two basic ADT's. A string table, and an identifier table. Both are hash table implementations. The identifier table (symbol table) relies on the string table ADT. We will discuss each separately.
<defines>= (U->) [D->] /* scopes *** do not rearrange *** */ #define LOCAL 0 #define PARAM 1 #define GLOBAL 2
DefinesGLOBAL,LOCAL,PARAM(links are to index).
Global scope is given to any variable declared outside all function definitions. Local scope is given to a variable declared at the begining of a function body definition (as in C.) Parameter scope is given to formal parameters of a function definition.
As in C, in the case of a local variable sharing a name with a global variable, the local definition overrides the global within the function body. Parameters and locals should not have identical names.
Block level is maintained by a global variable in sym.c.
<blocklevel>= (U->) int level; /* current block level */
Defineslevel(links are to index).
Whenever we enter or leave a block, we call an associated function.
<prototypes>= (U->) [D->] void enterblock(); void leaveblock();
Enterblock simply increments the block level
<functions>= (U->) [D->]
/*
* enterblock - enter a new block
*/
void enterblock()
{
level++;
}
Definesenterblock(links are to index).
while leaveblock is responsible for first clearing the symbol table of any entries associated with the block level we are exiting
<functions>+= (U->) [<-D->]
/*
* leaveblock - exit a block
*/
void leaveblock()
{
IDENT **i, *p, *tmp;
if (level > 0) {
for (i = id_table; i < &id_table[ITABSIZE]; i++) {
for (p = *i; p; p = tmp)
if (p->i_blevel < level)
break;
else {
tmp = p->i_link;
cfree (p);
}
*i = p;
}
level--;
}
}
Definesleaveblock(links are to index).
before decreasing the block level. These functions will be met in our grammar (cgram and cgram-vpoi) as well as our semantic analysis procedures (sem and sem-vpoi).
A discussion of the symbol table data structure appears in a later section of this document.
<defines>+= (U->) [<-D] /* internal types *** do not rearrange *** */ #define T_INT 1 /* integer */ #define T_DOUBLE (1<<1) /* double */ #define T_PROC (1<<2) /* procedure */ #define T_ARRAY (1<<3) /* array */ #define T_PTR (1<<4) /* pointer */
DefinesT_ARRAY,T_DOUBLE,T_INT,T_PROC,T_PTR(links are to index).
A double is a double-word floating point. An integer is a single word
integral value. These are the two allowed standard data types. A
symbol might also represent an array, a procedure name (where the data
type is the return type) and a pointer. A pointer may refer to a
string, or a reference parameter, etc. Our compiler assumes
character data to be of integer type. Literal strings are effectively
handled by the assembly interface (discussed in 'sem' documentation.)
In general we will encounter these defines in a data type known as a TWORD (type word) which is equivalent to an integer type.
<typedefs>= (U->) [D->] typedef unsigned int TWORD;
DefinesTWORD(links are to index).
The String Table ADT
This section describes the string table, which is utilized by the
identifier table ADT. The string table ADT stores unique strings in a
hash table.
<stringtableStructure>= (U->)
struct s_chain {
char *s_ptr;
struct s_chain *s_next;
} *str_table[STABSIZE] = {0}; /* string hash table */
This is a simple array table of linked list. Each member of a linked
list is a string that has collided in the same hash table array entry.
It is declared in sym.c as a global data structure, as can be
seen from the trailing global definition in the structure definition.
The hash table size is set by the following macro definition:
<stabsize>= (U->) # define STABSIZE 119 /* hash table size for strings */
Functions
The symbol table has two functions. The first allows us to lookup a
string in the table.
<prototypes>+= (U->) [<-D->] char *slookup(char str[]);
Definesslookup(links are to index).
If the string is not found, it is installed in the table. In either case, a pointer to the string in the table is returned.
<functions>+= (U->) [<-D->]
/*
* slookup - lookup str in string table, install if necessary, return ptr
*/
char *slookup(char str[])
{
struct s_chain *p;
int i, k;
for (k = i = 0; i < 5; i++) /* simple hash function */
if (str[i])
k += str[i];
else
break;
k %= STABSIZE;
for (p = str_table[k]; p; p = p->s_next)
if (strcmp (str, p->s_ptr) == 0)
return (p->s_ptr);
p = (struct s_chain *) alloc (sizeof(struct s_chain));
p->s_next = str_table[k];
str_table[k] = p;
p->s_ptr = (char *) alloc ((unsigned) strlen (str) + 1);
p->s_ptr = strcpy (p->s_ptr, str);
return (p->s_ptr);
}
Definesslookup(links are to index).
The function begins by using the sum of the ASCII for the string as a hash function. This is then set to an index into the string table size, by modulus, and the string is searched for on the linked list at that hash table entry. If the string is not found, a place for it is allocated at the head of the string node linked list, where it is inserted. The address of the string in the node is returned. If the string was previously in the table, the address of the string in the node in which it was found is returned.
The second function allows us to store the string table to a file.
<prototypes>+= (U->) [<-D->] void sdump(FILE *);
The internals of sdump are trivial.
<functions>+= (U->) [<-D->]
/*
* sdump - dump string table to f
*/
void sdump(FILE *f)
{
struct s_chain **s, *p;
fprintf (f, "Dumping string table\n");
for (s = str_table; s < &str_table[STABSIZE]; s++)
for (p = *s; p; p = p->s_next)
fprintf(f, "%s\n", p->s_ptr);
}
Definessdump(links are to index).
Symbol Table (Identifier Table) ADT
The symbol table maintains type, scope, block level, and width data
for each separate identifier. Notice that string and block level make
a distinct data entry in the symbol table. A string alone may be
used by many identifier entries (signifying variables of differing
block level with the same name.)
<identStructure>= (U->)
/* symbol table entry */
struct id_entry {
IDENT *i_link; /* pointer to next entry on hash chain */
char *i_name; /* pointer to name in string table */
TWORD i_type; /* type code */
int i_blevel; /* block level */
int i_defined; /* non-zero if identifier is defined */
int i_width; /* number of words occupied */
int i_scope; /* scope */
int i_offset; /* offset in activation frame */
/*...added for vpoi...*/
AsmSymbol i_sym;
};
The identifier node, pointed to by i_link, stores a linked list of
identifiers that have collided on the particular identifier array
entry. Then name is a character pointer. It points to the string as
stored in the string hash table. The addition of a block level entry
make the identifier unique. Type and scope information are also stored.
We store the size of the data type in bytes in the i_width
field.
The offset field is used to store the stack offset of a symbol that is of parameter or local scope. Stack offset is given at compile time in the semantic routines. See ``sem'' documentation for details on offset calculation.
VPOi-ASM symbol table node allows
us to utilize our symbol table ADT, but store VPOi-ASM required
symbol structures for the VPOi version of this compiler.
Although we will not ever query VPOi's symbol table, we will need to register each of our symbols with it in order for VPOi to output the correct RTLs. For details of VPOi storage of symbols see ``sem'' documentation. For details of usage of VPOi symbol information, see ``cgen'' documentation.
id_entry data structure is much more commonly referred to as
the IDENT node structure, as the following typedef provides:
<typedefs>+= (U->) [<-D] typedef struct id_entry IDENT;
Functions
Our symbol table is designed to handle languages with local variables
declared within nested blocks recursively, although we will not
encounter such recursiveness in the C-like grammar of our toy
language.
The lookup function
<prototypes>+= (U->) [<-D->] IDENT *lookup(char *, int);
Defineslookup(links are to index).
returns the pointer to the symbol table entry node that identifies a symbol with the string and block level passed as arguments to the function. If no such identifier exists, this function returns a null pointer.
<functions>+= (U->) [<-D->]
/*
* lookup - lookup name, return ptr; use default scope if blev == 0
*/
IDENT *lookup(char *name, int blev)
{
IDENT *p;
for (p = id_table[hash (name) % ITABSIZE]; p; p = p->i_link)
if (name == p->i_name && (blev == 0 || blev == p->i_blevel))
return (p);
return (NULL);
}
Defineslookup(links are to index).
<prototypes>+= (U->) [<-D->] IDENT *install(char *, int);
Definesinstall(links are to index).
places a new identifier on symbol table. We pass the string and block level of the symbol to the symbol table. The function returns the address of the IDENT node for the symbol in the hash table. Passing a negative block level to the function is an indication to assign the global block level to the identifier.
<functions>+= (U->) [<-D->]
/*
* install - install name with block level blev, return ptr
*/
IDENT *install(char *name, int blev)
{
IDENT *ip, **q;
if (blev < 0)
blev = level;
ip = (IDENT *) alloc (sizeof(IDENT));
ip->i_name = name; /* set fields of symbol table */
ip->i_blevel = blev;
for (q = &id_table[hash (name) % ITABSIZE]; *q; q = &((*q)->i_link))
if (blev >= (*q)->i_blevel)
break;
ip->i_link = *q;
*q = ip;
return (ip);
}
Definesinstall(links are to index).
The identifier node will inserted in a collision linked list in order of decreasing block level. Duplicate entries are not checked for, save that they will be stored consecutively.
A function to store a symbol table to a file also exists.
<prototypes>+= (U->) [<-D->] void dump(int, FILE *);
The symbol table will dump all symbols with block level greater than an integer parameter to a File specified by parameter.
<functions>+= (U->) [<-D->]
/*
* dump - dump identifiers with block level >= blev to f
*/
void dump(int blev, FILE *f)
{
IDENT **i, *p;
fprintf (f, "Dumping identifier table\n");
for (i = id_table; i < &id_table[ITABSIZE]; i++)
for (p = *i; p; p = p->i_link)
if (p->i_blevel >= blev)
fprintf (f, "%s\t%d\t%d\t%d\n", p->i_name, p->i_blevel,
p->i_type, p->i_defined);
}
Definesdump(links are to index).
The Global Symbol Table
The size of the hash table is given by a macro definition.
<identtabsize>= (U->) # define ITABSIZE 37 /* hash table size for# identifiers */
We declare a global data structure of identifier node entries to be our identifier hash table
<globals>= (U->)
IDENT *id_table[ITABSIZE] = {0}; /* identifier hash table */
Definesid_table(links are to index).
The hash table function utilized by the identifier table,
<prototypes>+= (U->) [<-D->] int hash(char *);
, uses the address of the string as a hash function.
<functions>+= (U->) [<-D->]
/*
* hash - hash name, turn address into hash number
*/
int hash(char *s)
{
return((int ) s);
}
Defineshash(links are to index).
simply converts the strings pointer to an integer and returns it.
Our code provides an allocation function
<prototypes>+= (U->) [<-D] char *alloc(unsigned);
Definesalloc(links are to index).
which will return a character pointer from a call to the calloc routine.
<functions>+= (U->) [<-D]
/*
* alloc - alloc space
*/
char *alloc(unsigned n)
{
char *p;
if ((p = calloc (1, n)) == NULL) {
yyerror ("csem: out of space");
exit (1);
}
return (p);
}
Definesalloc(links are to index).
Includes
The symbol table Functions will require access to functions and data
structures from the following files:
<includes>= (U->) #include <stdio.h> #include <string.h> #include <stdlib.h> #include "asm.h" #include "rtl.h" #include "sym.h"
Note that among these, sym.h is generated from this documentation file.
sym.c and sym.h.
<sym.c>= /* ========================================================================== = Symbol Table ADT = for storing IDENT nodes. ========================================================================== */ <includes> <stabsize> <identtabsize> <blocklevel> <stringtableStructure> <globals> <functions>
<sym.h>= /*========================================================================= = IDENT info = and Symbol Table ADT = for string IDENT nodes. ========================================================================= */ #ifndef SYMH #define SYMH <defines> <typedefs> <identStructure> <prototypes> #endif
DefinesSYMH(links are to index).