
Problem Set 4 Comments

1.
{
   char *s = (char *) malloc (sizeof (*s));
   s[0] = 'a';
   printf ("%c", *(s + 1));
}
Undefined: s points to storage allocated for a single char, so the address s + 1 is just past the end of that object. Reading *(s + 1) accesses memory outside the allocated object, so the behavior is undefined.
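
Here is a defined version of this access. It is a sketch that assumes the intent was to read a second character: allocate room for two chars and initialize both before reading. The helper names make_two_chars and second_char are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocates room for two chars and initializes both, so s[0] and
   s[1] (equivalently *(s + 1)) lie inside the object and hold values.
   Returns NULL if malloc fails; the caller frees the result. */
char *make_two_chars (void)
{
  char *s = (char *) malloc (sizeof (*s) * 2);
  if (s != NULL) {
    s[0] = 'a';
    s[1] = 'b';
  }
  return s;
}

/* Reads *(s + 1).  n is the number of chars allocated for s; the assert
   documents the precondition that makes the read defined. */
char second_char (const char *s, size_t n)
{
  assert (s != NULL && n >= 2);
  return *(s + 1);
}
```

Allocating two chars is not enough on its own. Reading s[1] before assigning it would still read a value the program never set.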

2.

{
   char *s = (char *) malloc (sizeof (*s) * 6);
   char *t;
   strcpy (s, "cs216");
   t = s;
   free (t);
   printf ("%s", s);
}
Undefined: after the assignment t = s, the local variables t and s point to the same storage. The call free (t) deallocates that storage, so using s in the subsequent printf statement reads freed memory and has undefined behavior.
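
The sketch below shows one defined ordering. It assumes the goal was to print the string and then release it. The storage is used while it is still allocated, freed through either alias, and then both pointers are cleared so neither can be used again by mistake. print_then_free is a hypothetical name, and the FILE parameter is added only to make the output destination explicit.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* A defined variant of the question's code: the string is printed while
   its storage is still allocated, and freed only afterwards.
   Returns the number of chars written, or -1 on failure. */
int print_then_free (FILE *out)
{
  char *s = (char *) malloc (sizeof (*s) * 6); /* "cs216" plus '\0' */
  char *t;
  int n;

  assert (out != NULL);
  if (s == NULL) {
    return -1;
  }
  strcpy (s, "cs216");
  t = s;                       /* t and s now alias the same storage */
  n = fprintf (out, "%s", s);  /* use the storage before freeing it */
  free (t);                    /* also frees the storage s points to */
  s = NULL;                    /* both aliases are dead; clear them */
  t = NULL;
  return n;
}
```
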
3.
char *select (int v, char **s)
{
   if (v) return *s;
   return 1[s];
}

int main (int argc, char **argv) 
{
   char **s = (char **) malloc (sizeof (*s) * 2);
   char *t1 = (char *) malloc (sizeof (*t1) * 6);
   char *t2 = (char *) malloc (sizeof (*t2) * 6);
   char *p;

   s[0] = t1;
   s[1] = t2;
   s[0][0] = 'b';
   p = select (0, s);
   p[0] = 'a';
   
   printf ("%c", **s);
  
}   
Defined; the output is b. We call select with 0 as its first argument, so the if condition is false and select returns 1[s]. In C, a[b] means *(a + b), so 1[s] is *(1 + s), which is the same as s[1]. Inside select, s has the same value as s in main, so the result is s[1], which is t2. The assignment p[0] = 'a' therefore changes s[1][0] but leaves s[0][0] unchanged (it is still 'b'). The printf prints **s, which is s[0][0].
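
The indexing identity this answer relies on can be checked directly. The sketch below keeps select's logic but renames it pick, a hypothetical name, because select can clash with the POSIX select() function when system headers declare it. run_question3, also hypothetical, replays main using stack arrays in place of malloc.

```c
#include <assert.h>

/* Same logic as select from the question, renamed to avoid a clash
   with the POSIX select() function. */
char *pick (int v, char **s)
{
  if (v) return *s;   /* *s is s[0] */
  return 1[s];        /* 1[s] means *(1 + s), i.e. s[1] */
}

/* Mirrors main from the question, using stack arrays instead of malloc.
   Returns the char that printf ("%c", **s) would print. */
char run_question3 (void)
{
  char t1[6] = "";
  char t2[6] = "";
  char *s[2];
  char *p;

  s[0] = t1;
  s[1] = t2;
  s[0][0] = 'b';
  p = pick (0, s);    /* v == 0, so this is s[1], which is t2 */
  assert (p == t2);
  p[0] = 'a';         /* changes t2[0]; t1[0] is untouched */
  return **s;         /* **s is s[0][0], still 'b' */
}
```
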
4. The procedure htree_unparse in htree.c takes as input a htree object and returns a string representation of that encoding tree. The goal here is to produce a machine-readable string, not a human-readable string, so the string is not divided into lines. The htree_unparse defined in htree.c produces the correct string, but it leaks memory. Add the necessary calls to free in htree_unparse to plug the memory leak.
The htree_unparse implementation leaks memory. Each call to htree_unparse returns a newly allocated string, but the code never frees the strings returned by the two recursive calls after copying them into res. We should add the calls to free shown below:
char *htree_unparse (htree t) {
  ...
  } else {
    char *tl = htree_unparse (t->left);
    char *tr = htree_unparse (t->right);
    /* Allocate space for both strings, the four [] characters, and the null */
    res = (char *) malloc (sizeof (*res) * (strlen (tl) + strlen (tr) + 5));
    if (res == NULL) {
      fprintf (stderr, "Error: cannot allocate memory in htree_unparse\n");
      exit (EXIT_FAILURE);
    }
    strcpy (res, "[");
    strcat (res, tl);
    free (tl); /* tl has been copied into res and is no longer needed */
    strcat (res, "][");
    strcat (res, tr);
    free (tr); /* likewise for tr */
    strcat (res, "]");
    return res;
  }
}
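
Here is a self-contained sketch of the ownership rule: every call returns storage that its caller must free. It uses a hypothetical minimal struct node. The leaf case of the real htree_unparse is elided above, so printing a leaf as just its letter is an assumption of this sketch. Only the internal-node case follows the code shown.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal tree node; the real htree has more fields. */
struct node {
  char letter;                /* used only at leaves */
  struct node *left, *right;  /* both NULL at a leaf */
};

/* Allocates n bytes or exits, in the style of htree.c's error handling. */
static char *alloc_or_die (size_t n)
{
  char *p = (char *) malloc (n);
  if (p == NULL) {
    fprintf (stderr, "Error: cannot allocate memory in unparse\n");
    exit (EXIT_FAILURE);
  }
  return p;
}

/* Every call returns newly allocated storage that the caller owns.
   The leaf format (just the letter) is an assumption of this sketch. */
char *unparse (const struct node *t)
{
  char *res;

  assert (t != NULL);
  if (t->left == NULL && t->right == NULL) {
    res = alloc_or_die (2);
    res[0] = t->letter;
    res[1] = '\0';
  } else {
    char *tl = unparse (t->left);
    char *tr = unparse (t->right);
    /* both strings, the four bracket characters, and the null */
    res = alloc_or_die (strlen (tl) + strlen (tr) + 5);
    strcpy (res, "[");
    strcat (res, tl);
    strcat (res, "][");
    strcat (res, tr);
    strcat (res, "]");
    free (tl);  /* the recursive results were copied into res and */
    free (tr);  /* no caller ever sees them, so free them here */
  }
  return res;
}
```

With leaves a, b, c and the tree [[a][b]][c], each intermediate string is freed as soon as it has been copied. Only the final result remains for the caller to free.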
5. Implement the htree_buildTree procedure that takes as input a string and returns the optimal Huffman encoding tree for that string.
htree htree_buildTree (char *s) {
  int frequency [MAX_CHAR];
  char *ts = s;
  int i;
  htree tnodes[MAX_CHAR];
  int numnodes = 0;

  for (i = 0; i < MAX_CHAR; i++) {
    frequency[i] = 0;
  }
  
  while (*ts != '\0') {
    assert ((int) *ts > 0 && (int) *ts < MAX_CHAR);  /* Should always be true */
    frequency[(int) *ts]++;
    ts++;
  }

  {
    for (i = 0; i < MAX_CHAR; i++) {
      if (frequency[i] > 0) {
	htree newnode = (htree) malloc (sizeof (*newnode));
	if (newnode == NULL) {
	  fprintf (stderr, "Unable to allocate: %d\n", (int) sizeof (*newnode));
	  exit (EXIT_FAILURE);
	}
	newnode->left = NULL;
	newnode->right = NULL;
	newnode->letter = (char) i;
	newnode->count = frequency[i];
	newnode->parent = NULL;
	tnodes[numnodes++] = newnode;
      }
    }
  }

  /*
  ** Now, we have all the leaves, we need to form the tree by
  ** merging nodes.
  */

  while (numnodes > 1) {
    /* find two nodes to merge with minimum count */
    int best1 = 0;
    int best2 = 1;
    int i;

    for (i = 2; i < numnodes; i++) {
      /*
      ** invariant: tnodes[best1]->count <= tnodes[best2]->count
      */
      
      if (tnodes[best1]->count > tnodes[best2]->count) {
	int tmp = best1;
	best1 = best2;
	best2 = tmp;
      }

      if (tnodes[i]->count < tnodes[best2]->count) {
	best2 = i;
      }
    }

    /* found best 2 nodes - merge them into one node */
    {
      htree newnode = (htree) malloc (sizeof (*newnode));
      if (newnode == NULL) {
	fprintf (stderr, "Unable to allocate: %d\n", (int) sizeof (*newnode));
	exit (EXIT_FAILURE);
      }
      newnode->left = tnodes[best1];
      newnode->right = tnodes[best2];
      tnodes[best1]->parent = newnode;
      tnodes[best2]->parent = newnode;
      newnode->letter = '\0';
      newnode->count = tnodes[best1]->count + tnodes[best2]->count;
      newnode->parent = NULL; /* set when merged later; stays NULL at the root */
      tnodes[best1] = newnode;
      tnodes[best2] = tnodes[numnodes - 1];
      numnodes--;
    }
  }

  if (numnodes == 0) {
    return NULL; /* empty input string: there is no tree to build */
  }
  return tnodes[0];
}


6. Construct input files that produce Huffman encoding trees with the properties described in each sub-question. For each part, include both your input file and the output Huffman encoding tree you produced in your answer. (Note: you do not need to use the full alphabet for any of these questions. Your input file should use as few symbols as possible to satisfy the property.)

a. A tree where the letter A is encoded using one bit.

AAAAAAAAAAAAAAAAAABCD

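A will be encoded using a single bit. A occurs far more often than B, C, and D combined, so the merges combine B, C, and D first. A is joined only in the final merge, which makes it a child of the root. The optimal encoding tree is: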

              /\
             o  A
            /\
           B  o
             /\
            D  C

b. A tree where each letter is encoded using exactly three bits.

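ABCDEFGH

Each of the eight letters appears once. The two smallest counts are always merged first. The eight count-1 leaves become four count-2 nodes, then two count-4 nodes, and then the root. The result is a perfectly balanced tree of depth 3, in which every letter is encoded using exactly three bits.
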
c. A tree where a letter requires more than 6 bits to encode.
The tree built from the provided declaration.txt has this property. For example, the encoding for q (frequency 5) is 11010000110, which is 11 bits long.
7. What is the asymptotic running time of our htree_encodeChars procedure? You may assume the input string is long enough that the time taken to produce the Huffman encoding tree does not matter (so you do not have to consider the running time of htree_buildTree and htree_unparse in your answer).
The htree_encodeChars procedure encodes each char in the input string using htree_encodeChar.

htree_encodeChar is quite expensive. It searches the tree for the leaf matching c. The recursive calls try the left branch first and then the right branch, so it is doing a depth-first traversal. In the worst case, the matching letter is the rightmost leaf in the tree, and the search visits every node. A Huffman tree over s distinct symbols has s leaves and 2s - 1 nodes, so the worst case needs Θ(s) recursive calls. On average, we would still expect to need Θ(s) calls.

Once the leaf is found, we loop depth times, collecting the code backwards through the tree. Since depth < s, this does not affect the asymptotic running time.

htree_encodeChars calls htree_encodeChar once for each input symbol, so the total asymptotic running time is Θ(ls), where l is the length of the input and s is the number of distinct symbols in the input alphabet.
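
The worst case can be seen concretely in the sketch below. It is a hypothetical stand-in for the search step of htree_encodeChar (the real procedure is not reproduced here): a left-first depth-first search over a minimal struct node that counts how many nodes it examines. find_leaf and struct node are names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal node type standing in for htree. */
struct node {
  char letter;                /* used only at leaves */
  struct node *left, *right;  /* both NULL at a leaf */
};

/* Depth-first search for the leaf holding c, trying the left branch
   before the right one.  *visits is incremented once per node examined,
   so it counts the recursive calls.  Returns the leaf, or NULL. */
const struct node *find_leaf (const struct node *t, char c, int *visits)
{
  const struct node *found;

  assert (t != NULL);
  (*visits)++;
  if (t->left == NULL && t->right == NULL) {
    return t->letter == c ? t : NULL;
  }
  found = find_leaf (t->left, c, visits);
  if (found != NULL) {
    return found;
  }
  return find_leaf (t->right, c, visits);
}
```

In the three-leaf tree [[a][b]][c], finding a examines 3 nodes. Finding the rightmost leaf c examines all 5, which is 2s - 1 with s = 3.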

8. The provided htree_encodeChars procedure is very inefficient. Explain how it could be implemented with running time in O(n) where n is the number of characters in the input string s. (You don't need to modify the code, just explain the basic idea.)
We could construct a lookup table that maps each symbol to its code, so we can find the code for each symbol in O(1). Since the number of possible symbols is finite (at most MAX_CHAR), this could be a simple array indexed by the character. The table can be constructed with a single traversal of the tree, with running time in Θ(s). This reduces the overall running time to Θ(l + s) = Θ(l), since s ≤ l (every distinct symbol appears at least once in the input).
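
Here is a sketch of the lookup-table idea. It uses a hypothetical minimal struct node rather than the real htree. One traversal fills a table indexed by character; after that, encoding each input character is a single lookup plus copying its code. encode_chars, build_table, NSYM, and MAXCODE are names introduced for illustration, and the sketch assumes every character of s occurs in the tree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NSYM 256      /* number of possible char values (assumption) */
#define MAXCODE NSYM  /* a code has at most NSYM - 1 bits, plus '\0' */

/* Hypothetical minimal node type standing in for htree. */
struct node {
  char letter;                /* used only at leaves */
  struct node *left, *right;  /* both NULL at a leaf */
};

/* One depth-first traversal: prefix[0..depth-1] holds the path from the
   root ('0' = left, '1' = right); at each leaf it is copied into the
   table entry for that leaf's letter. */
static void build_table (const struct node *t, char *prefix, int depth,
                         char (*table)[MAXCODE])
{
  assert (depth < MAXCODE);
  if (t->left == NULL && t->right == NULL) {
    prefix[depth] = '\0';
    strcpy (table[(unsigned char) t->letter], prefix);
    return;
  }
  prefix[depth] = '0';
  build_table (t->left, prefix, depth + 1, table);
  prefix[depth] = '1';
  build_table (t->right, prefix, depth + 1, table);
}

/* Encodes s as a string of '0'/'1' chars using a table built once from
   the tree, so each input char costs one lookup plus copying its code.
   Returns a newly allocated string (caller frees), or NULL. */
char *encode_chars (const struct node *root, const char *s)
{
  char (*table)[MAXCODE] = calloc (NSYM, sizeof *table);
  char prefix[MAXCODE];
  size_t total = 0;
  const char *p;
  char *res, *q;

  if (table == NULL) {
    return NULL;
  }
  build_table (root, prefix, 0, table);
  for (p = s; *p != '\0'; p++) {
    total += strlen (table[(unsigned char) *p]);
  }
  res = malloc (total + 1);
  if (res != NULL) {
    for (q = res, p = s; *p != '\0'; p++) {
      const char *code = table[(unsigned char) *p];
      size_t n = strlen (code);
      memcpy (q, code, n);
      q += n;
    }
    *q = '\0';
  }
  free (table);
  return res;
}
```

For the tree [[a][b]][c], the table holds a = 00, b = 01, and c = 1, so encoding abca yields 0001100.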
9. Complete the implementation of htree_decodeChars. We have provided some code that you may find useful in htree.c, but you can change the implementation however you want. (If you are stuck on this question, you may find it useful to examine the provided htree_decodeBits routine.)
char *htree_decodeChars (FILE *infile) {
  ... /* provided code unchanged */
  
  while ((inchar = fgetc (infile)) != EOF) {
    char c = (char) inchar;
    if (c == '\0') {
      fprintf (stderr, "Error: read file has null character!\n");
      exit (EXIT_FAILURE);
    }

    if (c == '0') { /* go left */
      curnode = curnode->left;
      assert (curnode != NULL);
    } else if (c == '1') { /* go right */
      curnode = curnode->right;
      assert (curnode != NULL);
    } else {
      fprintf (stderr, "Error: bad character: %c [%d]\n%s\n", c, (int) c, res);
      exit (EXIT_FAILURE);
    }
    
    if (htree_isLeaf (curnode)) {
      if (used + 1 > allocsize) {
	allocsize += BLOCK_SIZE;
	res = realloc (res, sizeof(*res) * allocsize);
	if (res == NULL) {
	  fprintf (stderr, "Error: cannot allocate enough memory (%d)\n", allocsize);
	  exit (EXIT_FAILURE);
	}

	tres = res + used;
      }
      
      *tres++ = curnode->letter;
      used++;
      curnode = h;
    }
  }
  
  ...
}
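
The same walk can be tried in isolation. This sketch decodes a string of '0'/'1' characters instead of reading a FILE, and it uses a hypothetical minimal struct node in place of htree. It also rejects input that ends partway through a code. decode_chars is a name introduced for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal node type standing in for htree. */
struct node {
  char letter;                /* used only at leaves */
  struct node *left, *right;  /* both NULL at a leaf */
};

/* Decodes a string of '0'/'1' chars: '0' moves left, '1' moves right,
   and reaching a leaf emits its letter and restarts at the root.
   Returns a newly allocated string (caller frees), or NULL if the input
   is malformed or allocation fails. */
char *decode_chars (const struct node *root, const char *bits)
{
  char *res, *q;
  const struct node *cur = root;

  assert (root != NULL && bits != NULL);
  /* each decoded char consumes at least one bit (a leaf root is
     rejected below), so strlen (bits) + 1 bytes suffice */
  res = malloc (strlen (bits) + 1);
  if (res == NULL) {
    return NULL;
  }
  q = res;
  for (; *bits != '\0'; bits++) {
    if (*bits == '0') {
      cur = cur->left;
    } else if (*bits == '1') {
      cur = cur->right;
    } else {
      cur = NULL;          /* any other char is an error */
    }
    if (cur == NULL) {     /* bad char, or stepped below a leaf root */
      free (res);
      return NULL;
    }
    if (cur->left == NULL && cur->right == NULL) {
      *q++ = cur->letter;
      cur = root;
    }
  }
  if (cur != root) {       /* input ended partway through a code */
    free (res);
    return NULL;
  }
  *q = '\0';
  return res;
}
```
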
10. Complete the implementation of htree_decodeBits by finishing the assignment to bit (marked with /* Question 10 ...). The value of bit should be zero if the ith bit of c is zero, and non-zero if the ith bit of c is one.
We need the value of the ith bit of c, numbering bits from 0 at the most significant end (hence the exponent 7 - i):
      bit = c & power (2, 7 - i);
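
The power helper is defined in htree.c and is not shown here, so the sketch below uses a shift instead. For 0 <= i <= 7, 0x80 >> i has the same value as 2 to the power 7 - i. nth_bit is a hypothetical name.

```c
#include <assert.h>

/* Returns non-zero iff bit i of c is set, where i = 0 is the most
   significant of the 8 bits; 0x80 >> i equals power (2, 7 - i). */
int nth_bit (unsigned char c, int i)
{
  assert (i >= 0 && i < 8);
  return c & (0x80 >> i);
}
```
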
11. Implement the htree_encodeBits routine. If your implementation is correct, you should be able to decode an encoded file to produce the original result. (Note: it is not necessary to complete question 11 to reach the "green" star level on this assignment, if you answer questions 1-10 well.)
void htree_encodeBits (htree h, char *s, FILE *outfile) {
  /* first, output the htree encoding */
  char *ts = htree_unparse (h);
  unsigned char outbits = 0;
  int curbit = 0;

  fprintf (outfile, "%s\n", ts);
  free (ts); /* htree_unparse returns newly allocated storage */

  while (*s != '\0') {
    char c = *s++;
    char *bits = htree_encodeChar (h, c);
    char *tbits = bits;
    assert (bits != NULL);

    while (*tbits != '\0') {
      if (*tbits == '0') {
	curbit++;
      } else if (*tbits == '1') {
	outbits += power (2, 7 - curbit);
	curbit++;
      }

      if (curbit == 8) {
	fputc (outbits, outfile);
	outbits = 0;
	curbit = 0;
      }
      tbits++;
    }
  }
   
  /* The number of bits may not be a multiple of 8.  If the last byte is
  ** only partly filled, we output it followed by a byte holding the
  ** number of valid bits in it.  If the last byte was full (or there
  ** were no bits at all), we output a single byte holding 8.
  */
 
  if (curbit != 0) {
    /* Output the final byte and a count of the number of good bits */
    fputc (outbits, outfile);
    fputc (curbit, outfile);
  } else {
    fputc (8, outfile); /* last byte was full */
  }
 
  fclose (outfile);
}
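
The byte layout can be checked without a file. This sketch packs a string of '0'/'1' characters into a caller-supplied buffer using the same layout as htree_encodeBits above: data bytes with the most significant bit first, then the count byte. pack_bits is a hypothetical name, and it sets bits with a shift rather than power.

```c
#include <assert.h>
#include <stddef.h>

/* Packs a string of '0'/'1' chars into out, most significant bit first.
   If the last byte is partial, it is followed by its number of valid
   bits; otherwise a single 8 is appended.  out must have room for
   strlen (bits) / 8 + 2 bytes.  Returns the number of bytes written. */
size_t pack_bits (const char *bits, unsigned char *out)
{
  size_t n = 0;
  unsigned char outbits = 0;
  int curbit = 0;

  for (; *bits != '\0'; bits++) {
    assert (*bits == '0' || *bits == '1');
    if (*bits == '1') {
      outbits |= (unsigned char) (0x80 >> curbit);
    }
    if (++curbit == 8) {
      out[n++] = outbits;
      outbits = 0;
      curbit = 0;
    }
  }
  if (curbit != 0) {
    out[n++] = outbits;                 /* partial byte, low bits zero */
    out[n++] = (unsigned char) curbit;  /* number of valid bits in it */
  } else {
    out[n++] = 8;                       /* last data byte was full */
  }
  return n;
}
```

For example, 10110 packs to the bytes 0xB0 and 5, while 11111111 packs to 0xFF followed by 8.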

CS216: Program and Data Representation
University of Virginia
David Evans
evans@cs.virginia.edu