Symbol: | A | B | C | D | E | F | G |
Count: | 5 | 3 | 2 | 3 | 6 | 2 | 4 |
How many different optimal prefix encodings are there for the given frequency distribution? Your answer should include a clear explanation of why it is correct.
Answer: There are two full-credit answers to this. If you interpret the question to be limited to prefix encodings that could be produced by the Huffman encoding algorithm, there are 2^7 = 128. We can walk through the algorithm, considering the choices at each step. At each iteration, we select the two lowest-weight nodes remaining and combine them into a super node. We have choices since there may be multiple nodes with the same weight that could be chosen as one of the two lowest-weight nodes. We also have two choices for how to combine the nodes (the lowest-weight node could be the left or right child).

Step 1. Initially, we have to choose C and F as the two lowest-weight nodes, and combine them into one super node, either:
   Choice A1:        Choice A2:

      o (4)             o (4)
     / \               / \
    C   F             F   C

Step 2. After this, the two lowest nodes are B and D. Again, we have 2 choices for how to combine them:

   Choice B1:        Choice B2:

      o (6)             o (6)
     / \               / \
    B   D             D   B

The choices combine multiplicatively: since they are independent, we can combine choice B1 with either choice A1 or choice A2, and choice B2 with either choice A1 or A2. So, there are 2*2 = 4 possible choices so far, each leading to a different encoding.

Step 3. Now, we have two nodes of weight 4 (the supernode created from choice A1 or A2) and the original node G. Combining them gives two more choices:
      o (8)             o (8)
     / \               / \
    o   G             G   o
   / \                   / \
  C   F                 C   F
  (or F C)              (or F C)

We'll call this node (whichever of the 4 possible nodes is used) node GCF (8).

Step 4. Now, the lowest-weight nodes are node A (5) and two nodes of weight 6 (node BD created in step 2, and the original node E). So, we have 2 choices for which node to use as the second-lowest node, and 2 possible ways of combining it with A. We'll consider the case where we use node BD first. This produces node ABD (11), which could be one of 4 possible nodes:
      o           o           o           o
     / \         / \         / \         / \
    A   o       A   o       o   A       o   A
       / \         / \     / \         / \
      B   D       D   B   B   D       D   B

Step 5. At the next step, the lowest-weight remaining node is E (6). We have two ways to combine this with the GCF (8) node (of which there are 4 possible versions), for 8 possible EGCF (14) nodes.

Step 6. At this stage, the two lowest nodes will be the ABD (11) node and the EGCF (14) node. There are 4 possible ABD nodes * 8 possible EGCF nodes * 2 ways to combine them, for 64 total possible encoding trees.
Steps 4-6, alternate. In step 4, we only considered one case. The other option was to combine A (5) with node E (6). There are two possible AE nodes. Then, in step 5, we would combine node BD (6) with node GCF (8). There are 2 possible BD nodes (from step 2) * 4 possible GCF nodes * 2 ways to combine them = 16 possible BDGCF (14) nodes. Step 6 combines the two remaining nodes: 16 BDGCF (14) nodes * 2 AE nodes * 2 ways to combine them = 64 encoding trees.
Combining. So, combining the two alternatives at step 4, we have 64 + 64 = 128 different encoding trees that could result from Huffman's algorithm.
All Optimal Encodings. In fact, as Trevor Perrier pointed out, this does not include all possible optimal encoding trees. Note that we can always interchange nodes at the same level without changing the optimality of an encoding, since the number of bits for each symbol stays the same. This means there are some additional possibilities.
Consider this Huffman encoding tree:
              o
            /   \
           o     o
          / \   / \
         A   o E   o
            / \   / \
           B   D G   o
                    / \
                   C   F

In any optimal tree for these frequencies, A and E must be at depth 2, B, D, and G at depth 3, and C and F at depth 4: interchanging leaves at the same level never changes the cost, while any other depth assignment satisfying Kraft's equality costs more than the optimal 68 bits. So, we can count all optimal encodings directly from this depth profile. The root's two children must both be internal nodes. Of the four depth-2 positions, two hold A and E (4 * 3 = 12 ordered choices) and the other two are internal, giving four depth-3 positions. Three of those hold B, D, and G (4 * 3 * 2 = 24 choices), and the remaining one is internal, with C and F as its two children (2 choices). So, the total number of optimal prefix encodings is 12 * 24 * 2 = 576. (The 128 trees produced by Huffman's algorithm are among these 576.)
We discussed this in Class 25.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *copyString (char *s) {
    char *res = (char *) malloc (sizeof (char) * strlen (s));
    strcpy (res, s);
    return res;
}

int main (int argc, char **argv) {
    char *a = "alpha";
    char *b = "beta";

    while (*a != *b) {
        b = copyString (b + 1);
    }
    printf ("The strings are: %s / %s\n", a, b);
    exit (0);
}
a. The copyString function allocates a new string object and returns it. Each time through the loop, however, we lose the reference to the previous object. So, the new object allocated by the malloc in copyString is never deallocated, and we allocate a new object each iteration through the while loop.

b. To fix it, we need to free the storage before losing the last reference to it. One fix is to replace the while loop with:

   while (*a != *b) {
      char *ref = b;
      b = copyString (b + 1);
      free (ref);
   }

(For this to be safe, b must point to heap storage; since b is initialized to the string literal "beta", which must not be passed to free, the initialization should also be changed to b = copyString ("beta"). Note too that copyString should allocate strlen (s) + 1 bytes, to leave room for the terminating null.)

4. (10) Explain two reasons why it is easier to write a garbage collector for Python than it is to write a garbage collector for C.

We discussed this in Class 25.

5. (10) Here is the JVML code for a Java method:
   Method int func(int, int)
    0 iload_0
    1 istore_2
    2 iload_1
    3 istore_3
    4 iload_2
    5 iload_3
    6 iadd
    7 istore 4
    9 iload_2
   10 iload 4
   12 if_icmple 18
   15 iinc 4 1
   18 iload 4
   20 ireturn

Write JVML code for a method with exactly the same behavior with as few instructions as possible. Be careful to make sure the result from your new function will always match the result from the original function on all possible inputs.
The JVML code given was the result of compiling this Java program:

   static public int func (int a, int b) {
      int c = a;
      int d = b;
      int e = c + d;
      if (c > e) {
         e++;
      }
      return e;
   }

There are several inefficiencies in the generated bytecode. For example, it copies the parameters into variables 2 and 3 and the sum into variable 4, but none of these variables are needed: the values can be kept on the operand stack instead, avoiding all of the istore and iload instructions. Here is a shorter sequence (but not the shortest possible) that has the same behavior:
   Method int func(int, int)
    0 iload_0       # stack (top first): param1
    1 iload_1       # stack: param2 param1
    2 iadd          # stack: (param2+param1)
    3 dup           # stack: (param2+param1) (param2+param1)
    4 iload_0       # stack: param1 (param2+param1) (param2+param1)
    5 if_icmpge 10  # stack: (param2+param1)
    8 iconst_1      # stack: 1 (param2+param1)
    9 iadd          # stack: (param2+param1+1)
   10 ireturn

(Because the sum is pushed before param1, the comparison direction is reversed relative to the original code: if_icmpge here skips the increment exactly when the original's if_icmple does. The increment uses iconst_1 and iadd since the value is on the operand stack rather than in a local variable.)
6. (5)
Fragment A:

   mov eax, ebx

Fragment B:

   push ecx
   mov ebx, ecx
   mov eax, ecx
   mov cx, bx
   pop ecx
Different. There are many differences here; for example, fragment A only modifies the value in eax, but fragment B also modifies the value in ebx. So, the behaviors will be obviously different for any initial state where ebx does not initially hold the same value as ecx.

7. (10) For this question, assume the called function _func correctly follows the C calling convention.
Fragment A:

   push 216
   push 202
   call _func
   add esp, 8

Fragment B:

   push eax
   push 216
   push 202
   call _func
   add esp, 8
   pop eax
Different. The C calling convention specifies that the callee place the result in eax. So, after the call _func instruction, eax will hold whatever value _func returns. In fragment A this is the final value of eax. In fragment B, the caller first pushes eax on the stack and then restores it after the call. So the final value of eax will be the same as its initial value. Hence, the behaviors will be noticeably different whenever the initial value in eax is different from the result returned by _func.

8. (10) Do the two functions have equivalent behavior? (Assume all callers must correctly follow the C calling convention.)
Fragment A (this is the example from the x86 Guide, with some of the comments shortened to save space):

   _myFunc PROC
      ; Subroutine Prologue
      push ebp           ; Save the old base pointer value.
      mov ebp, esp       ; Set the new base pointer value.
      sub esp, 4         ; Make room for one 4-byte local variable.
      push edi           ; Save the values of modified registers.
      push esi           ; (no need to save EBX, EBP, or ESP)

      ; Subroutine Body
      mov eax, [ebp+8]   ; Move parameter 1 into EAX
      mov esi, [ebp+12]  ; Move parameter 2 into ESI
      mov edi, [ebp+16]  ; Move parameter 3 into EDI
      mov [ebp-4], edi   ; Move EDI into local variable
      add [ebp-4], esi   ; Add ESI into local variable
      add eax, [ebp-4]   ; Add local into EAX (result)

      ; Subroutine Epilogue
      pop esi            ; Recover register values
      pop edi
      mov esp, ebp       ; Deallocate local variables
      pop ebp            ; Restore the caller's ebp
      ret
   _myFunc ENDP
   END

Fragment B:

   _myFunc PROC
      sub esp, 4
      mov eax, [esp+8]
      mov ecx, [esp+12]
      mov edx, [esp+16]
      mov [esp], edx
      add [esp], ecx
      add eax, [esp]
      pop edx
      ret
   _myFunc ENDP
   END
Same behavior. Because we are assuming the C calling convention, the caller cannot depend on the final values of the caller-saved registers ecx and edx, so although the final values of ecx and edx will be different for the two fragments, these differences are not visible: the caller cannot rely on the values in those registers after a call to _myFunc. Otherwise, the behaviors are the same. The second fragment does not update ebp, but doesn't use it either, instead finding the same values at the appropriate offsets from esp.

The trickiest part is the maintenance of the stack in fragment B. The sub esp, 4 instruction makes room for the local variable. Note that there is no push ebp instruction, so the offsets for the parameters start at esp+8, since the stack has been pushed twice since the final parameter (parameter 1) was pushed (once for the return address, and once to make space for the local variable). Instead of simply using add esp, 4 to reclaim the local variable space at the end of the function, we use pop edx. This has the same effect on the stack pointer, but also copies the value into edx. Because of the C calling convention, though, this is not visible, since a caller cannot rely on edx after the call.

9. (10) Consider modifying the x86 calling convention to put the return value on the top of the stack when a routine returns instead of using EAX to hold the return value. What are the advantages and disadvantages of this change?
We discussed this in Class 25.
CS216: Program and Data Representation
University of Virginia

David Evans
evans@cs.virginia.edu