(Spring 1994, Spring 1996, Winter 1998, Winter 1999) Present the Chomsky Hierarchy, and give examples of languages in each class, as well as grammars that produce each and machines that accept them.
| Type | Class of Languages | Example | Machine | Grammar |
|---|---|---|---|---|
| 0 | Unrestricted (Recursively Enumerable) | Any computable function | Turing Machine (that halts on "yes", but doesn't necessarily halt on "no") | Unrestricted grammar. ![]() ![]() where
and are
arbitrary strings of grammar symbols, and is not
empty. |
| 1 | Context Sensitive | anbncn | Linear Bounded Automaton (Turing machine with tape of length linear in the size of the input.) | Context-sensitive grammar. ![]() ![]() where
and are
strings of terminals and nonterminals, and is at
least as long as . Or, a
grammar in which all productions are of the form 1![]() 2![]() 1![]() 2,
where is a
variable, and is a
string of terminals and nonterminals. |
| 2 | Context Free | wwr | Nondeterministic PDA | Context free grammar. ![]() ![]() , where
is a variable and is a
string of terminals and nonterminals. |
| 3 | Regular | 0*1* | Finite Automaton (deterministic or nondeterministic) | Right- or left-linear grammar. ![]() a , where
and are
variables, or the empty string and a is a
constant. |
Are there any other classes of languages besides these?
Yes. Here are a few:
Let's say I buy a new Pentium Pro with 2 Gigabytes of disk space, 64 Megs of RAM, a 512 Kilobyte cache, and an 8X CD-ROM. One address in memory is the starting point of execution for a program, and the computer shuts itself off when it reaches another address. Where does this fit into the Chomsky hierarchy?
Since there are finite number of states that the system could be in, and every transition from one state to the next is deterministic, it is equivalent in power to a DFA.
Is the halting problem decidable for this machine? That is, given the initial state of the machine, can it be determined in finite time whether or not the machine will reach the shut-off address?
Yes. Since the number of possible states is finite, one can run the computer so that it transitions through all n! states. If the computer ever reaches a state that it previously was in, it will never stop. Note that we can not treat the registers as the state of the machine, since it is possible that the computer changes the "input" that is stored in memory.
(Winter 2000) Where in the Chomsky Hierarchy does the language {"the strings that have the same number of 0's and 1's"} fit?
It's a class 2 language, that is, a context-free language.
(Winter 2000) Prove it.
I can prove it by constructing the associated PDA, that will basically push a symbol on the stack each time a zero appears, and pop it each time I see a one.
(Winter 2000) Wait a minute, what if you have a lot of 1's first? The stack will be empty then, how do you handle that?
By reversing the order of pops and pushes when you reach the bottom of the stack (push on a one, pop on a zero), delimited by a special marking symbol.
(Winter 2000) Provide an alternate proof.
I can provide the following context-free grammar for this language: s -> 0s1 | 1s0 | ss | epsilon
Infinite automata are the same as ordinary automata, except that they can have an infinite number of states. Characterize the class of languages accepted by infinite automata.
An infinite automata can accept any language, since a set of transitions from the start symbol to a final state can be made for every string in the language.
What is a language?
A language is a subset of all possible strings formed from a given finite alphabet.
(Spring 1996) What is a regular language? Context free language? Recursively enumerable language? Recursive language?
A regular language is the set of strings accepted by some finite automaton. A context free language is the set of strings generated by some context free grammar. A recursively language is the set of strings accepted by some Turing machine that halts on all input. A recursively enumerable language is the set of strings accepted by some Turing machine that halts on "yes" instances.
What is the definition of a deterministic finite automaton?
A deterministic finite automaton is characterized with a five-tuple, usually
written as (Q,
,
,qo,F),
where Q is a finite set of states,
is a finite
input alphabet,
is a
transition function from Q
to Q,
qo is a start state, and F is a set of final states, F a subset of Q.
What is the definition of a nondeterministic finite automaton?
A nondeterministic finite automaton is the same as a deterministic one,
except the transition function maps Q
to
2Q.
(Winter 1997) Consider an alphabet a1 ... an. Of what class of languages is the language of strings whose symbols appear in increasing order?
Regular, since I can build a DFA that has transitions only for symbols higher than the last one.
(Winter 1997) Let's say that I make the alphabet infinite. Is the resulting language recursively enumerable?
Yes. I can code the symbols in a unary format, and build a Turing machine that only accepts the strings whose symbol-lengths are increasing.
Are the set of languages accepted by DFAs the same as the set of languages accepted by a NFAs? Support your answer.
Yes. Obviously, L(DFAs)
L(NFAs).
One can show that L(NFAs)
L(DFAs) by
converting any NFA into a DFA. The main observation in doing this translation is
that the corresponding states of the DFA represent the state of the entire NFA,
namely all possible states that it could be in. If, for example, there are
several nondeterministic transitions to states q3, q7, and
q8 on the symbol b from states q0 or q1, then
the corresponding state in the DFA would be written [q3,
q7, q8] and the new transition would be
([q0,q1],b)
= [q3, q7, q8].
(Spring 1994) Is the set of languages accepted by a deterministic PDA the same set of languages accepted by a nondeterministic PDA?
No, the language wwr is accepted by a nondeterministic PDA but not by a deterministic one. Basically, you need nondeterminism to know when the first string ends, and the reversed string begins.
For membership, we can build the finite automaton and run it with the given
string. For finiteness, we can build the automaton, and of there is a cycle in
some path from the start state to the final state, then it is infinite. If there
is no cycle in any path from the start state to the final state, it is finite.
To check for equivalence between languages L1 and L2, we
construct the language (L1
¬L2)
(¬L1
L2). If the language is empty, L1 = L2.
What language does the DFA below accept?

L = {w | w has 2m 0's and 2n 1's, m,n
0}
Do DFAs require an exponential number of more states than an equivalent NFA?
At most, yes, but not in general. In the worst case every state in the NFA could have a transition to every other state. (Assuming that one does not add gratuitous useless states to the DFA.)
What is an inherently ambiguous language?
It is a language for which any grammar has at least one string with multiple
derivations. L={anbncmdm | n
1, m
1}
{anbmcmdn | n
1, m
1}. Strings of
the form akbkckdk have two
derivations, but each "sub-language" also contributes other strings, and thus
can not be eliminated.
What are CNF and GNF?
CNF stands for Chomsky Normal Form, and GNF stands for Griebach Normal Form.
They are a way of writing a context free grammar such with restrictions upon the
ordering of nonterminals and terminals. CNF consists of transitions of the form
A
BC or
A
a. GNF
consists of transitions of the form A
a
, where
is a
possibly empty string of nonterminals.
Do they restrict the power of CFLs?
No. Any CFL can be expressed in CNF or GNF.
How does a pushdown automata work?
A PDA consists of a an input tape, a stack initially containing a unique
symbol, and a finite control. The PDA can either consume an input character and
change state, or simply change state (an
move).
Upon changing state, the topmost stack symbol is consumed, and is replaced with
a number (possibly zero) of symbols. The PDA can accept the input string when
the stack is empty, or when it enters an accepting state. (These two accepting
methods are equivalent.)
(Winter 1997) If I provided two stacks, what class of languages does a pushdown automata accept?
It would be equivalent in power to a Turing machine, so it would accept the recursive languages.
(Winter 1997) If the pushdown automata now had an unbounded queue instead of a stack, what class of languages would it accept?
Again, it would be equivalent in power to a Turing machine. In order to get to the symbols in the middle of the queue, I could repeatedly take one element off and put it back on the end.
Are nondeterministic finite automata more powerful than deterministic finite automata?
No. Any NFA can be converted into an equivalent DFA.
Are nondeterministic pushdown automata more powerful than deterministic pushdown automata?
Yes. They NPDAs can recognize wwr because they can nondeterministically guess the middle of the string, while the best that a DPDA could do is w#wr.
Are nondeterministic Turing machines more powerful than deterministic Turing machines?
No.
What properties are regular languages closed under?
Regular Languages are closed under:
-transition
to the start states of M1 and M2. M' will accept if
M1 or M2 accept.
-transition
from all final states of M1 to the start state of M2.
Thus, M' accepts iff M1 accepts part of the string and
M2 accepts the rest.
-move
from the final states of M1 to the start state of M1,
and make M1's start state a final state (since the empty string is
in L1*), we have accepted L1*.
L2
= ¬(¬L1
¬L2).
Since regular languages are closed under union and complement, they are closed
under intersection.
What properties are closed for CFL's?
CFL's are closed under:
S1|S2,
where S1 and S2 are the start symbols of L1
and L2.
S1S2.
S1S|
.
Prove that CFLs are not closed under intersection.
L1 = {aibicj | i,j
1} is context
free, as is L2 = {ajbici | i,j
1}. The
intersection, {aibici | i
1} is not
context free, which we can prove with the pumping lemma for CFLs.
Alternatively, we can do a proof by contradiction. Assume they are closed under complementation. We know they are closed under union, so if A and B are context free languages:
If ¬A
¬B is
context free, then
¬(¬A
¬B) is
context free,
which is the same as A
B.
The last step is a result of DeMorgan's Law: x
y =
¬(¬x
¬y). We know that the context free languages are not closed under intersection,
so our original assumption must have been wrong, and CFLs are not closed under
complementation.
What properties are closed for recursive languages?
Recursive languages are closed under:
What is a useless symbol in a language?
A useless symbol is one that does not have a production that is used in any derivation from S that yields a word composed entirely of terminals.
Explain in lay terms what the pumping lemma is.
The pumping lemma for regular sets is useful for proving that a infinite language is not regular, and the pumping lemma for context free languages can show that a language is not context free. This discussion is based on the one for regular sets, but is analogous to the context free version.
For any regular language L, there is a finite automaton with n states that accepts it. If all the strings in L have length less than or equal to n, then the language is trivially regular. But if there are any strings of length greater than n, then there must be at least one state in the finite automaton that is visited more than once. This means that there is a cycle, and such a cycle can be repeated any number of times. If we happen to find a string created by "pumping" the cycle(s) that is not in the language, then the language is not regular.
Formally, If L is a regular set, then there is a constant n such that if z is
any string in L, and |z|
n, we can write
z as uvw such that |uv|
n, |v|
1, and for all
i
0
uviw is in L.
What does it mean for a problem to be undecidable?
A problem is undecidable if its language is not recursive. That is, A problem is undecidable if there does not exist a TM that takes any instance of this problem and determines "yes" or "no", halting when it is done.
If I have a problem whose language is decidable, do all TM's that accept this language halt?
No, we can construct a machine that outputs "yes" if the language is accepted, but instead of outputting "no" and halting, we will have the machine loop forever. Decidability only means that there exists one machine that will always halt.
Does knowing a problem is decidable give us an algorithm for it?
No. We may not be able to determine the algorithm.
Name an undecidable problem.
Lh (the halting problem) is a known undecidable problem. What language class does its complement fall into? Why?
We know that if a language and its complement are both r.e. then they both are recursive. This is because we can run a machine accepting both the language and its complement in parallel on some instance of this problem. One of the 2 machines are guaranteed to halt and say "yes". If the original machine says "yes", we have a "yes" answer to the problem. If the complement machine says "yes", we have a "yes" answer to the complement, and thus a "no" answer to the problem. Since we can construct a machine that accepts the language and halts on all inputs, the language is recursive.
Since we know the halting problem is r.e. but not recursive, its complement cannot be r.e., and hence is a non-r.e. language.
(Winter 1998) Prove Lh is undecidable.
Here's the standard proof. Another will be given below.
Assume that the halting problem is decidable. Then there exists a Turing machine Mh that will halt and answer "yes" if a given algorithm will halt on string w, and "no" if it will not.
Construct a machine M that takes as input an algorithm and a string, and then executes Mh using the inputs. If the algorithm halts with the given string, then send M into an infinite loop, and exit otherwise.
Finally, if we give M to itself, it will go into an infinite loop if and only if it halts when applied to itself.
Another formulation is the following. If the Turing machine Mh could be built, then we could build a machine M' that takes as input an algorithm and a string. If the algorithm running the string will halt (as determined by Mh), then we run it and return the value, and return false if it will not halt.
Machine M would therefore be able to always halt and give the correct answer, even if the language of the input algorithm is not recursive. Therefore M would make the recursively enumerable languages equal to the recursive languages, but we know this is false (e.g. Lu).
What is Rice's Theorem?
Rice's theorem says that any nontrivial property of the recursively enumerable languages is undecidable. A trivial property is one that all of the languages lack or all have. This means that properties of recursively enumerable languages such as emptiness, finiteness, regularity, and context-freedom can not be determined.
(Spring 1994) Describe the knapsack, traveling salesperson, satisfiability, and halting problems.
When we reduce one problem to another, what are we doing, and what does that mean in terms of the relative difficulty of the problems?
When we reduce A to B, then we construct an algorithm for A using B. If A reduces to B, then B is at least as hard as A.
(Spring 1994) What are the necessary steps for a reduction?
If we are reducing problem A to problem B then what we must do is convert an arbitrary instance of A into an instance of B such that the solution of B implies a solution of A. The following steps are typical:
(Spring 1994) Reduce Hamiltonian Cycle to Traveling Salesperson.
We are given an instance of Hamiltonian Cycle, which is a graph G=(V,E). The Hamiltonian Cycle problem (HC) asks if there is a path on G that reaches every node exactly once and returns to its starting point. The decision version of the Traveling Salesperson Problem (TSP) requires a weighted graph G'=(V',E'), and a cost M. It asks if there is a tour that touches each vertex exactly once and returns to its starting point of cost less than or equal to M. We need to construct G'. We do this by letting G' be the same graph as G, with the weight of every edge equal to 1. We let M=|V|, the number of vertices. Clearly, this construction takes polynomial time.
Now we must show that TSP says "yes" on G' and M iff there is a Hamiltonian cycle on G.
If there is an HC on G, the cycle with also be in G', since the vertices and edges are the same. Since the weight of every edge is 1, the total cost of this cycle is |V|, which is equal to M, and thus TSP will answer "yes".
If there is a TSP on G', we have a tour that touches every edge and returns to its starting point in G. We just use the corresponding edges in G, ignoring the edge weights. The reduction is now complete.
Reduce Partition to Sum of Subsets.
The Partitioning Problem (PART) takes a set S and asks if it can be broken into 2 subsets that have the same sum (clearly, this sum is equal to half the sum of S). For example {1,2,3,5,7} can be broken into {1,3,5} and {2,7}, both of which sum to 9. The sum of Subsets problem (SOS) takes a set S' and a number M and asks if a subset of S exists that sums to M.
We are given a set S, an instance of PART, and need to construct an instance of SOS. We do this by letting S'=S, and letting M equal half the sum of S. Our instance of SOS now asks "is there a subset of S' that sums to 1/2 of its sum?"
Clearly, if SOS says "yes", then there exists a set that sums to half of the sum of S' (and thus S). This set is our partition.
Also, if SOS says "no", then there is no set that sums to half of S', so there cannot be a partition.
Reduce SAT to HALT.
Make an algorithm that does an exhaustive search to see if the input equation for SAT is satisfiable. If it is the algorithm terminates, and if it isn't, then the algorithm goes into an infinite loop.
If we can solve HALT, then we could give it our algorithm. If it answers yes, then the equation must have been satisfiable, and if it answers no, then our equation was not satisfiable.
Let's say you have a problem that you reduce to another that can only be solved in exponential time. Is it possible for the overall computation of the solution to the initial problem (using this reduction) to be polynomial?
Yes. We must realize that when we say "polynomial", we mean "polynomial to the size of the input to A". Thus, if when we convert an instance of A into an instance of B, and the size of the input to B is logarithmic in the size of the input of A, the total time to run B is polynomial in A.
Define the class P.
P is the class of decision problems that are computable with deterministic algorithms in polynomial time.
(Spring 1994, Spring 1995, Spring 1996) What does it mean for a problem to be NP-Complete?
A problem is NP-complete if it is in NP and it is NP-Hard
(Spring 1994, Spring 1995, Spring 1996) What does it mean for a problem to be NP-Hard?
A problem P is NP-Hard if SAT reduces to P in polynomial time. This means that given any instance of SAT we can construct and instance of P in polynomial time such that from the solution of P we can determine (in polynomial time) the solution to SAT.
What does it mean for a problem to be in NP?
A problem is in NP if we can construct a nondeterministic polynomial time algorithm for it. A nondeterministic algorithm is much like a normal algorithm, with the addition of 3 O(1) operations: choice(S), which arbitrarily chooses an element from the set S, success(), which marks successful completion of the algorithm, and failure(), which marks unsuccessful completion. A nondeterministic algorithm fails iff there is no set of choices that leads to a success statement.
Is every NP-Hard problem NP-complete?
No, the halting problem is NP-Hard, but not NP-Complete, since it is not in NP. (The halting is undecidable, and thus has no algorithm)
Show that HALT is NP-Hard.
We show this by showing SAT reduces to HALT: Given an instance of SAT, we run an exhaustive search of every combination of truth values to find out if it is satisfiable. If it is, we halt, otherwise, we run forever. Thus, the algorithm halts iff our input was satisfiable.
But wait, I thought you said that the reduction had to be in polynomial time. Isn't exhaustive search exponential?
We never run the exhaustive search algorithm we have constructed. We just give it as input to our mythical solver for the halting problem. If the halting problem can decide in polynomial time whether the algorithm will halt, we know in polynomial time whether our original input was satisfiable. What needs to be polynomial time is the construction of the algorithm, which is easily done.
(Winter 1998) Is addition NP-complete? What about matrix multiplication?
We don't know. If P=NP, then they are.
(Winter 1998) What is the time complexity of matrix multiplication?
Naive algorithm: O(n3), Strassen's algorithm: O(nlog27)
(Winter 2000) Assuming P=NP, can you provide a polynomial time algorithm for the colorability problem?
Yes. In general, if P=NP, from Cook's theorem, we know that such an algorithm exists for any decidable problem, but we don't know it in advance. However, in this case, such an algorithm can be found.
(Winter 2000) Sketch the algorithm.
Draw Venn diagrams of the possible complexity classes.
|
If P=NP=co-NP, which is unlikely ![]() |
If NP=co-NP ![]() |
|
If P=NP ![]() |
If NP ![]() |
Name five modifications to the standard Turing machine that will not change its power.
(Winter 2000) Assume you have an extended Turing Machine, that has a oracle for the language equivalence problem (which is undecidable) embedded in it. Do there exist some languages that can't be recognized by this machine?
Yes, the Halting Problem for this class of machines (not the classical Halting Problem for Turing Machines) can't be decided by any machine of this class. The proof is the same as the one used for showing that the classical Halting Problem is undecidable.
(Winter 2000) Can you provide an argument based on cardinality?
The total number of languages is uncountable, since it maps to the number of functions that exist. Once can show by considering the subset of this set defined by the functions that map N to (0,1) that this set is uncountable. The total number of languages recognizable by such a machine is countable, since we can provide a unary encoding, for instance, and then order these encodings by lexicographical ordering. Since there are more problems than solutions, some problems are unsolvable.
(Spring 1994) Let's say I have a machine. Does there exist a Turing machine that can tell me if it halts for a given input?
Yes. There exists a machine that, given one machine can produce the correct answer.
Let's say I have 4 machines that I give to it. Can it tell me whether or not they halt?
Yes.
Why isn't this the halting problem?
The machine in question can not accept an arbitrary machine. It is possible, for a finite number of inputs, to output the correct answer. If there are a countably infinite number, there is no machine that can produce the correct answer.
Let's say I have a language that is either recursive, recursively enumerable, or not recursively enumerable. What can it's complement be?
If the language is recursive, then its complement must also be recursive. If the language is r.e., then its complement can not be r.e. since it would then recursive. If the language is not r.e., then its complement may either be r.e. or not r.e., but not recursive.
What are the following languages, and to what class do they belong: Ld, Lu, Lne, Le, Lr, and Lnr?
(Spring 1996) How many non-RE languages are there?
There are an uncountable number of languages. There are a countable number of recursively enumerable languages, since we can encode the Turing machines that accept them using fixed length strings. This means that there are an uncountable number of non-RE languages.
Is the number of strings in a non-recursively enumerable language countably infinite?
A non-r.e. language can not be accepted by a Turing machine. Since we know that Turing machines can enumerate only countably infinite strings of a language, non-recurseively enumerable languages are not countable.
What is Post's Correspondence Problem?
Given two sets of strings w1, w2, ..., wk, and x1, x2, ..., xk, there is a solution if there is some sequence of integers i1, i2, ... im such that
PCP is undecidable.
What is the Church-Turing thesis?
Anything that is computable can be computed by a Turing machine. There does not and can not exist a machine that can compute things that a Turing machine can not. (This can not be proven.)
What is Cook's Theorem?
Cook's theorem says that P=NP if and only if SAT
P. It shows that any nondeterministic program can be converted, in
polynomial time, to a boolean formula that can be solved by SAT.
What's an equivalence relation?
A relation that is reflexive, symmetric, and transitive. This means for a
relation R and a,b in S, aRa for all a in S, aRb
bRa,
and aRb and bRc
aRc.
What is a problem, formally?
It is a binary mapping from a set of problem instances to a set of solutions. The set of solutions is normally defines using some sort of specification.
What does countably infinite or countable mean?
It means that the set in question can be put in a one-to-one correspondence with the integers. That in some sense there are "as many" as the integers.
How many internal nodes does a complete binary tree of depth n have?
2n-1, where a tree consisting of one node has depth 0.
Prove it.
Let's do a proof by induction. Let I(n) be the number of internal nodes at level n.
Basis: 20-1 = 0. A tree of depth 0 has one leaf and no internal nodes.
Assume true for k:
Prove true for k + 1:
The number of internal nodes at level k + 1 is the number of internal nodes at level k plus the number of leaves at level k, which is 2k.
Prove that there are more problems than algorithms.
There are a countable number of algorithms, since we can encode Turing
machines using binary numbers. If we consider the problems that are functions
from the set of naturals (
) to
{0,1}, then we can interpret every element in a subset of the naturals to mean
that value of the function is a 1, and the value of the function for every
element not in the subset is a 0. So each subset of the naturals corresponds to
a function. But there are |2
|
subsets, which is uncountable. Therefore, there are an uncountable number of
functions and a countable number of algorithms, which means there are more
functions than algorithms.
What is diagonalization?
Diagonalization is a method of proof by contradiction in which one tries to show that a one-to-one mapping does not exist between two sets. The method is to first assume that such a mapping exists, then construct an element that should be in the set, but is different from all of the other elements that have already been mapped. Since there exists an unmapped element, there is not a one-to-one mapping.
Prove that the number of real numbers is not countably infinite.
Assume that the reals are countably infinite. Then there exists a one-to-one correspondence between the naturals and the reals. We can construct a real number that does not have a corresponding natural in the following manner: make the nth digit after the decimal equal to one more than the nth digit after the real that corresponds to the nth natural number.
| 1 <=> | .48374658923 ... |
| 2 <=> | .33834759102 ... |
| 3 <=> | .03185746295 ... |
| 4 <=> | .11268376322 ... |
| 5 <=> | .82913847391 ... |
| 6 <=> | .89342283924 ... |
|
| |
| .542743 ... | |
This new real is different from all of the other reals, so we know that it has not been mapped to a natural. Therefore the assumption that the reals are countable was incorrect.
Explain what O (big-O),
(big-Omega),
and
(big-Theta) mean.
We say that f(n) = O(g(n)) if, in the limit, f(n) is less than g(n).
Formally, there exist c and k such that |f(n)|
c×|g(n)| for
all n > k. A function f(n) is
(g(n)) if
f(n) is greater than g(n) in the limit (i.e. g(n) = O(f(n)) ). A function
f(n) is
(g(n)) if it
grows as fast in the limit (i.e. f(n) = O(g(n)) and g(n) = O(f(n)) ).
Prove that there is no sorting algorithm that runs in time better than O(n×log2n).
For a set of elements to be sorted, there are at most n! different orderings. This means that any algorithm that we use must be able to differentiate between these different possibilities. We can draw a decision tree where each node represents a comparison.

In the worst case, the height of the tree is log2(n!) =
(n×log n),
using Stirling's approximation, n!
Sqrt(2pi×n)(n/e)n.
(Winter 1997) Let's say I have a data structure for which you can do insertions, deletions, and finding the minimum in constant time. What would you say?
You would have to be mistaken. If I had such an algorithm, I could sort an arbitrary set of elements in linear time by inserting them all, then removing the minimum repeatedly. Since we know that sorting takes O(n×log n), there can not exist such a data structure.
(Winter 1998) Given the input of the top, bottom, left, and right coordinates of n rectangles, provide an algorithm that will give the coordinates of the intersection of the rectangles, if it exists.
(Winter 1998) Provide a linear time algorithm and prove that it runs in linear time.
(Spring 1994) Describe the algorithms for the following: heap, quick, merge, bucket, and radix sort.
(Spring 1994) What is the order of the algorithms?
(Hash asked Winter 1999) Describe the following Searching Algorithms: Linear, Binary, Hash. How fast is each?
A linear search is a sequential comparison of every element in a set that is
not necessarily ordered, and is
(n). A binary
search is used for sorted lists of elements, by recursively selecting the
midpoint of the remaining list and removing half of the elements from
consideration. Its run time is
(lg n). Using
an open addressed hash table, in which all of the values are stored in the hash
table itself, a search could take O(1) time in the best case, and O(n) time in
the worst case. The search works by hashing into the array, then sequentially
searching the array until the search value or a blank is found.
(Winter 1999) If the worst-case asymptotic behavior for hash search is O(n), why do we bother?
Using hasing can still decrease the number of data accesses required, and a good implementation can tend to display the O(1) behavior more often than the O(n) behavior.
What is a minimal spanning tree?
Given a graph with a set of nodes and a set of weighted edges, the minimal spanning tree is the set of edges such that all the edges touch every node in the graph and for which the sum of the weights is a minimum.
Give an example of an algorithm that computes such a tree.
Prim's algorithm starts with a set of one node in the graph, and repeatedly chooses the edge intersecting the set of nodes with the lowest weight that doesn't cause a cycle. This process continues until all the nodes are intersected by the chosen edges.
Kruskal's algorithm works by repeatedly choosing the globally lowest weighted edge that does not cause a cycle, until all nodes are intersected by the edges chosen.
Prove that your algorithm does indeed find the minimum spanning tree.
Let's prove Kruskal's by contradiction. Let MSTopt be the optimal spanning tree, and MSTk be the spanning tree produced by Kruskal's algorithm. Order the edges in both spanning trees by edge weight. Then compare them starting with the edge of least weight. Since the two graphs are different, eventually we will find two edges that are not the same.
Suppose that the weight of the edge from MSTopt is less than the weight of the edge from MSTk. In this case, we reach a contradiction because until this point the spanning trees were the same, and Kruskal's would not have chosen an edge of greater weight when the other edge of less weight was available.
Now consider the situation in which the weight of the edge from MSTopt is greater than the weight of the edge from MSTk. We know that the edge from MSTk does not cause a cycle in the edges that we've already looked at, since MSTk is a tree. If we were to add that edge to MSTopt then we know we would cause a cycle containing at least one edge has a weight greater than the the added edge. Removing that edge creates an MST will less total weight than the optimal, which is another contradiction.
Since both possible cases result in contradiction, our original assumption that Kruskal's algorithm does not create the MST must be in error.
Prove that an MST contains the edge of least weight.
Assume that an MST does not contain the edge of least weight, and try to reach a contradiction. If we were to add the edge of least weight to our graph, then a cycle would be formed in which all of the other edges have greater weight. We may then choose one of the other edges in the cycle and remove it, thereby creating another tree that includes the edge of lowest weight. Since we increased the total weight of the tree by less than we decreased it, the new tree has lower weight than the previous one, which therefore could not have been an MST. This contradicts our original assumption that there exists an MST that does not contain the edge of least weight.
Explain the shortest paths algorithm.
The shortest paths algorithm is a search in which an edge (that doesn't cause a cycle) is chosen each iteration such that the distance from the starting node to the newest node is minimized. The set of edges to choose from at any given time are those edges from nodes that have already been visited to nodes that have not been visited. The algorithm stops when all the nodes have been reached, and it's run time is O(V2).
What is a red-black tree? A B-tree?
A red-black tree is a tree that is guaranteed that no path is more than twice as long as another. By adding an extra bit of storage for the "color" and constraining the way coloring is done, the tree can be kept approximately balanced. As a result, there is a guaranteed access time of O(lg n).
A B-tree is a balanced search tree used with magnetic disks, designed to minimize disk I/O operations. B-trees differ from red-black trees in that the branching factor for the nodes can be larger than two.
What is dynamic programming?
In dynamic programming usually applies to optimization problems in which a set of choices must be made to arrive at the optimal solution. Each choice results in a subproblem that is very similar to the previous one, and the same subproblems occur frequently. Dynamic programming works because the solutions to the reappearing subproblems are "memorized" and used in later occurrences.
What are greedy algorithms?
Like dynamic programming, greedy algorithms are applied to optimization problems for which a set of choices must be made to arrive at the optimal solution. Greedy algorithms make the assumption that a locally optimal decision will lead to a globally optimal solution. It is sometimes unclear whether this heuristic is valid, but when it is, the solution is often found more quickly than in other methods.
What is the Master method?
It is a means of finding the asymptotic complexity of a recurrence relation.
Last changed January 30, 1997. David Coppit, david@coppit.org