CS 200 Computer Science from Ada and Euclid to Quantum Computing and the World Wide Web cs200@cs.virginia.edu

## CS200 Condensed

This document summarizes the most important things I hope you have learned in CS200.

## How to Describe Procedures

Language: 1: Introduction [Slides, Notes]; 2: Formal Systems and Languages [SN]; 3: Rules of Evaluation [S, N]; 5: Fibonacci [SN]

Computer Science is the study of imperative ("how to") knowledge. Computer Science studies how to describe procedures and how to reason about the processes procedures produce. Ada, Countess of Lovelace, was the first Computer Scientist, because she was (probably) the first person to consider how to precisely describe procedures.

Computer Science is not a science, since it is not about understanding nature. It is not engineering, since computer scientists do not face the kinds of constraints engineers face. Computer science is best considered a liberal art. It encompasses all seven of the traditional liberal arts: the language trivium --- grammar, rhetoric, logic; and the numbers quadrivium --- arithmetic, geometry, music and astronomy.

A formal system is a set of symbols and a set of rules for manipulating symbols. A language is like a formal system, except there is a mapping between sequences of symbols in the language and meanings.

Languages are powerful tools for description. Languages are made of primitives (the smallest units of meaning), means of combination (ways to combine language forms to make new ones), and means of abstraction (ways to give new names to language forms). Because languages are recursive, we can express infinitely many different meanings starting with a finite number of primitives and means of combination.

We can describe the surface forms in a language using a replacement grammar (Backus Naur Form). Rules in BNF are of the form nonterminal ::= replacement and mean that wherever the nonterminal on the left side of the rule appears, it can be replaced with the right side of the rule. A simple BNF grammar can be used to describe a language with infinitely many surface forms, since nonterminals may appear in the replacement part of grammar rules. Another way to describe the surface forms of a language is to use a Recursive Transition Network. There is a mechanical process for converting between BNF and RTN descriptions of a language: all languages that can be described by RTNs can be described by BNFs, and vice versa.
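As a rough illustration of how a replacement grammar generates surface forms, here is a minimal sketch in Python (the course itself uses Scheme). The two-rule grammar and the names `GRAMMAR` and `expand` are made up for this example; they are not from the lectures.

```python
# Hypothetical toy grammar, written as BNF:
#   Number ::= Digit | Digit Number
#   Digit  ::= 0 | 1
GRAMMAR = {
    "Number": [["Digit"], ["Digit", "Number"]],
    "Digit": [["0"], ["1"]],
}


def expand(symbol, depth):
    """Return every string derivable from symbol with at most depth nested replacements."""
    if symbol not in GRAMMAR:          # a terminal stands for itself
        return {symbol}
    if depth == 0:                     # out of replacements for a nonterminal
        return set()
    results = set()
    for replacement in GRAMMAR[symbol]:
        partials = {""}
        for part in replacement:
            # Combine every prefix built so far with every expansion of the next part.
            partials = {p + s for p in partials for s in expand(part, depth - 1)}
        results |= partials
    return results


short_forms = expand("Number", 3)
```

Because `Number` appears in its own replacement, raising the depth limit keeps producing longer strings, which is why a finite grammar can describe infinitely many surface forms.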

We can describe the meaning of a language using rules of evaluation. Simple rules of evaluation for the language Scheme provide a mechanical way to determine the value of any Scheme expression that has a value (not all Scheme expressions have values).

You should be able to: explain what Computer Science is to a liberal arts student; identify the primitives, means of combination, and means of abstraction for a language; describe a language using BNF or RTN; determine the set of the surface forms in a language described by a BNF grammar or RTN; determine what a surface form in a language means if you are given evaluation rules for the language; determine the value of a Scheme expression following the rules of evaluation.

Recursive Definitions: 4: Evaluation and Recursion [SN]; 5: Fibonacci [SN]; 6: Recursing Recursively [SN]; 7: Defining for [N]; 38: Fixed Points and Biological Computation [SN]

We can define procedures in terms of themselves. A recursive definition has a base case that solves the simplest version of the problem directly, and a recursive case that divides the problem into two problems (one of which is a simpler version of the original problem) whose results can be combined to solve the original problem. Recursion is everywhere: language, music, nature, etc.
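A minimal sketch of a recursive definition, written in Python rather than the course's Scheme. The procedure `factorial` is chosen only as a familiar illustration.

```python
def factorial(n):
    """Compute n! for a non-negative integer n."""
    if n == 0:
        # Base case: the simplest version of the problem, solved directly.
        return 1
    # Recursive case: a simpler version of the same problem (n - 1),
    # combined with one multiplication to solve the original problem.
    return n * factorial(n - 1)


result = factorial(5)
```

Each application works on a smaller input, so the chain of recursive calls eventually reaches the base case.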

You should be able to: understand a recursive definition; solve a problem by defining a procedure recursively; reason about the process produced by evaluating an application of a recursively defined procedure; define and understand procedures that take procedures as parameters; define and understand procedures that produce procedures as results.

Programming with Lists: 8: Cons car cdr [SN]; 9: Programming with Lists [SN]; 10: Barista Assista [N]

To express computations clearly, we need ways to represent complex data. We can construct a pair (using cons), and access its parts using car and cdr. A list is either a special value known as null, or a pair where the second part is a list. By defining lists recursively this way, we can make lists of any length just by putting more pairs together. A useful way of programming with lists is to define a recursive procedure that has a base case for null, and otherwise does something with the car of the list and recursively calls the procedure with the cdr of the list.
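The pair-based view of lists can be imitated in Python. This is only a sketch: the names `cons`, `car`, `cdr` and `NULL` mirror the Scheme ones, pairs are represented as Python tuples (an assumption of this example), and `make_list`, `length` and `list_map` are helper names introduced here.

```python
NULL = None                      # stands in for Scheme's null


def cons(first, rest):
    return (first, rest)         # a pair is modeled as a 2-tuple


def car(pair):
    return pair[0]


def cdr(pair):
    return pair[1]


def make_list(*items):
    """Build a list out of pairs, ending in NULL."""
    result = NULL
    for item in reversed(items):
        result = cons(item, result)
    return result


def length(lst):
    if lst is NULL:              # base case: the empty list
        return 0
    return 1 + length(cdr(lst))  # recursive case: the rest of the list


def list_map(procedure, lst):
    if lst is NULL:
        return NULL
    # Do something with the car, then recurse on the cdr.
    return cons(procedure(car(lst)), list_map(procedure, cdr(lst)))


squares = list_map(lambda x: x * x, make_list(1, 2, 3))
```

Both `length` and `list_map` follow the same shape: a base case for null, and a recursive call on the cdr.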

You should be able to: define procedures that manipulate lists; use and understand procedures that traverse lists (including map, filter and insertlg).

Programming with Mutation: 18: Mutation [SN]; 19: Environments [SN]; 20: Objects [SN]; 21: Inheritance [SN]

Mutation changes the value associated with a place. We can express all computations that use mutation without using mutation, but mutation is useful for describing some computations more clearly and producing more efficient computations. To support mutation, we need to change the evaluation rules for Scheme to use environments instead of just substitution. This is because the value of a name may change.

A name is a place for storing a value. A frame is a collection of places. An environment is a pointer to a frame. All frames except the outermost (global) frame have a parent frame. All expressions are evaluated in an environment. To evaluate a name in an environment, look for a place matching the name in the frame pointed to by the environment. If there is one, the value of the name is the value in that place. If there is not, evaluate the name in the parent environment.

Application creates a new frame, containing places named after the applied procedure's parameters. The parent of the new frame is the environment of the applied procedure.
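The frame and environment rules above can be sketched in Python. The class `Frame` and the helper `application_frame` are hypothetical names for this example, and `mutate` plays the role of Scheme's set!.

```python
class Frame:
    """A collection of places (name -> value) with an optional parent frame."""

    def __init__(self, parent=None):
        self.places = {}
        self.parent = parent

    def define(self, name, value):
        # Create (or overwrite) a place in this frame.
        self.places[name] = value

    def lookup(self, name):
        # Look in this frame first, then in the parent environment.
        if name in self.places:
            return self.places[name]
        if self.parent is None:
            raise NameError(f"unbound name: {name}")
        return self.parent.lookup(name)

    def mutate(self, name, value):
        # Like set!: change the value in the nearest matching place.
        if name in self.places:
            self.places[name] = value
        elif self.parent is None:
            raise NameError(f"unbound name: {name}")
        else:
            self.parent.mutate(name, value)


def application_frame(procedure_env, params, operands):
    # Application creates a new frame whose parent is the procedure's environment.
    frame = Frame(parent=procedure_env)
    for name, value in zip(params, operands):
        frame.define(name, value)
    return frame


global_env = Frame()
global_env.define("x", 10)
call_env = application_frame(global_env, ["y"], [2])
value_before = call_env.lookup("x") + call_env.lookup("y")
call_env.mutate("x", 11)         # changes the place in the global frame
value_after = global_env.lookup("x")
```

Looking up `x` from `call_env` falls through to the parent frame, and mutating `x` changes the global place rather than creating a new one.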

An object is produced by packaging state and procedures. Programming with objects is called object-oriented programming. We call the procedures that are part of an object methods, and the state instance variables. We can program with objects by sending messages to objects that invoke methods. A class is a procedure that defines an object. Inheritance is defining a class in terms of another class. The subclass is the new class; its superclass is the class it uses. A subclass can add methods or redefine methods of the superclass; if a method not defined by the subclass is requested, it sends the request up to its superclass.
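One way to picture message passing and inheritance is with closures. The sketch below is in Python rather than Scheme, and the counter classes and message names (`"next!"`, `"reset!"`, `"current"`) are invented for illustration.

```python
def make_counter():
    """A class: a procedure that produces an object (here, a closure)."""
    count = 0                                  # instance variable

    def counter(message):
        nonlocal count
        if message == "reset!":
            count = 0
        elif message == "next!":
            count += 1
        elif message == "current":
            return count
        else:
            raise ValueError(f"unknown message: {message}")

    return counter


def make_double_counter():
    """A subclass of counter that redefines the next! method."""
    super_object = make_counter()

    def double_counter(message):
        if message == "next!":                 # redefined method
            super_object("next!")
            super_object("next!")
        else:
            # Not defined here: send the request up to the superclass.
            return super_object(message)

    return double_counter


c = make_double_counter()
c("next!")
c("next!")
after_two = c("current")
c("reset!")
after_reset = c("current")
```

The subclass handles only the message it redefines; every other message is handled by the superclass object it holds.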

You should be able to: define and understand procedures that use mutation; draw environment diagrams; explain what environment diagrams mean; define procedures that create objects; explain a class hierarchy; define procedures that use inheritance; explain how a method is selected given class definitions.

Metalinguistic Abstraction: 25: Metalinguistics [SN]; 26: The Metacircular Evaluator [SN]; 27: Lazy Evaluation [SN]; 28: Types [SN]; 29: Type Checking [SN]; Exam 2

Languages change the way we think; sometimes the best way to solve a problem is to define a new language. To design a language we need to define the surface forms of the language (for example, using BNF) and describe the evaluation rules. To use the new language to solve a problem, we need to implement an evaluator for the language that carries out the evaluation rules. An evaluator is just another program.

The core of a Scheme evaluator is eval and apply, procedures that are defined in terms of each other. The eval procedure takes an expression and an environment and evaluates to the value of the expression in the environment; the apply procedure takes a procedure and its operands and evaluates to the value of applying the procedure to its operands.
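The mutual recursion between eval and apply can be sketched for a tiny Scheme-like language. This is a deliberately simplified Python sketch, not the evaluator from the lectures: expressions are nested Python lists, environments are plain dictionaries (so there is no mutation or define), and the names `meval`, `mapply` and `GLOBAL_ENV` are assumptions of this example.

```python
import operator

GLOBAL_ENV = {"+": operator.add, "*": operator.mul, "<": operator.lt}


def meval(expr, env):
    """Evaluate expr in env."""
    if isinstance(expr, (int, float)):         # primitive: a number
        return expr
    if isinstance(expr, str):                  # a name: look it up
        return env[expr]
    if expr[0] == "lambda":                    # ["lambda", [params], body]
        _, params, body = expr
        return ("closure", params, body, env)
    if expr[0] == "if":                        # ["if", test, consequent, alternate]
        _, test, consequent, alternate = expr
        return meval(consequent if meval(test, env) else alternate, env)
    # Application: evaluate all subexpressions, then apply the first to the rest.
    procedure = meval(expr[0], env)
    operands = [meval(sub, env) for sub in expr[1:]]
    return mapply(procedure, operands)


def mapply(procedure, operands):
    """Apply a procedure to already-evaluated operands."""
    if callable(procedure):                    # primitive procedure
        return procedure(*operands)
    _, params, body, env = procedure
    # Bind parameters in an extension of the procedure's environment.
    new_env = {**env, **dict(zip(params, operands))}
    return meval(body, new_env)


square_of_seven = meval([["lambda", ["x"], ["*", "x", "x"]], 7], GLOBAL_ENV)
```

`meval` calls `mapply` for applications, and `mapply` calls `meval` on the procedure body, which is the mutual definition described above.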

By changing the evaluator, we can change the meaning of a language. In regular Scheme, the subexpressions of a combination are all evaluated before the value of the first subexpression is applied to the value of the others. We can change the evaluator to evaluate applications lazily instead, by only evaluating the value of an operand when it is needed. A language with lazy evaluation needs fewer special forms than a language with eager evaluation since programmers can control when operands are evaluated.
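To see why lazy evaluation reduces the need for special forms, consider defining `if` as an ordinary procedure. The Python sketch below simulates laziness by wrapping each operand in a zero-argument procedure (a thunk); `eager_if`, `lazy_if` and `safe_divide` are hypothetical names for this example.

```python
def eager_if(test, consequent, alternate):
    # All three operands were already evaluated by the time we get here.
    return consequent if test else alternate


def lazy_if(test, consequent, alternate):
    # Operands are thunks; only the branch that is needed gets evaluated.
    return consequent() if test() else alternate()


def safe_divide(a, b):
    return lazy_if(lambda: b == 0, lambda: 0, lambda: a / b)


lazy_result = safe_divide(1, 0)

try:
    eager_if(True, 0, 1 / 0)     # the unused operand is still evaluated
    eager_failed = False
except ZeroDivisionError:
    eager_failed = True
```

With eager evaluation the division by zero happens before `eager_if` is even applied, which is why `if` must be a special form there; with delayed operands it can be an ordinary procedure.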

A type is a (possibly infinite) set of values. You can do different things with values of different types. Programming languages can have latent types (not visible in the surface form) or manifest types (explicit in the surface form). Types can be checked statically (before evaluating the program) or dynamically (as the program is evaluated). Scheme has latent types that are checked dynamically. To produce large, reliable programs it is useful to have manifest types that are checked statically. We can produce a statically-typed variant of Scheme by changing the evaluator to check types.

An abstract type is a type whose representation is hidden. An abstract type is checked by name, instead of by the structure of its representation type. To do useful things with abstract types, we need to turn them into their representation type. One way to do this is to extend our language to support special forms up and down that convert representation types to abstract types (and vice versa).

You should be able to: understand an evaluator; modify an evaluator to change the meaning of a language; explain why the difference between eager and lazy evaluation matters; explain the advantages and disadvantages of static type checking; understand an evaluator that does static type checking; understand a program that uses abstract types, up and down; extend a language evaluator to support abstract types.

Web: 31: Networks, The Internet and the World Wide Web [SN]; 32: How to Build a Dynamic Web Site [SN]

A network is a group of three or more communicating entities. Networks have been around for thousands of years. The latency of a network measures how long it takes a message to travel between two points in the network; we can improve latency by reducing the number of transfer points (routers) between two points, reducing the time it takes to get through a transfer point, or increasing the speed the message travels between transfer points. The bandwidth of a network measures the amount of information the network can transmit per unit time; we can improve bandwidth by transmitting faster, transmitting more data at the same time, or encoding information more efficiently. Networks can use circuit switching, which reserves a whole path through the network for a transmission; or packet switching, which uses links one at a time. Circuit switching provides more reliable latency and bandwidth since once a path is reserved, it is available for the whole transmission. Packet switching uses network resources more efficiently.

An internetwork is a collection of multiple networks that can send messages between nodes in different networks. Many people (including Al Gore) contributed to the Internet, which is an internetwork that grew out of an ARPA project that started in 1969.

The World Wide Web established a common language for clients (browsers) and servers on the Internet. Clients interact with servers by sending HTTP requests and (mostly) getting back responses in HTML. A database is a way of storing and retrieving information. SQL is a language for manipulating databases.
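A small sketch of SQL insert and select commands, run from Python using the standard `sqlite3` module with an in-memory database. The table `students` and its columns are invented for this example and are not the database used in the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")             # temporary database, nothing on disk
conn.execute("CREATE TABLE students (name TEXT, year INTEGER)")

# INSERT adds rows to a table; the ? placeholders are filled in by sqlite3.
conn.executemany(
    "INSERT INTO students (name, year) VALUES (?, ?)",
    [("Ada", 1), ("Euclid", 1), ("Alan", 2)],
)

# SELECT retrieves the rows that match a condition.
first_years = conn.execute(
    "SELECT name FROM students WHERE year = ? ORDER BY name", (1,)
).fetchall()

conn.close()
```

Each row comes back as a tuple, so `first_years` is a list of one-element tuples holding the matching names.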

You should be able to: measure the latency and bandwidth of a network; explain the advantages and disadvantages of packet switching; make a dynamic web site; construct a SQL command to select or insert data in a database table; learn a new language given a BNF grammar and an informal description of its evaluation rules.

## How to Reason About Processes

Measuring Complexity: 11: Sorting [S, N]; 12: Quicksorting [SN]; 13: Astrophysics and Cryptology [SN]; 14: P = NP? [SN]; 15: Intractable Problems [SN]; 16: Knapsack Problem (Exam Review) [N]

Computer scientists measure work using orders of growth. Since computers tend to get faster exponentially, it is usually more important to know how the number of steps required to solve a problem grows as the problem size grows than to know the absolute time.

A problem is described by its inputs and outputs, and the relationship between the outputs and inputs. A solution to a problem is a procedure that given any possible inputs can calculate an output that satisfies the required relationship in a finite amount of time.

An upper bound, O(f(n)), on the amount of work required to solve a problem means that we know how to solve it with at most a constant multiple of f(n) steps. A lower bound, Ω(f(n)), on the amount of work required to solve a problem means that we can show it is impossible to define a procedure that solves it with less than a constant multiple of f(n) work. If we know the upper bound and lower bound for a problem are the same, we have a tight bound, Θ(f(n)).

Sorting is a problem that takes as input a list and a comparison function, and produces as output a list containing the same elements as the input list ordered by the comparison function. Bubblesort is a procedure for sorting that divides the sorting problem recursively into putting the first element in the right place in the result of sorting the rest of the list. Bubblesort is Θ(n^2) where n is the number of elements in the list. A more efficient sorting procedure is quicksort: instead of dividing the problem of sorting a list of length n into pieces of length 1 and length n - 1, quicksort divides the problem into pieces that are likely to be around the same size. On average, quicksort is Θ(n log_2 n).
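The two ways of dividing the sorting problem can be sketched in Python (the course defines them in Scheme). The names `simple_sort`, `insert_in_place` and `quicksort` are chosen for this example; `simple_sort` follows the "first element plus the sorted rest" decomposition described above, and `before` is the comparison function.

```python
def insert_in_place(element, sorted_list, before):
    """Put element in the right place in an already sorted list."""
    if not sorted_list:
        return [element]
    if before(element, sorted_list[0]):
        return [element] + sorted_list
    return [sorted_list[0]] + insert_in_place(element, sorted_list[1:], before)


def simple_sort(lst, before):
    # Pieces of length 1 and n - 1: place the first element into the sorted rest.
    if not lst:
        return []
    return insert_in_place(lst[0], simple_sort(lst[1:], before), before)


def quicksort(lst, before):
    # Pieces that are likely to be about the same size: split around a pivot.
    if not lst:
        return []
    pivot, rest = lst[0], lst[1:]
    smaller = [x for x in rest if before(x, pivot)]
    larger = [x for x in rest if not before(x, pivot)]
    return quicksort(smaller, before) + [pivot] + quicksort(larger, before)


data = [3, 1, 4, 1, 5, 9, 2, 6]
ascending = quicksort(data, lambda a, b: a < b)
```

Both procedures take the comparison function as a parameter, so the same code sorts in descending order when given `lambda a, b: a > b`.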

If there is a procedure that solves a problem with O(n^k) work for some constant k, the problem can be solved in polynomial time and is in class P. There are some problems that can be solved in polynomial time if we could try all possible solutions at once. This class of problems is known as nondeterministic polynomial time (NP). We can show a problem is in NP by showing there are at most an exponential (that is, k^n) number of possible solutions to try, and we can check if a possible solution is correct in polynomial time. The hardest problems in NP are known as NP-Complete. It is not known whether or not there is a polynomial time solution for any of these problems, but if one is found for any NP-Complete problem, then all the other problems in NP can also be solved in polynomial time. Examples of NP-Complete problems include the smiley puzzle, the satisfiability problem, the travelling salesman problem, and the graph coloring problem. For all of these problems, there is a straightforward way to try all possible solutions, but no known way to solve them without trying (nearly) all possible solutions.
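The two ingredients of an NP argument, a fast check and an exponential search space, can be sketched for the satisfiability problem. This Python sketch assumes a particular encoding chosen for this example: a formula is a list of clauses, each clause is a list of integers, and a negative integer means the negation of that variable. The names `satisfies` and `brute_force_sat` are hypothetical.

```python
from itertools import product


def satisfies(formula, assignment):
    """Check a possible solution: work grows only with the size of the formula."""
    return all(
        any(assignment[abs(literal)] == (literal > 0) for literal in clause)
        for clause in formula
    )


def brute_force_sat(formula, num_vars):
    """Try all 2^n assignments in turn; return the first that works, else None."""
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: value for i, value in enumerate(values)}
        if satisfies(formula, assignment):
            return assignment
    return None


# (x1 or x2) and (not x1 or x2) and (not x2 or x3)
formula = [[1, 2], [-1, 2], [-2, 3]]
solution = brute_force_sat(formula, 3)
```

Checking one assignment is cheap, but the loop may visit every one of the 2^n assignments, which is the exponential cost described above.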

Problems that do not have solutions in P are intractable — the amount of work required to solve them scales so quickly it is unreasonable to solve even medium size versions of the problem. To solve these problems, we need to settle for approximations.

You should be able to: describe problems precisely in terms of their inputs and outputs; express the amount of work a procedure requires using Θ notation; know what it means to describe a problem using O, Ω, and Θ; estimate the amount of work a solution to a problem involves; classify problems into complexity classes P and NP; explain convincingly why a problem is in NP; explain what it would mean if someone developed a fast (polynomial time) procedure for an NP-Complete problem.

Computability: 22: Gödel's Theorem [SN]; 23: Computability [SN]; 24: Classification Practice [N]

An axiomatic system is a set of axioms and mechanical rules for deriving theorems starting from those axioms. A perfect axiomatic system for a domain (such as number theory) would produce all true theorems about that domain and no false theorems. An incomplete axiomatic system fails to produce some true theorems. An inconsistent axiomatic system produces some false theorems.

Gödel proved that it is impossible to produce a perfect axiomatic system for any interesting domain: the system must be either incomplete or inconsistent. The proof shows that the statement G, "this statement does not have any proof in the system", can be expressed within the system itself. If G can be proved, the system proves a false statement and is inconsistent; if G cannot be proved, then G is true but unprovable, and the system is incomplete.

Some problems can be solved by algorithms (procedures that eventually terminate), others cannot. We call problems for which there is no algorithmic solution undecidable. An example of an undecidable problem is the Halting Problem. We can prove there is no algorithm that solves the Halting Problem, by showing if we had one it would lead to a contradiction. One way to show a problem is undecidable is to show that if we had a procedure that solves it, we could also solve the Halting Problem.
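The contradiction behind the Halting Problem can be sketched in Python. Everything here is hypothetical: `halts` stands for a claimed halting checker (no correct one can exist), `make_paradox` builds a procedure that does the opposite of whatever the checker predicts, and `claims_nothing_halts` is one deliberately naive candidate checker.

```python
def make_paradox(halts):
    """Build a procedure that contradicts the prediction halts makes about it."""

    def paradox():
        if halts(paradox):
            while True:          # predicted to halt, so loop forever
                pass
        return "halted"          # predicted to loop forever, so halt

    return paradox


def claims_nothing_halts(procedure):
    # A (wrong) candidate checker that always answers "does not halt".
    return False


paradox = make_paradox(claims_nothing_halts)
outcome = paradox()              # halts, contradicting the checker's answer
```

The same construction defeats any candidate: whatever `halts` answers about `paradox`, `paradox` does the opposite. A checker that answered True would make `paradox` loop forever, so that case is described here rather than run.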

You should be able to: explain what it means for an axiomatic system to be perfect, incomplete or inconsistent; explain the essence of Gödel's proof; determine if a problem is decidable or undecidable, and provide a convincing (informal) argument why.

Models of Computation: 34: Modeling Computation [SN]; 35: Lambda Calculus [SN]; 36: The Meaning of Truth [SN]; 37: Making Numbers and Lists from Glue Alone [SN]; 38: Fixed Points and Biological Computation [SN]

For complexity classes to make sense, we need a model for what a step is. A model of computation must model input, output, processing and memory. A mechanical model of computation is a Turing Machine, which models input and output with an infinite tape, processing with a finite state machine that can read and write symbols on the tape and move the tape head, and memory with the state of the finite state machine and the contents of the tape. A Universal Turing Machine is a Turing Machine that takes as input the description of another Turing Machine and its input, and produces as output the result of running that Turing Machine on the input. Any mechanical computation can be performed by a Turing Machine.
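A Turing Machine is simple enough to simulate in a few lines. The Python sketch below is illustrative only: the rule format `(state, symbol) -> (write, move, next_state)`, the state names `"start"` and `"halt"`, the blank symbol `"_"`, and the step limit are all assumptions of this example.

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=10_000):
    """Run the machine until it reaches the halt state; return the tape contents."""
    cells = dict(enumerate(tape))    # sparse tape: position -> symbol
    head = 0
    steps = 0
    while state != "halt":
        if steps == max_steps:
            raise RuntimeError("step limit reached without halting")
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
        steps += 1
    if not cells:
        return ""
    contents = "".join(cells.get(i, blank) for i in range(min(cells), max(cells) + 1))
    return contents.strip(blank)


# A machine that flips every bit, then halts at the first blank.
FLIP_BITS = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

flipped = run_turing_machine(FLIP_BITS, "1011")
```

The simulator takes a machine description and an input tape and produces that machine's output, which is the role a Universal Turing Machine plays in the paragraph above.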

Another way of modeling computation is symbolically using Lambda Calculus. A simple grammar and two reduction rules are sufficient to model any computation. A Lambda Calculus term is in normal form if there are no places where Beta-reduction can be performed. We simulate computation using Lambda Calculus by performing reductions; the normal form of a Lambda Calculus term corresponds to its value. We can show Lambda Calculus is as powerful as a Turing Machine (and hence can perform any mechanical computation) by showing how to simulate a Turing Machine using Lambda Calculus.
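Lambda Calculus terms for true, false, if and numbers can be tried out with Python's `lambda`. This sketch uses the standard Church encodings; the uppercase names and the helper `to_int` (which converts a Church numeral to a Python integer for inspection) are conventions of this example, and Python applies procedures eagerly rather than reducing terms to normal form.

```python
TRUE = lambda x: lambda y: x                   # selects its first operand
FALSE = lambda x: lambda y: y                  # selects its second operand
IF = lambda p: lambda c: lambda a: p(c)(a)     # the predicate does the choosing

ZERO = lambda f: lambda x: x                   # apply f zero times
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))
ADD = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))


def to_int(numeral):
    """Count how many times the numeral applies its function."""
    return numeral(lambda k: k + 1)(0)


ONE = SUCC(ZERO)
TWO = SUCC(ONE)
three = to_int(ADD(TWO)(ONE))
choice = IF(TRUE)("consequent")("alternate")
```

Nothing here is built in except procedure creation and application, which is the sense in which numbers and booleans can be made from lambda alone.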

You should be able to: explain how to model computation; understand a finite state machine description and explain what it does; understand a Turing Machine description and explain what it does; explain if a problem can be solved by a finite state machine (or why it cannot); define a Turing Machine that solves a simple problem; reduce a Lambda Calculus term to normal form; create and manipulate Lambda Calculus terms that represent true, false, if, numbers and lists; show that a computing model is (or is not) capable of modeling any mechanical computation.