University of Virginia Computer Science CS150: Computer Science, Fall 2005

# CS150 Condensed

This document summarizes the most important things I hope you have learned in CS150.

## How to Describe Procedures

Language: 1: Introduction [PPT (S), PDF (H), Notes (N)]; 2: Formal Systems and Languages [SHN]; 3: Rules of Evaluation [SHN]; PS3

Computer Science is the study of imperative ("how to") knowledge. Computer Science studies how to describe procedures and how to reason about the processes procedures produce. Ada, Countess of Lovelace, was the first Computer Scientist, because she was (probably) the first person to consider how to precisely describe procedures.

Computer Science is not a science, since it is not about understanding nature. It is not engineering, since computer scientists do not face the kinds of constraints engineers face. Computer science is best considered a liberal art. It encompasses all seven of the traditional liberal arts: the language trivium --- grammar, rhetoric, logic; and the numbers quadrivium --- arithmetic, geometry, music and astronomy.

A formal system is a set of symbols and a set of rules for manipulating symbols. A language is like a formal system, except there is a mapping between sequences of symbols in the language and meanings.

Languages are powerful tools for description. Languages are made of primitives (the smallest units of meaning), means of combination (ways to combine language forms to make new ones), and means of abstraction (ways to give new names to language forms). Because languages are recursive, we can express infinitely many different meanings starting with a finite number of primitives and means of combination.

We can describe the surface forms in a language using a replacement grammar (Backus Naur Form). Rules in BNF are of the form nonterminal ::= replacement and mean that wherever the nonterminal on the left side of the rule appears, it can be replaced with the right side of the rule. A simple BNF grammar can be used to describe a language with infinitely many surface forms, since nonterminals may appear in the replacement part of grammar rules. Another way to describe the surface forms of a language is to use a Recursive Transition Network. There is a mechanical process for converting between BNF and RTN descriptions of a language: all languages that can be described by RTNs can be described by BNFs, and vice versa.
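
As a minimal sketch of why a nonterminal in a replacement yields infinitely many surface forms, the Python below expands a tiny grammar up to a bounded nesting depth. The grammar, the `GRAMMAR` dictionary layout, and the `expand` helper are all made up for illustration; they are not from the course materials.

```python
# Hypothetical BNF grammar, written as a Python dictionary:
#   Expr  ::= Digit | ( Expr + Expr )
#   Digit ::= 0 | 1
GRAMMAR = {
    "Expr": [["Digit"], ["(", "Expr", "+", "Expr", ")"]],
    "Digit": [["0"], ["1"]],
}

def expand(symbol, depth):
    """Return every terminal string derivable from symbol using
    at most `depth` nested rule replacements."""
    if symbol not in GRAMMAR:      # a terminal stands for itself
        return {symbol}
    if depth == 0:                 # replacement budget used up
        return set()
    results = set()
    for replacement in GRAMMAR[symbol]:
        partial = {""}
        for part in replacement:   # concatenate the expansions of each part
            options = expand(part, depth - 1)
            partial = {p + o for p in partial for o in options}
        results |= partial
    return results

for depth in (2, 3, 4):
    print(depth, len(expand("Expr", depth)))
```

Each extra level of depth admits more surface forms, and since the depth is unbounded in the grammar itself, the language has no largest form.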

We can describe the meaning of a language using rules of evaluation. Simple rules of evaluation for the language Scheme provide a mechanical way to determine the value of any Scheme expression that has a value (not all Scheme expressions have values).
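
To illustrate how a few evaluation rules mechanically determine a value, here is a toy evaluator in Python for Scheme-like expressions written as nested lists. It covers only numbers, three primitive procedures, and application; the `PRIMITIVES` table and the list encoding are assumptions of this sketch, not the full Scheme rules.

```python
import operator

# Primitive procedures known to this toy evaluator (an illustrative subset).
PRIMITIVES = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr):
    """Evaluate a Scheme-like expression encoded as nested Python lists."""
    if isinstance(expr, (int, float)):
        return expr                      # a number evaluates to itself
    if isinstance(expr, str):
        return PRIMITIVES[expr]          # a primitive name evaluates to its procedure
    # Application: evaluate every subexpression, then apply the value of
    # the first subexpression to the values of the rest.
    values = [evaluate(sub) for sub in expr]
    procedure, arguments = values[0], values[1:]
    return procedure(*arguments)

# Corresponds to the Scheme expression (+ 1 (* 2 3))
print(evaluate(["+", 1, ["*", 2, 3]]))
```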

You should be able to: explain what Computer Science is to a liberal arts student; identify the primitives, means of combination, and means of abstraction for a language; describe a language using BNF or RTN; determine the set of the surface forms in a language described by a BNF grammar or RTN; determine what a surface form in a language means if you are given evaluation rules for the language; determine the value of a Scheme expression following the rules of evaluation.

Programming with Procedures: 3: Rules of Evaluation [SHN]; 19: Environments [SHN]; PS1-8

A procedure is a precise description of a process. Procedural abstraction allows us to use a single description of a procedure to describe many different information processes. A procedure can take parameters (inputs) and produce outputs. In Scheme, we can use lambda to make a procedure, and can pass around and manipulate procedures in powerful ways.
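
A short sketch of procedures that take and produce procedures, written in Python rather than Scheme. The names `make_adder` and `compose` are illustrative choices.

```python
def make_adder(n):
    """Return a procedure that adds n to its input (a procedure as a result)."""
    return lambda x: x + n

def compose(f, g):
    """Return the procedure x -> f(g(x)) (procedures as parameters and as a result)."""
    return lambda x: f(g(x))

add_two = make_adder(2)
double = lambda x: x * 2
double_then_add_two = compose(add_two, double)

print(add_two(5))              # 5 + 2
print(double_then_add_two(5))  # (5 * 2) + 2
```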

You should be able to: define and understand procedures; understand the substitution and environment models of evaluation; define and understand procedures that take procedures as parameters; define and understand procedures that produce procedures as results.

Programming with Data: 4: Programming with Data [SHN]; 12: Quickest Sorting [SHN]; 32: Making Numbers [SHN]

To express computations clearly, we need ways to represent complex data. With only procedures, we can create a pair. With only pairs, we can create complex data structures. A list is either a special value known as null, or a pair where the second part is a list. By defining lists recursively this way, we can make lists of any length just by putting more pairs together. For many problems, lists are not a good representation. For example, by using trees we can more easily express faster sorting algorithms.
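
The claim that pairs need nothing but procedures can be sketched in Python as follows. The selector argument `pick_first` and the use of `None` for null are assumptions of this sketch; `cons`, `car`, and `cdr` follow the Scheme names.

```python
def cons(a, b):
    """Make a pair: a procedure that remembers a and b."""
    return lambda pick_first: a if pick_first else b

def car(pair):
    """First part of a pair."""
    return pair(True)

def cdr(pair):
    """Second part of a pair."""
    return pair(False)

null = None  # stand-in for Scheme's null

def make_list(*items):
    """Build a list as nested pairs: (item1 . (item2 . (... . null)))."""
    result = null
    for item in reversed(items):
        result = cons(item, result)
    return result

numbers = make_list(1, 2, 3)
print(car(numbers), car(cdr(numbers)), car(cdr(cdr(numbers))))
```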

You should be able to: create and manipulate complex data structures starting from pair primitives; define procedures that manipulate lists; use and understand procedures that traverse lists and trees.

Recursive Definitions: 5: Recursing on Lists [SHN]; 6: List Recursion [N]; 7: Recursion Practice [N]; 8: Recursing Recursively [SHN]

We can define procedures in terms of themselves. A recursive definition has a base case that solves the simplest version of the problem directly, and a recursive case that divides the problem into two problems (one of which is a simpler version of the problem) that can be combined to solve the original problem. Recursion is everywhere: language, music, nature, etc. When we define recursive procedures on recursive data, we often have a base case corresponding to the base case for the recursive data structure. For lists, the base case deals with the null list, and we make progress by cdr-ing down the list.
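
A minimal Python sketch of recursing on a list by cdr-ing down it. Pairs are modelled here as two-element tuples and null as `None`; both are assumptions made to keep the example self-contained.

```python
null = None  # stand-in for Scheme's null

def cons(a, b):
    return (a, b)

def car(pair):
    return pair[0]

def cdr(pair):
    return pair[1]

def length(lst):
    if lst is null:                  # base case: the null list
        return 0
    return 1 + length(cdr(lst))      # recursive case: a shorter list

def sum_list(lst):
    if lst is null:
        return 0
    return car(lst) + sum_list(cdr(lst))

numbers = cons(1, cons(2, cons(3, null)))
print(length(numbers), sum_list(numbers))
```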

You should be able to: understand a recursive definition; solve a problem by defining a procedure recursively; reason about the process produced by evaluating an application of a recursively defined procedure.

Programming with Mutation: 18: Mutation; 19: Environments [SN]

Mutation changes the value associated with a place. We can express all computations that use mutation without using mutation, but mutation is useful for describing some computations more clearly and producing more efficient computations. To support mutation, we need to change the evaluation rules for Scheme to use environments instead of just substitution. This is because the value of a name may change.

A name is a place for storing a value. A frame is a collection of places. An environment is a pointer to a frame. All frames except the outermost (global) frame have a parent frame. All expressions are evaluated in an environment. To evaluate a name in an environment, look for a place matching the name in the frame pointed to by the environment. If there is one, the value of the name is the value in that place. If there is not, evaluate the name in the parent environment.

Application creates a new frame, containing places named after the applied procedure's parameters. The parent of the new frame is the environment of the applied procedure.
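
The lookup and application rules above can be sketched in Python. The `Frame` class, its method names, and `apply_procedure` are hypothetical; a reference to a `Frame` object plays the role of an environment.

```python
class Frame:
    """A collection of places, with an optional parent frame."""

    def __init__(self, parent=None):
        self.places = {}
        self.parent = parent

    def define(self, name, value):
        self.places[name] = value

    def lookup(self, name):
        if name in self.places:              # a place matches in this frame
            return self.places[name]
        if self.parent is None:              # ran out of frames
            raise NameError(name)
        return self.parent.lookup(name)      # evaluate in the parent environment

    def set(self, name, value):
        """Mutation: change the value in the nearest place matching name."""
        if name in self.places:
            self.places[name] = value
        elif self.parent is None:
            raise NameError(name)
        else:
            self.parent.set(name, value)

def apply_procedure(parameters, procedure_env, arguments):
    """Application: a new frame with places named after the parameters,
    whose parent is the environment of the applied procedure."""
    frame = Frame(parent=procedure_env)
    for name, value in zip(parameters, arguments):
        frame.define(name, value)
    return frame

global_env = Frame()
global_env.define("x", 3)
body_env = apply_procedure(["y"], global_env, [10])
print(body_env.lookup("y"), body_env.lookup("x"))
```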

You should be able to: define and understand procedures that use mutation; draw environment diagrams; explain what environment diagrams mean; understand how introducing mutation affects reasoning about programs.

Programming with Objects: 20: Objects [SN]; 22: Inheritance [SN]

An object is produced by packaging state and procedures. Programming with objects is called object-oriented programming. We call the procedures that are part of an object methods, and the state instance variables. We can program with objects by sending messages to objects that invoke methods. A class is a procedure that defines an object. Inheritance is defining a class in terms of another class. The subclass is the new class; its superclass is the class it uses. A subclass can add methods or redefine methods of the superclass; if a method not defined by the subclass is requested, it sends the request up to its superclass.
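
A rough Python sketch of message-passing objects, where a class is a procedure that produces an object and the object is itself a procedure that accepts messages. The counter example and the message names are invented for illustration.

```python
def make_counter():
    """A class: a procedure that produces an object."""
    count = 0                                # instance variable

    def dispatch(message):
        nonlocal count
        if message == "next":                # method: advance and report
            count += 1
            return count
        if message == "current":             # method: report only
            return count
        raise ValueError("unknown message: " + message)

    return dispatch

def make_even_counter():
    """A subclass of the counter that redefines the "next" method."""
    superclass = make_counter()

    def dispatch(message):
        if message == "next":                # redefined: advance by two
            superclass("next")
            return superclass("next")
        return superclass(message)           # not defined here: send it up

    return dispatch

counter = make_even_counter()
print(counter("next"), counter("next"), counter("current"))
```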

You should be able to: define procedures that create objects; explain a class hierarchy; define procedures that use inheritance; explain how a method is selected given class definitions.

Programming for the Internet: 29: Networking [SN]; 30: Internet [SN]; 31: Secure Websites [SN]; 36: Public-Key Cryptography [SN]; 37: Distributed Computing [SN]; 38: Google [SN]

A network is a group of three or more communicating entities. Networks have been around for thousands of years. The latency of a network measures how long it takes a message to travel between two points in the network; we can improve latency by reducing the number of transfer points (routers) between two points, reducing the time it takes to get through a transfer point, or increasing the speed at which the message travels between transfer points. The bandwidth of a network measures the amount of information the network can transmit per unit time; we can improve bandwidth by transmitting faster, transmitting more data at the same time, or encoding information more efficiently. Networks can use circuit switching, which reserves a whole path through the network for a transmission; or packet switching, which uses links one at a time. Circuit switching provides more reliable latency and bandwidth since once a path is reserved, it is available for the whole transmission. Packet switching uses network resources more efficiently.
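
As a back-of-the-envelope illustration of how latency and bandwidth combine, the Python below uses a common first-order model: delivery time is the latency plus the message size divided by the bandwidth. The model and the sample numbers are assumptions of this sketch, not measurements from the course.

```python
def transfer_time(size_bits, latency_s, bandwidth_bps):
    """Estimated seconds to deliver a message: the time for the first bit
    to arrive, plus the time to push all the bits onto the link."""
    return latency_s + size_bits / bandwidth_bps

# A 1 MB (8,000,000 bit) message over a 1 Mbit/s link with 50 ms latency.
print(transfer_time(8_000_000, 0.05, 1_000_000))

# For a tiny message, latency dominates; bandwidth barely matters.
print(transfer_time(8, 0.05, 1_000_000))
```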

An internetwork is a collection of multiple networks that can send messages between nodes in different networks. Many people (including Al Gore) contributed to the Internet, which is an internetwork that grew out of an ARPA project that started in 1969.

The World Wide Web established a common language for clients (browsers) and servers on the Internet. Clients interact with servers by sending HTTP requests and (mostly) getting back responses in HTML. A database is a way of storing and retrieving information. SQL is a language for manipulating databases.
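
A small sketch of inserting and selecting rows with SQL, using Python's built-in `sqlite3` module and an in-memory database. The `users` table and its columns are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# INSERT adds rows; the ? placeholders keep values separate from the SQL text.
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Ada", "ada@example.org"))
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("Alan", "alan@example.org"))
conn.commit()

# SELECT retrieves the rows that match a condition.
rows = conn.execute(
    "SELECT name FROM users WHERE email = ?", ("ada@example.org",)
).fetchall()
print(rows)
```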

Making web sites secure is difficult, and it is impossible to make any practical service perfectly secure. Some things we can do to improve security are storing password hashes instead of cleartext passwords, making cookies that cannot be reused or created without knowing a secret kept by the server, and using encryption to obscure data transmissions.
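
One way to avoid storing cleartext passwords can be sketched with Python's standard library. The text above only says to store a hash; the per-user random salt, the choice of PBKDF2 with SHA-256, and the iteration count are assumptions of this sketch rather than the course's method.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def hash_password(password, salt=None):
    """Return (salt, digest); only these are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)            # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def check_password(password, salt, stored_digest):
    """Re-hash the attempt with the stored salt and compare digests."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)

salt, stored = hash_password("correct horse")
print(check_password("correct horse", salt, stored))
print(check_password("wrong guess", salt, stored))
```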

Large computations can be done by distributing them across multiple processes. Distributing a computation is difficult if it cannot be easily divided into tasks that can be done independently. When distributed tasks involve shared data, race conditions and deadlocks can occur.
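
A minimal Python sketch of tasks sharing data. Threads stand in for distributed processes here, and the `run_workers` helper is hypothetical. The lock makes each read-modify-write of the shared total indivisible; without it, two workers could read the same old value and one update would be lost, which is a race condition.

```python
import threading

def run_workers(num_threads, increments):
    """Have several workers add to one shared total, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            with lock:               # only one worker updates at a time
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()                # wait for every worker to finish
    return total

print(run_workers(4, 1000))
```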

You should be able to: measure the latency and bandwidth of a network; explain the advantages and disadvantages of packet switching; make a dynamic web site; manage passwords in a database in a somewhat secure way; construct a SQL command to select or insert data in a database table; explain how a computation can be distributed and evaluate which computations are easy or difficult to distribute.

## How to Reason About Processes

Measuring Work: 9: Sorting [S, N]; 10: Quicker Sorting [SN]; 11: Goalden Ages [SN]; 12: Quickest Sorting [SN]; 38: Google [SN]

Computer scientists measure work using orders of growth. Since computers tend to get faster exponentially, it is usually more important to know how the number of steps required to solve a problem grows as the problem size grows than to know the absolute time.

A problem is described by its inputs and outputs, and the relationship between the outputs and inputs. A solution to a problem is a procedure that given any possible inputs can calculate an output that satisfies the required relationship in a finite amount of time.

An upper bound, O(f(n)), on the amount of work required to solve a problem means that we know how to solve it with at most a constant multiple of f(n) steps. A lower bound, Ω(f(n)), on the amount of work required to solve a problem means that we can show it is impossible to define a procedure that solves it with less than a constant multiple of f(n) work. If we know the upper bound and lower bound for a problem are the same, we have a tight bound, Θ(f(n)).

Sorting is a problem that takes as input a list and a comparison function, and produces as output a list containing the same elements as the input list ordered by the comparison function. Bubblesort is a procedure for sorting that divides the sorting problem recursively into putting the first element in the right place in the result of sorting the rest of the list. Bubblesort is Θ(n²) where n is the number of elements in the list. A more efficient sorting procedure is quicksort: instead of dividing the problem of sorting a list of length n into pieces of length 1 and length n - 1, quicksort divides the problem into pieces that are likely to be around the same size. On average, quicksort is Θ(n log₂ n).
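
The two ways of dividing the sorting problem can be sketched in Python. `simple_sort` follows the recursive description given above (sort the rest, then put the first element in the right place), and `quicksort` splits around a pivot. Both take the comparison function as a parameter named `before`; the function names and the choice of the first element as pivot are assumptions of this sketch.

```python
def insert_in_place(element, sorted_items, before):
    """Put element in the right place in an already sorted list."""
    for i, other in enumerate(sorted_items):
        if before(element, other):
            return sorted_items[:i] + [element] + sorted_items[i:]
    return sorted_items + [element]

def simple_sort(items, before):
    """Pieces of length 1 and n - 1: about n * n steps in the worst case."""
    if not items:
        return []
    return insert_in_place(items[0], simple_sort(items[1:], before), before)

def quicksort(items, before):
    """Pieces that are likely to be of similar size."""
    if not items:
        return []
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if before(x, pivot)]
    larger = [x for x in rest if not before(x, pivot)]
    return quicksort(smaller, before) + [pivot] + quicksort(larger, before)

ascending = lambda a, b: a < b
print(simple_sort([3, 1, 2], ascending))
print(quicksort([5, 3, 8, 1, 3], ascending))
```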

You should be able to: describe problems precisely in terms of their inputs and outputs; express the amount of work a procedure requires using Θ notation; be able to reason about the time complexity of different sorting algorithms.

Complexity Classes: 13: Problems [SN]; 14: Intractable Problems [SN]; 15: P vs. NP [SN]; 16: NP-Completeness [SN]; 17: Growth [SN]; 35: DNA [SN]; PS3, PS4, PS5

If there is a procedure that solves a problem with O(nᵏ) work for some constant k, the problem can be solved in polynomial time and is in class P. There are some problems that can be solved in polynomial time if we could try all possible solutions at once. This class of problems is known as nondeterministic polynomial time (NP). We can show a problem is in NP by showing we can enumerate all possible solutions and we can check if a possible solution is correct in polynomial time. The hardest problems in NP are known as NP-Complete. It is not known whether or not there is a polynomial time solution for any of these problems, but if one is found for any NP-Complete problem, then all the other problems in NP can also be solved in polynomial time. Examples of NP-Complete problems include the smiley puzzle, the satisfiability problem, the travelling salesman problem, and the graph coloring problem. For all of these problems, there is a straightforward way to try all possible solutions, but no known way to solve them without trying (nearly) all possible solutions.
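
The gap between checking a solution and finding one can be sketched in Python for the satisfiability problem. The encoding is an assumption of this sketch: a formula is a list of clauses, each clause a list of nonzero integers, where `k` means variable k is true and `-k` means variable k is false.

```python
from itertools import product

def satisfies(formula, assignment):
    """Check a candidate solution: one pass over the formula, so the work
    is polynomial in the size of the formula."""
    return all(
        any(assignment[abs(literal)] == (literal > 0) for literal in clause)
        for clause in formula
    )

def brute_force_sat(formula, num_vars):
    """Find a solution by trying all 2**num_vars assignments in turn."""
    for values in product([False, True], repeat=num_vars):
        assignment = {i + 1: value for i, value in enumerate(values)}
        if satisfies(formula, assignment):
            return assignment
    return None

# (x1 or x2) and (not x1 or x2) and (not x2 or x3)
formula = [[1, 2], [-1, 2], [-2, 3]]
print(brute_force_sat(formula, 3))
```

Checking stays cheap as the formula grows, while the number of assignments to try doubles with every added variable.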

You should be able to: describe problems precisely in terms of their inputs and outputs; describe a problem using O, Ω, and Θ; estimate the amount of work a solution to a problem involves; classify problems into complexity classes P and NP; explain convincingly why a problem is in NP; explain what it would mean if someone developed a fast (polynomial time) procedure for an NP-Complete problem.

Computability: 23: Gödel's Theorem [SN]; 24: Computability [SN]; 25: Undecidable Problems [SN]; PS6

An axiomatic system is a set of axioms and mechanical rules for deriving theorems starting from those axioms. A perfect axiomatic system for a domain (such as number theory) would produce all true theorems about that domain and no false theorems. An incomplete axiomatic system fails to produce some true theorems. An inconsistent axiomatic system produces some false theorems.

Gödel proved that it is impossible to produce a perfect axiomatic system for any interesting domain: the system must be either incomplete or inconsistent. The proof shows that the system itself can express the statement G: "this statement does not have any proof in the system". If G can be proven in the system, the system proves a false statement and is inconsistent; if G cannot be proven, then G is true but unprovable, and the system is incomplete.

Some problems can be solved by algorithms (procedures that eventually terminate), others cannot. We call problems for which there is no algorithmic solution undecidable. An example of an undecidable problem is the Halting Problem. We can prove there is no algorithm that solves the Halting Problem, by showing if we had one it would lead to a contradiction. One way to show a problem is undecidable is to show that if we had a procedure that solves it, we could also solve the Halting Problem.
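
The contradiction behind the Halting Problem can be sketched in Python. Given any claimed decider `halts`, the hypothetical `make_paradox` helper builds a program that does the opposite of whatever the decider predicts about it. As a simplification, the decider here takes a program with no input. The candidate decider below is deliberately naive, so that the resulting program is safe to run.

```python
def make_paradox(halts):
    """Given a claimed halting decider, build a program it must misjudge."""
    def paradox():
        if halts(paradox):
            while True:        # the decider said "halts", so loop forever
                pass
        return "halted"        # the decider said "loops", so halt at once
    return paradox

def always_says_loops(program):
    """A candidate decider that claims every program runs forever."""
    return False

paradox = make_paradox(always_says_loops)
print(always_says_loops(paradox))  # the decider's prediction
print(paradox())                   # what the program actually does
```

A decider that answered True for `paradox` would be wrong in the other direction, since `paradox` would then loop forever. The same construction defeats every candidate, so no correct decider can exist.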

You should be able to: explain what it means for an axiomatic system to be perfect, incomplete or inconsistent; explain the essence of Gödel's proof; determine if a problem is decidable or undecidable, and provide a convincing (informal) argument why.

Models of Computation: 26: Modeling Computation [SN]; 27: Universal Turing Machines [SN]; 28: The Meaning of Truth [SN]; 32: Making Numbers [SN]; 33: Quantum Computing [SN]; 34: Computing with Life [SN]

For complexity classes to make sense, we need a model for what a step is. A model of computation must model input, output, processing and memory. A mechanical model of computation is a Turing Machine, which models input and output with an infinite tape, processing with a finite state machine that can read and write symbols on the tape and move the tape head, and memory with the state of the finite state machine and the contents of the tape. A Universal Turing Machine is a Turing Machine that takes as input the description of another Turing Machine and its input, and produces as output the result of running that Turing Machine on the input. Any mechanical computation can be performed by a Turing Machine.
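
A small Turing Machine simulator in Python, as a sketch of the model described above. The rule format, the `_` blank symbol, the convention that the machine halts when no rule applies, and the step limit are all assumptions of this sketch. The sample machine adds one to a binary number.

```python
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=10_000):
    """rules maps (state, symbol) to (new_state, symbol_to_write, move),
    where move is "L" or "R". Returns (final_state, tape_contents)."""
    cells = dict(enumerate(tape))       # sparse tape: position -> symbol
    head = 0
    for _ in range(max_steps):
        symbol = cells.get(head, blank)
        if (state, symbol) not in rules:
            break                       # no applicable rule: the machine halts
        state, write, move = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    else:
        raise RuntimeError("step limit reached without halting")
    if not cells:
        return state, ""
    positions = range(min(cells), max(cells) + 1)
    return state, "".join(cells.get(i, blank) for i in positions).strip(blank)

# Binary increment: scan right to the end, then carry leftwards.
INCREMENT = {
    ("start", "0"): ("start", "0", "R"),
    ("start", "1"): ("start", "1", "R"),
    ("start", "_"): ("carry", "_", "L"),
    ("carry", "1"): ("carry", "0", "L"),
    ("carry", "0"): ("done", "1", "L"),
    ("carry", "_"): ("done", "1", "L"),
}

print(run_turing_machine(INCREMENT, "1011"))
```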

Another way of modeling computation is symbolically, using Lambda Calculus. A simple grammar and two reduction rules are sufficient to model any computation. A Lambda Calculus term is in normal form if there are no places where Beta-reduction can be performed. We simulate computation using Lambda Calculus by performing reductions; the normal form of a Lambda Calculus term corresponds to its value. We can show Lambda Calculus is as powerful as a Turing Machine (and hence, can perform any mechanical computation) by showing how to simulate a Turing Machine using Lambda Calculus.
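
Lambda Calculus terms for true, false, if, and numbers can be imitated with one-argument Python lambdas. These are the standard Church encodings, which may differ in detail from the definitions used in lecture; `to_int` is a helper added only to display results. Note that Python evaluates arguments eagerly, so `IF` here evaluates both branches before choosing one.

```python
TRUE = lambda a: lambda b: a             # select the first of two values
FALSE = lambda a: lambda b: b            # select the second of two values
IF = lambda p: lambda a: lambda b: p(a)(b)

ZERO = lambda f: lambda x: x             # apply f zero times
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))
ADD = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    """Convert a Church numeral to a Python int by counting applications."""
    return n(lambda k: k + 1)(0)

TWO = SUCC(SUCC(ZERO))
THREE = SUCC(TWO)
print(IF(TRUE)("yes")("no"))
print(to_int(ADD(TWO)(THREE)))
```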

According to the Church-Turing thesis, all computers based on mechanical physics are deeply similar. Computing can also be done with quantum physics and biochemistry, among other things. Alternative computing models can change what is computable and what complexity classes mean.

You should be able to: explain how to model computation; understand a finite state machine description and explain what it does; understand a Turing Machine description and explain what it does; explain if a problem can be solved by a finite state machine (or why it cannot); define a Turing Machine that solves a simple problem; reduce a Lambda Calculus term to normal form; create and manipulate Lambda Calculus terms that represent true, false, if, numbers and lists; show that a computing model is (or is not) capable of modeling any mechanical computation.
