CS200: Problem Set 7: Building Web Communities

University of Virginia, Department of Computer Science
CS200: Computer Science, Spring 2004

Problem Set 7: Building Web Communities Out: 29 March 2004
Due: Wednesday, 7 April 2004

Collaboration Policy - Read Carefully
For this problem set, you may discuss the problems with anyone you want, but you must produce your own solution. It is fine to work with other students, as long as you build your own web service and understand everything you are doing. For the written questions, you should feel free to discuss the problems and your answers with others, but you must write up your own answers and understand everything you write in your answer.
As always, you are encouraged to discuss this assignment with other students in the class and ask and provide help in useful ways. You may consult any outside resources you wish including books, papers, web sites and people (except for CS200 problem sets and comments from previous years). If you use resources other than the class materials, indicate what you used along with your answer.
Purpose

Understand how dynamic web sites are constructed.
Gain confidence that you can learn new languages on your own.
Practice answering complexity questions.
Learn some useful languages for making dynamic web sites: HTML, PHP and SQL.
Build your own web community,

Background
In 1990, Tim Berners-Lee, wrote a program called WorlDwidEweb for editing hypertext (text with embedded links to other documents) and developed the HyperText Transfer Protocol (HTTP) for allowing clients (browsers) and servers to talk to each other, Universal Resource Locators (URLs) for naming objects, and the HyperText Markup Language (HTML) for describing hybertext documents. With these pieces in place, anyone could set up a web server and start publishing their own hypertext documents. The web grew exponentially throughout the 1990s, with the number of web sites increasing from a few hundred in 1991 to 48 million sites in March 2004 and many fortunes were won and lost.
The promise of the web is to allow anyone to communicate with the whole world (or at least the rich parts). Unfortunately, not everyone is able to take CS200 and learn enough about computing to make their own web sites. So, the goal of this assignment is for you to build a web site that your friends and family can use to create their own home pages, and to provide mechanisms for recruiting contributors to your website and analyzing the connections between them. If your site is successful, you should have the beginnings of your own web community. When your web site if finished, it will work like http://www.cs.virginia.edu/cs200/community/ (if you are lucky, I invited you to join in class; if not, one of your classmates may invite you to join sometime soon).
The three tools you will use to make your site are:

HTML (Hypertext Markup Language) — the language used to describe web pages Schemer's Guide to HTML
PHP (PHP: Hypertext Preprocessor) — a universal programming language that can be incorporated into web pages and is evaluated by the web server when a page is requested Schemer's Guide to PHP
SQL (Structured Query Language) — a language for manipulating and extracting infromation from a database Schemer's Guide to SQL

Download: Download ps7.zip to your machine and unzip it into your home public_html directory J:\public_html\name of your choice. Files that you put in your public_html directory are visible on the web using the URL
    http://www.people.virginia.edu/~your username/pathname.
For example, if you unzip ps7.zip into J:\public_html\phamilynet and your username is dee2b then someone would be able to visit your site using
    http://www.people.virginia.edu/~dee2b/phamilynet/index.html
    or http://www.people.virginia.edu/~dee2b/phamilynet/
(the index.html file is retrieved if no file name is given).
After you have unzipped ps7.zip try visiting your site using a web browser such as Internet Explorer or Netscape Navigator. You should see a welcome page that looks like this.
The ps7.zip contains many HTML and PHP files, that will be described later.

Databases
Because HTTP is a stateless protocol, all information that needs to persist between web requests must be stored somewhere. We use a database to store everything (except for user login information with is stored in a cookie, see below). The Schemer's Guide to Structured Query Language gives a brief introduction to the SQL language we will use to manipulate the database.
First, you need to create a database (follow the directions in the SQL Guide to create your own MySQL database). After creating your database, edit the opendb.php file. Change the values assigned to $userName, $password and $dbName to match those for the database you created.
Since we don't want just anyone to be able to put documents on your web site, you will need a table for managing users. Create a table in your database named users with fields for storing the name, email address, last name, first names (all names except the last name), URL, content, encrypted password and cookiecounter for each user (managing cookies and passwords is tricky and error prone. For now, you should just use the code we provide for this and we will explain more about why the way it does things is semi-secure, and more obvious ways are not in a later lecture).
You can create the users table by issuing this SQL command:
CREATE TABLE users (
   id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
   user VARCHAR(80) UNIQUE,
   email VARCHAR(255) UNIQUE NOT NULL,
   lastname VARCHAR(255) NOT NULL,
   firstnames VARCHAR(255),
   URL VARCHAR(255),
   content TEXT,
   password CHAR(80),
   cookiecounter INT
)
Enter the above command in the text entry area under "Run SQL query/queries on database name" and then click Go. You can also create tables using the MySQL web interface. After you submit the command, you should see a page with Your SQL-query has been executed successfully, and then be able to click on users on the left side of the page to see the table you created.
Note that each field has a name and a type. The id field is a unique identifier for each user. Since it is labeled AUTO_INCREMENT, the database will give it a value automatically that is one more than the previous entry. The user and email fields have type VARCHAR(num) which means they are a string of up to num characters. They use the UNIQUE and NOT NULL modifiers to indicate that all table entries must have different user names and emails, and that every entry must have a value for these fields.
Now that you have created a table, insert an entry for yourself in the table. For example, I would do this by running the SQL query,
INSERT INTO users (user,  email, lastname, firstnames)
           VALUES ('evans', 'evans@cs.virginia.edu', 'Evans', 'David')
(Don't forget the quote (') marks.)
After running the insert command, you should be able to see one entry in your table. You can view the whole table by clocking on Browse. Note that the id field has been automatically assigned a value, but the password and cookiecounter field values are blank (NULL). The Browse link shows you the result of the query,
   SELECT * FROM users
which means to select all fields for all entries in the users table. Note that SQL's SELECT command is different in some ways from the table-select procedure you defined in Problem Set 5. See the SQL Guide for details on SELECT.

Question 1: For each question, provide the SQL command that performs the requested action and run your command on your database. Note that the commands modify the state of the database, so you need to do them in order.

Insert a user into your users table with user name alyssa, email aph@cs.virginia.edu, last name Hacker and firstnames Alyssa P..
Insert a user into your users table with user name ben, email bb@cs.virginia.edu, last name Bitdiddle and firstnames Ben.
Select the lastname of all users in your table. The response should be a table like this (of course, your result will be different because you put yourself in the table instead of me):

lastname

Evans

Hacker

Bitdiddle

Select the lastname and firstnames of all users in your table in alphabetical order by firstnames. The response should be a table like:

firstnames lastname

Alyssa P. Hacker

Ben Bitdlddle

David Evans

Select the email address of all users in your table with lastname matching Hacker. The response should be the table:

email

aph@cs.virginia.edu

Delete all entries from your table whose id does not equal the id for your entry. (Note that the MySQL interface will give you a confirmation on DELETE commands, since a mistake could remove all the records you want from the table. It is a good idea with DELETE commands to use a LIMIT n as part of the query to make sure only the right number of entries are deleted. For example, for this question you would do DELETE FROM users ... LIMIT 2 to ensure that no more than 2 entries are deleted.

At this point, your users table should contain one entry corresponding to yourself. Use Browse to check this is the case, and issue the necessary SQL commands to repair it if it is not.

Managing Users
When you created your user entry in the table, you did not provide a value for the password field. This is because we don't want to store actual passwords in the database. This would be dangerous since anyone who breaks into the database (or just steals the disk it is stored on and starts looking at bits on the disk) would be able to learn everyone's password. Even though you would be foolish to put anything highly confidential on this site, people often use the same password for security-critical and non-security critical websites, so it is important to never store passwords in cleartext.
So, instead of storing actual passwords in the database we will store encrypted passwords. There are some tricky issues in how to do this that we will discuss in a later lecture, but the basic idea is to store Encrypt(password) in the database, and then when a user logs in check that the value calculated by encrypting the entered password matches the stored password. To activate your account, you will need to reset the password. Reload your main page (http://www.people.virginia.edu/~your username/pathname) and click on the Reset Password link. This links to the HTML file reset-password.html. This file contains:

<title>Reset Password</title>


<h1>Reset Password</h1>

Enter your email address to resent your password.  
A new password will be sent to your email address.

<blockquote>
<form method="post" action="reset-password-process.php">
   Email: <input type="text" size="30" name="email"><br>
   <input type="submit" value="Reset Password">
</form>
</blockquote>


See the HTML Guide for an introduction to HTML.
The first, third and last lines of the file are Apache includes. They include the text from another file as though it were pasted in here. This is useful for giving pages in your site a standard appearance that can easily be changed.
The rest of the file displays the "Reset Password" title and a form for entering an email address. The action parameter of the form is the URL of the file that will receive the values entered in the form when the visitor clicks the submit button. In this case reset-password-process.php will receive the form input. The text entered in the email input will be visiable using the PHP variable $email (corresponding to the name parameter to the input).
Enter your email address in the form and click Reset Password. (Remember to change the values in opendb.php to match your database before doing this).
The reset-password-process.php code will received the input. Look at the code for reset-password-process.php and try to understand what it is doing. See the PHP Guide for an introduction to PHP.
Before sending queries, we must open the database. This is done by calling the openDatabase procedure defined in opendb.php. Then, the file uses:
  $query =  "SELECT user FROM users WHERE email='$email'";
  $result = mysql_query($query);
to find the user matching $email (the value entered in the form). The PHP procedure mysql_query sends a SQL command to the database. The response is stored in the value $result. After every SQL query, we should check that there was no error. If there was, the value mysql_errno () will be non-false and mysql_error () will provide information on the error. We do this using:
  if (mysql_errno ()) {
    error ("Select error for " . $query . ": " . mysql_error ());
  }
Next, we make sure there is a user matching the $email:
  if (mysql_num_rows($result) != 1) {
    error ("No user matching email: " . $email);
  } else {
    $user = mysql_result ($result, 0, 0);
  }
If there is, the value of mysql_result ($result, 0, 0) (the first field of the first entry in the result table) is assigned to the variable $user.
The next several lines generate a random password, and store the encrypted password in the database:
  $password = substr (md5 ($user  . rand (0, 100000)), 0, 8);
  $encryptedpass = md5 ($password . $user);   // We use the user as a salt
  $query =  "UPDATE users SET password='$encryptedpass' WHERE user='$user'";
  $result = mysql_query($query);
  
  if (mysql_errno ()) {
    error ("Update failed for " . $query . ": " . mysql_error ());
  }
(The md5 function is a one-way hash, which returns a random-looking string that is a function of the input string. It is called a one-way hash since it is hard given an output value to find an input that produces that output.)
If everything worked, we close the database (mysql_close();) and then send email to the $email address with the new password information.
You will want to change the constants in constants.php to match your site so the generated email is more sensible.
You should soon receive the email with new password. Try logging in now using the password received. You should then use the Change Password link to give yourself a more memorable password that wasn't exposed by sending it via email.
Note that our security model is similar to the one used by many commercial web sites. We assume (optimistically) that email is secure, and only the owner of an email address may read email sent to that address. This would not be a wise assumption for security critical sites, but it is good enough for this one.
If you look at the table now, you'll see that the password field for your entry now contains some random looking text (different from the actual password you entered.
Inviting More Members
To grow a community, we need a way of inviting new members. We want to keep track of connections between members, so we need a new table. Create the links table by issuing this SQL command:
CREATE TABLE links (
   fromid INT NOT NULL,
   toid INT NOT NULL
)
We use the fromid and toid to reference user ids in the users table.
To invite a new member, an existing member selects the Invite New Members link which links to invite.php. On that page, the user enters an email address into a form. The processing code for that form is invite-process.php. This inserts a new entry into the users table using,
      $actcode = substr (md5 ($email . rand (0, 100000)), 0, 6);
      executeQuery ("INSERT INTO users (email, password) VALUES ('$email', '$actcode')");
The $actcode is a randomly generated code that will be embedded into the email that the new user receives.
The invite-process.php code should also add an entry into the links table from the inviter to the invitee.

Question 2: Insert the code needed to add the new entry to the links table in invite-process.php (it should go where Question 2 is marked, and require no more than two statements).

Try inviting a few users into your community. (It would be a good idea to test it with some people working nearby, so you can see the generated email they receive.)
The Community Page
The file community-members.php should show all members of the community. (If your community gets so big you can't put everyone on a single web page, you'll need to figure out a better way to do this.)
The provided code includes the query,
    $result = executeQuery ("SELECT firstnames, lastname, id 
                             FROM users 
                             WHERE user IS NOT NULL
                             ORDER BY lastname");
the obtains the table entires of active users (that is, not including those where were invited, but have not yet responded) sorted in alphabetical order by last name.
We want to print out the names of all members in the community as links to their personal pages. The personal page for the user with id uid is personal-page.php?id=uid.

Question 3: Insert the code needed to display the community members as links to their personal pages in community-members.php (it should go where Question 3 is marked).

Question 4: How much work is your community-members.php code? (Express your answer using Θ notation, but make sure to clearly state your assumptions and what all variables you use mean.)

Djikstra's Algorithm
Once you invite a few people into your community, they will start inviting their friends into the community, and their friends may invite their friends.
A member of the community should be able to click on another member and find a sequence of links that connect the two members. Note that every two members of the community must be connected through some sequence of inviter-invitee or invitee-inviter links. This is the case since you started the community, and there must be an inviter-invitee chain that connects you to every member of the community, since the only way someone can join the community is if you invited them, or if someone you invited invited them, etc. Then, there is a link back down the tree to every other member using inviter-invitee links.
So, every member of the community should be able to find a chain of links between themself and any other member.
Edsger Dijkstra invented an algorithm for finding the shortest path between any two nodes in a graph. The file find-links-process.php uses Dijkstra's algorithm to find the shortest path between any two people in our community. If is the action file for the form on find-links.php which asks the visitor to select the from person (passed in as $fromid) and to person (passed in as $toid).
We will use two arrays to keep track of our work:

$dist — a mapping from user id's to the smallest known number of links between $fromid and this node. Initially, we set the value for each node to the longest possible path (which is the number of nodes in the graph).
$previous — a mapping from user id's to the user id immediately before this node on the best known path.
When we are done, we will be able to find the shorted path between $fromid and $toid by looking backwards from $fromid using the $previous array until we find $toid.
First, we initialize $nodes as a PHP array containing all the user ids in our community and the $dist and $previous arrays:
    $result = executeQuery ("SELECT id FROM users");
    $longest = mysql_num_rows ($result) + 1; // longer than longest path

    // Initialize $nodes to an array of the community members.
    // Initialize the distance to all nodes as the maximum value.
    // Initialize the previous node for every node as -1 (no known path).
    for ($i = 0; $i < mysql_num_rows ($result); $i++) {
      $node = mysql_result ($result, $i, 0);
      $nodes[$i] = $node;
      $dist[$node] = $longest;
      $previous[$node] = -1;
    }
We do know one best path: the path from $fromid to $fromid. It has distance 0:
    // The distance from the fromid to itself is 0
    $dist[$fromid] = 0;
The was the algorithm works is we keep track of a set of nodes, Q we still need to find the shortest path to. Initially, Q contains all nodes in the graph. At each iteration, we find the node u in that set with the smallest distance from $fromid (that is, the minimum $dist[u] value. We know there is no better path to u, so we remove it from the working set Q. For each node v with a direct link to u (in our case, either because the person corresponding to u invited v, or because v invited u), we check if the path through u is better than the best previously known path to v. If the new path is better (that is, the distance to u + 1 for the new link is less than the current value of $dist[$v]), we use the new path and set
   $dist[$v] = $dist[$u] + 1;
   $previous[$v] = $u;
to reflect the new best path we just found. We keep going until the working set Q is empty or we've found the best path to the $tonode (that is, it is the minimum distance node in Q).
Here's the code from find-links-process.php:
    // Q keeps track of the nodes we still need to work on
    $Q = $nodes;
    
    while (count ($Q) > 0) { // more elements in $Q
      // find the node in $Q with the minimum distance
      $unode = findmin ($Q);
      if ($unode == $toid) {
	break; // Can stop working if we've reached the toid
      }

      // Remove the minimum node
      $u = array_search ($unode, $Q);
      $Q[$u] = false;
      $Q = array_filter ($Q); // removes false elements

      // Find all nodes linked to u, and see if there are better paths
      $invitees = executeQuery ("SELECT toid FROM links WHERE fromid='$unode'");
      relaxNeighbors ($unode, $invitees);
      $inviters = executeQuery ("SELECT fromid FROM links WHERE toid='$unode'");
      relaxNeighbors ($unode, $inviters);
    }
The relaxNeighbors procedure takes a node $unode and a table of nodes linked to that node, and calls relax for $unode and each node in the table:
  function relaxNeighbors ($unode, $neighbors) {
    for ($i = 0; $i < mysql_num_rows ($neighbors); $i++) {
      $invitee = mysql_result ($neighbors, $i, 0);      
      relax ($unode, $invitee);
    }
  }
The relax procedure takes to nodes $u and $v and checks if the path to $v through $u is better than the previously best known path. If it is, it replaces the path with the new path through $u:
  function relax ($u, $v) {
    global $dist;
    global $previous;

    if ($dist[$v] > $dist[$u] + 1) {
      $dist[$v] = $dist[$u] + 1;
      $previous[$v] = $u;
    }
  }
Question 5: Assuming the main constraint on the size of the community is that evaluating find-links-process.php for any two members of the community cannot take more than 60 seconds, how large a community can our current implementation support?
To answer this question, you will need to make lots of assumptions. Make all the assumptions you need, but the more grounded your answer is in reality, the better. If you succeed in getting a lot of people to join your community, you may be able to do some useful experiments also (but please try not to crash UVa's web server!)

Note that sites like Friendster and Oracle of Bacon scale to millions of entries, and find links between them using Djikstra's algorithm. They do this by tweaking the code to make it run as fast as possible, but also by computing the shortest paths off-line and storing them in the database instead of computing them in response to every query. Note that Djikstra's algorithm doesn't just find the shortest path between two nodes — it finds the shortest path between the from node and every other node in the graph, so precomputing all shortest paths from one node is not much more work than finding one.

Adding Links
In the provided code, the only way two people are linked is when one invites the other. This means the social network we produce will be a tree, with yourself (the first inviter) at the root. Although sometimes people believe they are at the center of the universe (or the root of the tree), in real social networks this is rarely the case. There are often other links between people in the community.
The goal is to extend your system so any member can add an "acquaintance" link to any other member. Like friendship, acquaintanceship is not necessarily mutual, so the links are one directional. Adding acquaintances will involve:

Making a new web page with a form user's can use to add acquaintances
Adding a new table to your database to record the acquaintance links.
Modifying find-links-process.php to take acquaintance links into account.
You may take whatever approach you want to this, however, we suggest you follow the steps outlined in questions 6-8. If you have a different idea for extending your community net, we may be willing to accept this as an alternative to doing questions 6-8. You need to email or discuss your idea with me first to obtain permission.

Question 6: Create a file add-acquaintance.php and add a link from the user-page.php to it. Your add-acquaintance.php should be quite similar to the find-links.php since it will involve selecting a community member from a list of all members. In this case, the from person is the user who is logged in, though, so only the to member should be selected. So, you should start by copying the find-links.php code and figuring out what to change. One of the things you should change is the form action: replace find-links-process.php with add-acquaintance-process.php.

Question 7: Create a file add-acquaintance-process.php that implements the form processing. You will need to add a table to your database to keep track of acquaintance links, and then insert a new entry in that table corresponding to the new acquaintance link. The new table should be similar to the links table we created for inviter-invitee links, but we want to distinguish between inviter-invitee relationships and acquaintance relationships so either we need a new table, or we need to modify the old links table with a new field. You may use either approach (but adding a new table is probably slightly easier).
To implement add-acquaintance-process.php, you might consider starting from invite-process.php which is the most similar thing you already have (but make sure to remove the email invitation).

Question 8: Modify find-links-process.php to take acquaintance links into account. You will need to make two changes:

When the algorithm looks for nodes linked to $u, you will need to find links in the acquaintance table also. Note that acquaintances are one way, so you only need to consider acquaintances where $u is the from node.
When the algorithm displays the links at the end, you will need to display acquaintance links differently (as it is, they will show up as "invited by" links). Make sure to do this without breaking how actual "invited by" links are shown.

Site Submission
Submit your site by inviting evans@virginia.edu into your community.

Bonuses
There will a token prize for the student who builds the largest community (number of members). In order to attract more people to your community, you may want to add some extra features to your web site (for example, the ability to share pictures). Of course, anything you add must be in good taste and follow the University's policy on public computing resources).

Credits: This problem set was developed for CS200 Spring 2004 by David Evans. It is partially inspired by this assignment developed by Philip Greenspun for MIT's 6.916, the Oracle of Bacon web site developed by former UVa CS student Brett Tjaden (it was selected by Time magazine as one of the top-ten best Web sites for 1996) and various commercial services such as Friendster. Google tried to build a community network, Orkut, but had to shut it down because of security flaws.








	CS 200: Computer Science Department of Computer Science University of Virginia Fractal Logo by Lincoln Hamilton and Dan Nguyen	cs200-staff@cs.virginia.edu Using these Materials