CS651: Internet Algorithms
Nina Mishra


Course Overview

The Internet is arguably one of the most important inventions of the last few decades, and we are only at the early stages of understanding what we have created. In this course, we discuss algorithms that have helped us harness and comprehend this large and highly dynamic entity. The course begins by describing phenomena we have observed empirically (e.g., the small-world property and power-law distributions) and our efforts to model the Internet's behavior. We then turn to algorithms for searching the Internet, including PageRank, hubs and authorities, and more recent topic-sensitive variants. Next, we cover algorithms for spreading epidemics and for understanding the temporal evolution of web structure and web behavior, followed by algorithms for discovering communities in the web graph. Finally, the streaming nature of the data produced on the Internet calls for a shift in algorithmic thinking: running time is no longer the dominant issue; what matters more is how to compute statistics over a stream that is too large to store.


This course should be of interest to graduate students in computer science and related disciplines, and is especially designed for those interested in understanding why algorithms work. Familiarity with basic algorithms and probability at the level of the core undergraduate courses is useful.



Time and Location

Tu/Th: 3:30-4:45pm
Room: THN D115

Office Hours

Th:  4:45-5:45pm, 226B Olsson Hall


Grading

Each student must submit one-page critiques of three papers related to the material covered in lectures (10% each). The rest of the grade (70%) is based on a class project: project proposal (10%), progress report (10%), presentation (25%), and final report (25%).

Presentation Schedule



Overview of Lectures (Tentative)

  • Aug 24, 2006: Introduction/Overview of Class
  • Aug 29/31, 2006: Small World
  • Sep 5/7, 2006: Power Law (9/5: select 1st paper)
  • Sep 12/14, 2006: Random Graph Models (9/12: 1st critique due; 9/14: select 2nd paper)
  • Sep 19/21, 2006: Bursty Dynamics (9/19: guest lecture by Mike Kobyakov; 9/21: 2nd critique due)
  • Sep 26/28, 2006: PageRank, Hubs & Authorities (9/26: select 3rd paper)
  • Oct 3/5, 2006: Rank Aggregation (10/3: 3rd critique due)
  • Oct 10/12, 2006: Human Computation (10/12: project proposal due, ≤5 pages)
  • Oct 17/19, 2006: Spread of Influence
  • Oct 24/26, 2006: Rumor Spreading, Sampling the Web (10/24: guest lecture by Mike Kobyakov)
  • Oct 31/Nov 2, 2006: Community Discovery, Graph Clustering (10/31: progress report due, ≤7 pages)
  • Nov 7/9, 2006: Web Streams: Distinct Values, Frequency Moments
  • Nov 14/16, 2006: Web Streams: Histograms; Project Presentations
  • Nov 21/23, 2006: Thanksgiving
  • Nov 28/30, 2006: Project Presentations
  • Dec 5, 2006: Project Presentations (12/11: final report due)


Reading List

Small-World

Power-Law Distributions

Search

Rank Aggregation

Finding Influential Nodes based on Cascade Models

Epidemic Algorithms

Sampling and the Web

Bursty Dynamics

  • J. Kleinberg. Temporal Dynamics of On-Line Information Streams. Draft chapter for the forthcoming book Data Stream Management: Processing High-Speed Data Streams (M. Garofalakis, J. Gehrke, R. Rastogi, eds.), Springer.
  • J. Kleinberg. Bursty and Hierarchical Structure in Streams. Proc. 8th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002.
  • J. Aizen, D. Huttenlocher, J. Kleinberg, A. Novak. Traffic-Based Feedback on the Web. Proceedings of the National Academy of Sciences 101(Suppl.1):5254-5260, 2004.
  • R. Kumar, J. Novak, P. Raghavan, A. Tomkins. On the Bursty Evolution of Blogspace. Proc. International WWW Conference, 2003.
  • Y. Zhu and D. Shasha. Efficient Elastic Burst Detection in Data Streams. Proc. ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2003.
  • E. Gabrilovich, S. Dumais, E. Horvitz. NewsJunkie: Providing Personalized Newsfeeds via Analysis of Information Novelty. Proc. 13th International World Wide Web Conference, May 2004.
  • M. Vlachos, C. Meek, Z. Vagena, D. Gunopulos. Identifying Similarities, Periodicities and Bursts for Online Search Queries. Proc. ACM SIGMOD International Conference on Management of Data, 2004.
  • A.-L. Barabási. The origin of bursts and heavy tails in human dynamics. Nature 435, 207-211 (2005).

Evolution of the Web

Community Discovery/Graph Clustering

Web Streams

Distributed Hashing/P2P Networks

Nearest Neighbors

Random Projections



Data Sets

  • Collaboration and citation networks: 2003 KDD Cup competition, networks of co-authorships and citations among a large community of physicists.
  • Internet topology: The network structure of the Internet can be studied at several levels of resolution. Here is a dataset at the autonomous system (AS) level.
  • Web subgraphs: 
  • Semantic networks: Free-association datasets for words have been collected by cognitive scientists; they are constructed by compiling the free responses of test subjects presented with cue words. (For example, a test subject presented with the cue word "ice" might respond with "cold," "cream," or "water.")