Class 39: Slides [PPTX]
Sir Tim Berners-Lee, Information Management: A Proposal. 1989.
Sergey Brin and Lawrence Page, The Anatomy of a Large-Scale Hypertextual Web Search Engine. 1998.
Our main goal is to improve the quality of web search engines. In 1994, some people believed that a complete search index would make it possible to find anything easily. According to Best of the Web 1994 — Navigators, “The best navigation service should make it easy to find almost anything on the Web (once all the data is entered).” However, the Web of 1997 is quite different. Anyone who has used a search engine recently, can readily testify that the completeness of the index is not the only factor in the quality of search results. “Junk results” often wash out any results that a user is interested in. In fact, as of November 1997, only one of the top four commercial search engines finds itself (returns its own search page in response to its name in the top ten results). One of the main causes of this problem is that the number of documents in the indices has been increasing by many orders of magnitude, but the user’s ability to look at documents has not. People are still only willing to look at the first few tens of results. Because of this, as the collection size grows, we need tools that have very high precision (number of relevant documents returned, say in the top tens of results). Indeed, we want our notion of “relevant” to only include the very best documents since there may be tens of thousands of slightly relevant documents.
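The excerpt's notion of precision (relevant documents among the top tens of results) can be made concrete with a small sketch. The Python below is illustrative only and does not come from the paper: the function name `precision_at_k`, the document identifiers, and the choice to report a fraction of k rather than a raw count are all assumptions.

```python
def precision_at_k(ranked_results, relevant, k=10):
    """Fraction of the top-k ranked results that are relevant.

    Assumption: we divide by k even when fewer than k results are
    returned, so a short result list is not rewarded.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_results[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    return hits / k


# Hypothetical ranking: two relevant pages buried among junk results.
ranked = ["a", "junk1", "b", "junk2", "junk3"]
relevant = {"a", "b", "c"}

print(precision_at_k(ranked, relevant, k=5))  # 0.4
```

Under this measure, enlarging the index does not help unless the relevant documents also reach the first few positions, which is the point the excerpt makes about collection size.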