Techniques for Web Latency Improvement
Nicolas Christin, Aziz Ibrahim, and Alfred C. Weaver
Department of Computer Science, University of Virginia
Charlottesville, Virginia 22903 U.S.A.
{christin|aziz|weaver@cs.virginia.edu}
Extended Abstract
Since the invention of the WWW, users have complained about the "world wide wait." In this paper we examine the sources of these delays and the techniques now being used or proposed to combat web latency: modifications of the transport protocol, persistent connections, pipelining, compression, and caching. We explore the applicability of each technique and report measured results and lessons learned. We conclude by identifying those techniques which appear to be most suitable, and we identify the context in which they should be employed and the degree of improvement (latency reduction) which they produce.
1. Evolution of HTTP
HTTP was originally designed for simplicity rather than efficiency. HTTP/0.9 provided only a request/response paradigm, opening a separate connection for each request. The transport protocol was TCP, chosen because of its ubiquity. HTTP/1.0 [3] provided functional improvements (support for MIME-like messages and applications other than plain text), but its performance was not affected.
HTTP/1.1 [9,10] focuses on performance and security, and is thus significantly more complex than its predecessors. This version provides direct support (using tokens) for three different compression techniques: gzip, lkw, and zlib. It can work with persistent connections and applications can use pipelining. It provides significant enhancements for caching, supporting both expiration and validation of cached content and allowing the origin server to specify the expiration times as well as the freshness levels for all data it serves to the cache. Both client and server can operate in a way that makes the proxy cache either transparent or non-transparent, allowing for maximum flexibility.
2. Transport Protocol and Connection Management
HTTP’s choice of its underlying transport protocol, TCP, raises some issues regarding latency. Several inefficiencies are due in part to the robustness of TCP, the most remarkable being the three-way handshake needed to open a connection, which becomes even less efficient given the way that HTTP itself uses TCP. In this section we examine some improvements, extensions, and replacements that have been suggested to make HTTP requests more efficient.
2.1 Modifications of the transport protocol
Improved flow control. One inefficiency of TCP is that its flow control mechanisms are not well suited for short-lived connections. The slow-start algorithm [15] can be a bottleneck, in the sense that the capabilities of the network can not be optimized because the connection window will not have time to reach its optimum size. Two approaches have been suggested: use persistent connections, or disable slow-start.
HTTP/1.1 requires the former approach, but Visweswaraiah and Heidemann [30] point out that this is still not a solution due to the "slow-start restart" problem: after having been idle, a TCP connection should reinitialize the slow-start algorithm to avoid queue overruns at intermediate routers. They propose sending segments at a certain rate until the ACK clock is running again, with the rate being computed as a fraction of previous data rates; therefore the technique is called Rate Based Pacing (RBP).
Caching connection information. Given that TCP can be implemented as a finite state machine, a substantial amount of information has to be kept in memory in the case of multiple parallel connections (the approach of HTTP/1.0). Two ideas [4,28] have thus been proposed for reusing TCBs (TCP Control Blocks) across connections.
Transaction TCP (T/TCP) proposes in [4] to bypass the three-way handshake and replace it with the TCP Accelerated Open (TAO), which uses cached information to avoid one acknowledgement step; it also avoids the limit of 268 transactions per second. Shared TCB [28] goes one step further: while T/TCP uses the TCB of a closed connection to initialize parameters of a new connection to the same host, Shared TCB proposes to use the TCBs of current connections.
UDP-based transport. TCP could be completely replaced with a protocol which would benefit from the simplicity of UDP, yet provide some minimum reliability; an example of such a protocol is ARDP [20]. It borrows some ideas from TCP, such as flow control, congestion avoidance, and retransmission algorithm, but avoids the cost of TCP when setting up a connection. The approach seems superficially similar to T/TCP, but operates differently: instead of caching information, connection identifiers are selected at random.
Xpress Transport Protocol. XTP [27] was designed for fast (and optionally reliable) unicast and multicast transmission, all in a single protocol. XTP sends data in the connection setup packet, dramatically improving the performance of small transfers such as transactions and HTTP requests. Further, XTP utilizes "selective retransmission" which sends only that data known to be lost (as opposed to go-back-n retransmission), resulting in improved performance for error repair. Sitara Networks Inc. [2] devised the Sitara Networks Protocol (SNP), based upon XTP, that claims a significant speedup for client/server applications.
2.2 Connection Management
HTTP/1.1’s goal is to limit the overhead due to multiple short requests. This can be achieved with persistent connections and pipelining.
Persistent connections. This is not a new idea. Padmanabhan and Mogul [14] demonstrate that opening separate connections for each HTTP request is inefficient because TCP’s three-way handshake is invoked several times for the same HTML page (text, inlined images, etc.). A persistent connection opens only one connection per page visited, thus reducing the number of connections and reducing the load on the server. Compared to TCB, one might say that caching is being handled at a different level. However, the idea is not foolproof: consider the pathologic case from [29] wherein a huge file (e.g., a PostScript document) is coupled with a small HTML file. This configuration prevents the user from seeing anything if the huge file has to be downloaded first. HTTP/1.1 addresses this issue by allowing a file to be segmented.
Pipelining. This technique can be used only with persistent connections. In HTTP/1.0, the client downloads information in a sequential fashion and the connection is idle while the information is being processed. Padmanabhan and Mogul proposed in [14] to extend the set of HTTP requests in order to allow pipelining: they integrated the GETALL and GETLIST methods in addition to the standard GET method. The GETALL method tells the server to parse the HTML source to find all inlined images and send them at the same time; the GETLIST method avoids the GETALL actions when the client has previously cached information. It is worth noting that these methods have not been incorporated into HTTP/1.1, which does not require pipelining.
2.3 Compression
The role of compression as a technique for network congestion and bandwidth conservation has been fully exploited in non-HTML objects with tremendous success. Most HTML embedded objects, such as GIF and JPEG images, are already compressed, and therefore one might be skeptical about the benefits of compression since it would only be applied to the text portion of the document. However, the potential performance boost gained by applying even the simplest text compression algorithms has been shown to be significant by Nielson [22].
Benefits of compression. The obvious advantage of compression is that the size of HTML documents is reduced. This translates into fewer packets transmitted across the network, contributing to decreased network congestion and decreased transmission time. This is especially important for wireless digital devices for which decreased transmission time results in lower connection costs. Furthermore, in HTTP using pipelining, compression can offset the delay associated with the TCP slow start algorithm.
When a client receives the first packet of HTML data from the server, it parses it and generates requests to fetch all objects referenced within the HTML document. Because pipelining is in effect, the client’s buffer will typically not trigger a request to be sent to the server until sufficiently many requests have been accumulated. Thus, if the pipeline buffer has not filled, the batch request to the server is necessarily delayed. If the HTML sent from the server is compressed, it provides the client with the opportunity to parse more of the HTML document and fill the pipeline buffer. Compression allows more text to be included in the first HTML data packet, increasing the probability that the first batch of requests to the server for embedded objects is sent sooner. This is important because, in HTTP, the first few packets are more expensive than subsequent ones, and compression helps compensate for the lost efficiency [22].
Effects of compression. There are a number of issues concerning the implementation of compression schemes, each with its own implications for disk storage and CPU utilization for the client and server. The first issue is that the client must decompress HTML on-the-fly as it is received. Velasco [17] found that compression is best suited for high-latency, low-bandwidth connections where transmission time overshadows the client system’s speed.
The second issue is whether the server compresses the HTML dynamically or uses pre-compressed files. For the former, some fraction of the server’s computational capacity is always allocated to compression in tandem with transmission, which could negatively impact server performance, especially under heavy loads. The latter avoids this problem except in the case of dynamic HTML, which still requires on-the-fly compression. In addition, the second method requires extra server space and the overhead of managing two versions of every HTML document.
The third issue is that, for very small documents, the overhead of client-side decompression could outweigh the benefits of transmission savings. This leads to the concept of "conditional compression" in which a threshold file size is determined empirically; documents smaller than the threshold are not compressed. Velasco shows in [17] that this technique is valuable and exploits compression while eliminating the degenerate cases.
Image compression formats. The dominant compression schemes for the web are GIF, JPEG, MPEG, and JAR. GIF is an 8-bit color image format, achieves a compression ratio of 4:1, is lossless, platform independent, and best suited to small images. JPEG is targeted toward large color images and has the property that compression/decompression times, compression ratio, and lossiness are three variables that can be traded off against one another. JPEG images are resolution-dependent and scale linearly with image size. MPEG is the JPEG equivalent for moving images, using temporal as well as spatial compression. JAR (Java ARchive) is a compression format used for Java applet files (.class, sound, image) that allows a download of an entire applet and its components in a single HTTP transaction, thus avoiding multiple connections.
Future alternatives include Cascading Style Sheets (CSS) and Portable Network Graphics (PNG). CSS, supported by both Internet Explorer (v3) and Netscape (v4), is an extension to the HTML scripting language; its motivation is to eliminate the inflexible and awkward placement techniques of HTML. CSS allows the specification of color, shading, and style properties for fonts, buttons, images, bullets, etc. in a platform-independent manner, with the goal of reduced transfer time because the browser will supply these elements [31]. PNG features include 24-bit true color, variable transparency levels, gamma correction (adjusting color values so that an image appears the same on different monitors), and two-dimensional interlacing. PNG shows a 5-25% better compression ratio than GIF, while maintaining losslessness.
2.4 Caching
One of the earliest and most successful initiatives undertaken to improve network congestion and conserve bandwidth was caching. The goal of caching is to migrate frequently request objects closer to clients by a number of techniques. The most popular method today is the replication of such objects on proxy cache servers located at key Internet gateways. Caching increases the user capacity and network bandwidth available for other Internet services (e.g., telnet, gopher, ftp).
Cache requirements. Any cache server must be very efficient and have very high throughput to reduce user latency [24]. Hardware requirements include large RAM and disk storage capacity, along with fast CPU(s). On the software side, the system must support a large number of communications ports. Caching generally resides at the application layer, and thus the implementation details of the operating system, file system, network interface, etc., as well as the cache server software itself, have significant impacts on performance and efficiency.
Cache misses. When a requested object is not resident in the cache, it must be retrieved from the origin server. Rodriguez et al. [23] characterize cache misses as: (a) first access miss when the object is being requested for the first time; these account for 20% of all misses; (b) capacity miss when a previously cached object is requested but is no longer available because the cache has run out of space; (c) update miss when the requested object is in the cache but its lifetime has expired; and (d) uncacheable miss when an object is password protected, is a CGI script, or is a dynamic HTML object. The cacheability of an object can be encapsulated in the object’s MIME header, or the cache can parse the URL and determine not to cache the object if it is of a selected type (e.g., .cgi).
Cache policies. Most of the current Internet cache replacement policies have been inspired by traditional file system virtual memory caching schemes [7]. In [5] Cao and Liu compare three consistency schemes: adaptive time-to-live, polling-every-time, and invalidation. TTL techniques enforce "weak consistency" by having the browser check the origin server as needed to determine the freshness of data; polling-every-time requires that the cached copy of a client request first be validated with the server, then sent to the client; with invalidation, the server keeps track of all clients processing the data and, when the data changes, all clients are notified that their cache copies are now invalid. The studies in [5] show that the invalidation technique works best in terms of network traffic generated and server load.
Client vs. proxy caching. Client caching is the simplest form of web data caching and is used by both Internet Explorer and Netscape. The browser locally stores the content of the sites most recently visited. When a request is made, the browser must determine whether the lifetime of the cached content has expired; if it has, a new version is fetched from the origin server.
Proxy caching is widely used to eliminate "server hot spots." When a large and repeated number of requests are made for a particular web object, the resulting traffic jam degrades performance for all users (e.g., the Victoria’s Secret fashion show originated by broadcast.com). Proxy servers reduce the load on the origin server by replicating content closer to the requesting clients.
Proxy caching can be simple or cooperative. Simple caching provides a hierarchy of proxy servers. Leaf servers handle a locality; groups of leaf servers are backed by a regional proxy server; regional servers are backed by a national proxy servers; and one or more origin servers form the root of the tree. Williams shows in [32] that a hierarchy of more than three or four levels degrades performance. Cooperative proxy servers are also hierarchical, but in addition can query their siblings. When this type of server receives a request for an object not in cache, it queries not only its parent but also its peers via a simultaneous multicast. As Internet domains become increasing multicast enabled, this technique is expected to be increasing effective.
Proxy servers have three major categories: normal, semi-transparent, and transparent, each with its own set of advantages and disadvantages.
Proxy cache server. This device works as described above, with the obvious advantage of reducing response time and perceived latency for a cache hit. However, there are disadvantages. First, the browser must be configured to utilize the proxy; for a large ISP, the administrative costs can be large. Second, even if the origin server is closer, the browser is forced to take the proxy path; this risks making the proxy the new bottleneck. Third, Heddaya [11] claims that proxies lack a reliable means of detecting changes on the origin servers, causing them to serve stale data. Fourth, the proxy itself represents a single point of failure, raising questions of reliablilty [8].
Semi-transparent proxy cache server. Using this technique, the proxy server is attached to a LAN switch or router, rather than being positioned in the path between user and origin server. The advantage is that no browser configuration is required, and in fact the browser is unaware that the cache server exists. The switch or router is configured to filter incoming requests and redirect them to the proxy server. This technique shares the same disadvantage of the normal proxy server in that it needs reliable schemes to avoid serving stale data.
Transparent proxy cache server. This technique is the most versatile in terms of administration and deployment, and no browser reconfiguration is required. However, it too has problems. First, it can be slower than normal servers because it does not cache its DNS lookups. Second, browsers keep a non-persistent connection with transparent proxy servers whereas they keep a persistent connection with normal proxy servers. Third, there is no way to disable or bypass a transparent proxy server, leading to questions of reliability.
3. Evaluation
This section will provide measured performance results on:
-- Heidemann [12] points out interactions between P-HTTP and TCP that can lead to disastrous results
-- increases performance 60% for small images and 30% for large images [14]
-- ARDP shows gains of 20% for small clusters on slow networks [13]
-- results of compression studies on HTML [17]
-- results from Harvest, Squid, NetCache, DynaCache, CacheFlow
4. Conclusions
A scale is defined for the degree of improvement (i.e., reduction in latency) for the various techniques discussed. These results are presented in chart form for both low-latency and high-latency networks.
In a separate chart, each caching technique is characterized for its impact on performance, its impact on non-HTTP traffic, and its administrative overhead induced.
For caching in particular, we conclude with a list of lessons learned about cache techniques and cache placement in the network.
5. References
We provide 32 references from the current literature.