Comments, CS 851, Spring 2000

Sample Comments and Responses


Lecture of 2/8:

A performance evaluation of HTTP

I think this is a good paper. The authors clearly state the objective and the contribution: the Web server can be the primary source of document transfer latency. They showed that the performance of HTTP 1.1 can become worse when the server's disk is the bottleneck, and this claim is well supported. However, it is not clear to me why HTTP 1.1 does not show much performance difference when the CPU is the bottleneck.

Kyoung-Don Kang


Potential benefits of delta encoding and data compression for HTTP

They used actual traces of the full contents of HTTP messages to show the effectiveness of delta encoding and compression. The basic idea is straightforward: we can save resources by sending a delta instead of repeatedly sending the whole contents, and we can save further by compression. Their major contribution is a quantitative analysis of the benefit. They also suggest a specific extension to the HTTP protocol for delta encoding and compression. They included many tables and graphs, but some of them are hard to understand fully; more explanation would have been desirable if space allowed.
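To make the two ideas concrete, here is a minimal Python sketch (not the paper's encoding or its proposed HTTP extension). Delta encoding expresses the new version of a page as "copy" instructions that refer to bytes the client already holds, plus "insert" instructions that carry only the changed bytes; zlib compression of the full body is computed for comparison. The function names and the copy/insert format are hypothetical, and the literal-byte count ignores the small cost of encoding the copy instructions themselves.

```python
import difflib
import zlib


def make_delta(old: bytes, new: bytes) -> list:
    """Encode `new` as a list of copy/insert instructions relative to `old`."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))    # bytes the client already has
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # only the changed bytes are sent
        # "delete": nothing is sent; the old bytes are simply not copied
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new version on the client from its cached old version."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += old[start:start + length]
        else:
            out += op[1]
    return bytes(out)


def payload_sizes(old: bytes, new: bytes) -> dict:
    """Compare bytes on the wire: full body, compressed body, delta literals."""
    ops = make_delta(old, new)
    return {
        "full": len(new),
        "compressed": len(zlib.compress(new)),
        "delta_literals": sum(len(op[1]) for op in ops if op[0] == "insert"),
    }
```

The two techniques are complementary: the inserted literals of a delta can themselves be compressed before sending.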

Kyoung-Don Kang


Lecture of 2/10:

Performance Issues in WWW Servers

This paper examines the effect of changes to the operating system and network protocol stack on WWW server performance. The modifications focus on three areas: new socket functions, per-byte optimizations, and per-connection optimizations. On their own, the socket functions can degrade server throughput, but they provide support for per-byte optimizations that reduce the amount of data copied from one OS subsystem to another. The combination of these two modifications increases throughput substantially. Per-connection optimizations are introduced to reduce the number of TCP packets produced, yielding more modest gains. Overall, the combination of all these enhancements can result in a server throughput gain of 25%...

The first new socket function, acceptex(), combines the functions accept(), getsockname(), and recv(). This modification resulted in a modest performance gain (about 1 percent for small transfers). The other function, send_file(), instructs the kernel to send a file over a socket, replacing read() and write() calls. The authors show that send_file() on its own actually introduces considerable performance degradation (as much as 18% for large transfers). However, it provides support for fundamental changes in the operating system that overcome this penalty.
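The difference between the two serving paths can be sketched with Python's standard library. socket.sendfile() is a real call that uses os.sendfile() where the platform supports it (and falls back to send() otherwise), so here it plays the role of the paper's send_file(); it is not AIX's interface. The helper names and the socketpair-based round trip are illustrative assumptions, and the payload is kept small so the whole file fits in the socket buffer before the reader starts.

```python
import os
import socket
import tempfile


def serve_read_write(sock: socket.socket, path: str, bufsize: int = 8192) -> int:
    """Baseline: file -> user-space buffer (read) -> socket (write)."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            sock.sendall(chunk)
            sent += len(chunk)
    return sent


def serve_sendfile(sock: socket.socket, path: str) -> int:
    """Ask the kernel to move the file data directly to the socket."""
    with open(path, "rb") as f:
        return sock.sendfile(f)


def roundtrip(data: bytes, use_sendfile: bool) -> bytes:
    """Write `data` to a temp file, serve it over a socketpair, read it back."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        server, client = socket.socketpair()
        with server, client:
            serve = serve_sendfile if use_sendfile else serve_read_write
            n = serve(server, path)
            server.shutdown(socket.SHUT_WR)
            received = bytearray()
            while len(received) < n:
                chunk = client.recv(65536)
                if not chunk:
                    break
                received += chunk
        return bytes(received)
    finally:
        os.remove(path)
```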

The per-byte optimizations include an in-kernel caching mechanism that eliminates copying data from the file system to the network protocol stack and avoids unnecessary file system accesses. This could be implemented by the application programmer by memory-mapping files (as has been done in Zeus), but the authors argue that including it in the OS enables sharing the cached files with other protocols. In addition, they simulated the ability of the OS to offload the calculation of the Internet checksum to the network adaptor by disabling checksums (eliminating the need for the CPU to read every byte of data). The combination of these two modifications produced substantial gains in server throughput (21% over the baseline under the SURGE load).
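Checksum offload matters because computing the Internet checksum touches every byte of the payload. A minimal Python sketch of the 16-bit one's-complement sum described in RFC 1071 shows the per-byte loop that offloading moves to the adaptor; it is written for clarity, not speed.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]    # big-endian 16-bit words
    while total >> 16:                           # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A receiver can verify a segment by checksumming the data together with the transmitted checksum: the result is zero when nothing was corrupted.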

Per-connection optimizations were introduced to reduce the number of TCP packets sent during a typical HTTP session. The first modification piggybacks the server's FIN onto the last data packet sent to the client. Another piggybacks the client's last data ACK onto its FIN packet. The third removes the client's ACK of the server's SYN-ACK and instead lets the initial client data packet (the HTTP GET request) carry that ACK. These changes increased server throughput by as much as 5% for small files. However, this gain will likely be negligible when persistent connections (HTTP 1.1) are used.
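The savings from the three optimizations can be counted with a simplified Python packet model of one HTTP/1.0 exchange. The model is an assumption, not the paper's measurement: it ignores delayed ACKs and per-segment ACKs, assumes the server's ACK of the GET rides on its first data segment, and has the server close first.

```python
def http10_packets(data_segments: int, optimized: bool) -> list:
    """Packets of one HTTP/1.0 request/response in a simplified TCP model.

    Entries are (sender, flags), where C = client and S = server.
    """
    pkts = [("C", "SYN"), ("S", "SYN-ACK")]
    if optimized:
        pkts.append(("C", "GET+ACK"))          # GET carries the handshake ACK
    else:
        pkts += [("C", "ACK"), ("C", "GET")]
    for i in range(data_segments):
        if optimized and i == data_segments - 1:
            pkts.append(("S", "DATA+FIN"))     # server FIN piggybacked on last data
        else:
            pkts.append(("S", "DATA"))
    if optimized:
        pkts.append(("C", "ACK+FIN"))          # client's last data ACK rides on its FIN
    else:
        pkts += [("S", "FIN"), ("C", "ACK"), ("C", "FIN")]
    pkts.append(("S", "ACK"))                  # server acknowledges the client FIN
    return pkts
```

For a one-segment response the model drops from 9 packets to 6. With persistent connections the handshake and teardown are paid once per connection rather than once per request, which is why the gain should shrink under HTTP 1.1.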

The reasons for including acceptex() in this study are not clear. The performance gained by this function appears to come from the fact that its implementation combines several system calls into a single kernel call. This is consistent with previous studies that focus on reducing the number of kernel calls, so its inclusion in this report is not enlightening.

The significant gains from these modifications seem to lie in caching file data so that file accesses are minimized. The authors note that the motivation for this cache is that AIX does not have an integrated I/O system. Some sort of comparison should have been made between their modification to AIX and a system with an integrated I/O system.

The visual presentation of the data in this report was lacking. Results were presented in tabular format and generally showed only incremental performance gains or a comparison between the baseline and the end result. A plot of file size versus throughput for each level of modification should have been included.

The bulk of this study used workloads generated by WebStone. It would have been interesting to see how the performance improvements were affected by varying the workload under SURGE.

In addition, the report should have discussed in more detail the foreseen effects of switching from HTTP 1.0 to HTTP 1.1.

Susan Bibeault et al.


Measuring Web Performance in the Wide Area

The authors propose a measurement architecture in which both server and network characteristics can be taken into account. They say it has never been done before (though that's doubtful), and they report some new and surprising results about Web traffic. In short, the following characteristics were observed:

Under a lightly loaded network: with a lightly loaded server, the reply latency and its variance were quite low (as expected). As the server became loaded, both the latency and the variance went up, but they remained of comparable order. This is nothing new and is as expected.

Under a heavily loaded network (characterized by high packet loss), some surprising results were reported. Under light server load, high latency and variance were observed for small files (1K and 20K); the reason could be TCP's slow-start algorithm in the face of packet loss. A similar effect was observed for a heavily loaded server. The most surprising result was observed with 500K files: in this scenario, the heavily loaded server performed *better* than the lightly loaded one. The authors could not explain this satisfactorily, and I think it is a peculiarity of their experiment.

Another important observation is that active measurements (particularly Poisson-based probing tools) are inadequate for determining performance as affected by the network. The reason is that such tools do not take TCP's slow start and backoff strategies into consideration, so they present a much more optimistic estimate of network performance.
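A back-of-the-envelope Python sketch shows why slow start dominates small transfers. It assumes (hypothetically) a 1460-byte MSS, an initial congestion window of one segment that doubles every round trip, and no losses, receiver-window limits, or delayed ACKs, and counts the round trips needed for a given file size. With so few packets in flight, a single lost packet often cannot be repaired by fast retransmit (which needs three duplicate ACKs) and must wait for a retransmission timeout, something a Poisson probe of individual packets never experiences.

```python
import math


def slow_start_rounds(file_bytes: int, mss: int = 1460, initial_cwnd: int = 1) -> int:
    """Round trips to deliver a file if cwnd doubles each RTT and nothing is lost."""
    segments = math.ceil(file_bytes / mss)
    cwnd, sent, rounds = initial_cwnd, 0, 0
    while sent < segments:
        sent += cwnd        # one full window per round trip
        cwnd *= 2           # slow start: exponential growth, no ssthresh
        rounds += 1
    return rounds
```

Under these assumptions the 1K, 20K, and 500K files above need 1, 4, and 9 round trips respectively.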

Avneesh Saxena et al.


Lecture of 2/15:

LRP, IO-Lite, and Resource Containers

The authors of LRP present a brilliant idea for avoiding the livelock problem that can occur in current Web servers, in which the server spends its time processing packets that will eventually be dropped. LRP avoids livelock by introducing several new mechanisms: early demultiplexing, lazy receiver processing, and fair priority assignment (network processing is performed at the priority of the receiving process). These mechanisms substantially improved performance.
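The interplay of early demultiplexing and lazy processing can be illustrated with a toy Python model (a sketch, not the LRP implementation). Arriving packets are only classified into a per-socket channel; a full channel causes an immediate, nearly free drop; and protocol processing happens only when the application calls recv(), so its cost lands on the receiver. Demultiplexing by port number, the queue limit, and all names here are hypothetical simplifications.

```python
from collections import deque


class LRPReceiver:
    """Toy model of early demultiplexing plus lazy protocol processing."""

    def __init__(self, queue_limit: int):
        self.queue_limit = queue_limit
        self.channels = {}      # destination port -> queue of unprocessed packets
        self.dropped = 0        # discarded before any protocol work was done
        self.processed = 0      # protocol-processed on behalf of a receiver

    def open(self, port: int) -> None:
        self.channels[port] = deque()

    def on_packet(self, port: int, payload: bytes) -> None:
        """Interrupt-time work: classify only, never run the protocol stack."""
        channel = self.channels.get(port)
        if channel is None or len(channel) >= self.queue_limit:
            self.dropped += 1   # early discard costs almost nothing
            return
        channel.append(payload)

    def recv(self, port: int) -> list:
        """Protocol processing happens here, charged to the receiving process."""
        channel = self.channels[port]
        out = []
        while channel:
            out.append(channel.popleft())
            self.processed += 1  # stand-in for IP/TCP processing of one packet
        return out


def flood(n_packets: int, queue_limit: int) -> tuple:
    """Flood port 80 while the application is busy, then let it read once."""
    rx = LRPReceiver(queue_limit)
    rx.open(80)
    for _ in range(n_packets):
        rx.on_packet(80, b"pkt")
    queued = len(rx.channels[80])
    rx.recv(80)
    return queued, rx.dropped, rx.processed
```

In flood(1000, 4), only the 4 packets the application actually reads are processed; an eager stack would have done protocol work on packets that were later discarded at the full socket queue.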

However, they found another important problem in general resource allocation: application semantics are missing from resource allocation, and the OS sometimes charges resource usage to an unrelated process. They present a new abstraction called "resource containers" to remedy this problem: they separate the protection domain and the resource principal, which are combined in the conventional notion of a process. They introduce application semantics: an application is allocated a certain amount of total resources no matter how many threads/processes it invokes, and that application is charged for this resource allocation. These two papers are good in that they present genuinely new ideas and demonstrate their performance gains. However, some parts of their implementation and simulation method were not very clear.

IO-Lite focuses on the multiple-copy problem, which significantly impairs server performance. The authors avoid this problem by introducing a kind of shared, read-only buffer called an "immutable buffer". I think their original observation about performance degradation due to unnecessary copies was very impressive, and the paper is well written in general. They clearly explained their implementation.
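The core idea of immutable buffers can be sketched in Python, whose bytes objects are already immutable. This is a toy under assumed names (ImmutableBuffer, Aggregate, overwrite), not IO-Lite's interface, which relies on virtual-memory page sharing. Data is passed around as lists of (buffer, offset, length) slices, so handing the file cache's data to the network copies nothing; a "modification" builds a new aggregate that shares every untouched slice with the old one.

```python
class ImmutableBuffer:
    """Bytes that never change after creation, so they can be shared freely."""
    __slots__ = ("data",)

    def __init__(self, data: bytes):
        self.data = bytes(data)


class Aggregate:
    """Ordered (buffer, offset, length) slices, passed around by reference."""

    def __init__(self, slices):
        self.slices = list(slices)

    @classmethod
    def from_bytes(cls, data: bytes) -> "Aggregate":
        return cls([(ImmutableBuffer(data), 0, len(data))])

    def read(self) -> bytes:
        return b"".join(buf.data[off:off + n] for buf, off, n in self.slices)

    def overwrite(self, pos: int, new: bytes) -> "Aggregate":
        """Return a new aggregate with bytes [pos, pos+len(new)) replaced.

        No buffer is mutated: the new bytes go into a fresh buffer and every
        untouched region keeps pointing at the original buffers.
        Simplification: the region must lie inside a single slice.
        """
        out, start, hit = [], 0, False
        for buf, off, n in self.slices:
            end = start + n
            if not hit and start <= pos and pos + len(new) <= end:
                head, tail = pos - start, end - (pos + len(new))
                if head:
                    out.append((buf, off, head))
                out.append((ImmutableBuffer(new), 0, len(new)))
                if tail:
                    out.append((buf, off + head + len(new), tail))
                hit = True
            else:
                out.append((buf, off, n))
            start = end
        if not hit:
            raise ValueError("region spans several slices")
        return Aggregate(out)


def share_and_edit_demo():
    cached = Aggregate.from_bytes(b"<html>v1</html>")  # file-cache copy
    sent = Aggregate(cached.slices)                     # "send": no data copied
    edited = cached.overwrite(7, b"2")
    return (sent.slices[0][0] is cached.slices[0][0],
            cached.read(), edited.read(),
            edited.slices[0][0] is cached.slices[0][0])
```

Each edit adds slices, so data that is modified often fragments into many small pieces; this is one concrete form of the concern about frequently modified buffers raised below.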

Kyoung-Don Kang

Of the three papers, I think the resource container paper is the most valuable, since it combines ideas from the other two. Lazy receiver processing is focused on the network subsystem, while IO-Lite is more OS-focused.

The value of the first two papers is that they provide a way to solve a problem that has been haunting Web services for many years: quality of service (QoS). With current server/OS implementations, it is quite difficult for applications to provide QoS on Web platforms, and as these two papers indicate, the underlying OS does not offer support for network and OS resource management. Even worse, as the LRP paper indicates, servers often perform very poorly during network overload.

The container paper provides a unified way to solve this problem. Resource containers are a genuinely new idea for resource allocation that can be used extensively beyond server structures.

As indicated by the LRP paper, it enables applications to pre-filter incoming packets depending on the identity of the client. Besides possible protection against denial-of-service attacks, I think its more important implication is that it enables applications to provide QoS according to client identity (still limited to IP addresses). This is very useful for current research on Web QoS.

The IO-Lite paper offers good performance gains by avoiding memory copies, an idea similar to the one we discussed in the last class. A possible problem is its handling of frequently modified buffers, since all buffers are treated as immutable.

Haibin Wang et al.


Lazy Receiver Processing (LRP):

This paper proposes a novel network subsystem architecture based on lazy receiver processing (LRP), which provides stable overload behavior, fair resource allocation, and increased throughput under heavy network load. The paper explains the problems and its design well. It first explains the existing problems in current network processing and then argues convincingly that LRP solves each of them. I like this paper very much, not only because of the idea it proposes but also because of the way it analyzes the problem. Their argument that only the combination of early demultiplexing and lazy processing can solve the overload problem and avoid anomalies is very strong. My question is how much the data received from the network will influence the priority of the receiving application process.

Jinze Liu et al.


IO-Lite:

This paper presents another way to improve server performance. It focuses on reducing CPU overhead by avoiding redundant data copying in the I/O subsystems of general-purpose operating systems. Caching and buffering are common ideas in databases and networking; what differs here is the scenario, namely immutable, read-only buffering. In addition, IO-Lite unifies buffering across the network and file systems. I think this paper is quite strong.

Jinze Liu et al.


Resource Containers:

The authors propose a new abstraction for operating systems called resource containers. A resource container encompasses all of the resources used by a server to perform an independent activity. Containers provide a mechanism (in monolithic kernels) for control over resource management. All processing (user AND kernel level) is charged to the appropriate resource container, and scheduling is done at the priority of the resource container.
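A minimal Python sketch of the accounting idea follows. ResourceContainer, Thread, Kernel, and their methods are illustrative names, not the paper's interface. A thread's user-level work is charged to whichever container it is currently bound to, kernel network processing is charged to the container that owns the socket, and the scheduler picks threads by their container's priority.

```python
from dataclasses import dataclass


@dataclass
class ResourceContainer:
    name: str
    priority: int            # higher value = scheduled first
    cpu_used: float = 0.0    # user AND kernel time charged to this activity


@dataclass
class Thread:
    tid: int
    binding: ResourceContainer   # container currently charged for this thread


class Kernel:
    def __init__(self):
        self.socket_owner = {}   # socket id -> container owning that connection

    def charge_user(self, thread: Thread, cpu: float) -> None:
        thread.binding.cpu_used += cpu

    def charge_network(self, sock: int, cpu: float) -> None:
        # Protocol processing is billed to the connection's container, not to
        # whichever process happened to be running when the packet arrived.
        self.socket_owner[sock].cpu_used += cpu

    def pick_next(self, runnable: list) -> Thread:
        # Threads are scheduled at the priority of the container they serve.
        return max(runnable, key=lambda t: t.binding.priority)


def container_demo():
    kernel = Kernel()
    premium = ResourceContainer("premium", priority=10)
    basic = ResourceContainer("basic", priority=1)
    kernel.socket_owner = {1: premium, 2: basic}
    worker = Thread(tid=1, binding=premium)
    kernel.charge_user(worker, 3.0)    # serving a premium client
    worker.binding = basic             # the same thread switches activities
    kernel.charge_user(worker, 1.0)
    kernel.charge_network(1, 0.5)
    kernel.charge_network(2, 0.25)
    other = Thread(tid=2, binding=premium)
    return premium.cpu_used, basic.cpu_used, kernel.pick_next([worker, other]).tid
```

The sketch only records charges and applies a fixed-priority rule; as the next paragraph notes, real benefit depends on the policies built on top of this framework.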

I thought this was a good paper, and the idea has definite merit. However, resource containers only provide a framework for resource management: they can only be as effective as the underlying policies used to track and control resource consumption. I would also have liked to see how performance was affected under extreme conditions (many clients operating at varying priority levels, CGI processing, and dividing the server into QoS partitions by company). They showed that the overhead was negligible, but it would be interesting to see whether that changes as scheduling and accounting become more difficult.

Susan Bibeault et al.


Lecture of 2/17:

Soft Timers

We have been talking about the overhead of context switching ever since we began studying operating systems. However, this is the first time I have seen a way to reduce that overhead. Perhaps the problem is more serious in the network interface than in ordinary operating systems, and it will become a bottleneck as network speeds increase.

The good point of this idea is that it increases a server's throughput in overload situations, and its use of statistical measurements is very neat. However, this approach is not suitable for real-time systems, where the deadlines of critical transactions must be met. Also, although it could improve a server's throughput under heavy overload, I wonder whether the response time seen by clients will become too long. I also suspect the behavior may be odd under light server load, where the response time could be as long as, or even longer than, under heavy overload.

In my opinion, a better approach is to combine interrupt-driven processing with soft timers, so that critical transactions get better response times and ordinary transactions get a guaranteed response time, while the server still increases its throughput.

Jinze Liu et al.

I think the main idea in this paper is easily applicable to soft real-time applications such as Internet video streaming. Such an application can tolerate the probabilistic timing of soft timers. At the same time, it can benefit significantly from rate-based clocking of packet transmission: since it has a large number of packets to send, an efficient transmission mechanism is very important. The server's performance loss can be reduced because there is relatively less context switching and there are fewer cache/TLB misses. As a result, the server might be able to serve more multimedia transactions in a given time.
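The soft-timer idea, applied to rate-based clocking, can be sketched in Python with a simulated clock. In this sketch the class and function names, the integer time units, and the backstop period are all assumptions. Timer events are checked only at trigger states, i.e. when the kernel happens to be entered anyway, with a coarse periodic hardware tick as a backstop, so each paced packet leaves at the first trigger state at or after its due time.

```python
import heapq


class SoftTimers:
    """Timer events checked at 'trigger states' rather than by extra interrupts.

    poll(now) stands in for the check a kernel would make whenever it is
    already entered (system call return, exception, device interrupt).
    A coarse periodic hardware tick also calls poll(), bounding lateness.
    """

    def __init__(self):
        self._events = []   # min-heap of (due_time, seq, callback)
        self._seq = 0       # tie-breaker so callbacks are never compared

    def schedule(self, due: int, callback) -> None:
        heapq.heappush(self._events, (due, self._seq, callback))
        self._seq += 1

    def poll(self, now: int) -> int:
        fired = 0
        while self._events and self._events[0][0] <= now:
            _, _, callback = heapq.heappop(self._events)
            callback(now)
            fired += 1
        return fired


def paced_send(trigger_times, interval, packets, backstop):
    """Rate-based clocking: packet i is due at i * interval.

    Returns the times at which each packet actually went out.
    """
    timers = SoftTimers()
    sent = []
    for i in range(packets):
        timers.schedule(i * interval, sent.append)
    last_due = (packets - 1) * interval
    ticks = range(0, last_due + backstop + 1, backstop)  # hardware backstop
    for now in sorted(set(trigger_times) | set(ticks)):
        timers.poll(now)
    return sent
```

With trigger states at 3, 12, 13, 27, and 31, a backstop every 25 units, and packets due at 0, 10, 20, and 30, the packets leave at 0, 12, 25, and 31: the third is caught by the backstop tick because no trigger state occurred between 20 and 25. This bounded but variable lateness is what a streaming application would have to tolerate.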

Kyoung-Don Kang


Scalable kernel performance for Internet servers under realistic loads

To address the scalability problems of select() and ufalloc(), the paper proposes new versions of both that scale well with the number of open connections in a server process.

The main idea of the new select() is to preserve information about changes in a socket's state between select_wakeup() and do_scan(). With this information, the number of sockets that need to be checked on each call decreases greatly. For ufalloc(), they replaced the linear search with a logarithmic-time algorithm that uses a two-level tree of bitmaps.
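Both data structures can be sketched in Python; the class names, the 32-bit word size, and the simplified socket model are assumptions, not the paper's kernel code. The allocator keeps one bit per descriptor plus a summary word marking full words, so finding the lowest free descriptor takes two find-first-zero steps instead of a linear scan. The select() sketch records a hint when a socket's state changes (the role of select_wakeup()) and rescans only hinted sockets, sockets that were ready last time, and newly added descriptors.

```python
WORD = 32  # bits per bitmap word (hypothetical; a kernel would use the machine word)


class FdAllocator:
    """ufalloc()-style lowest-free-descriptor search over a two-level bitmap."""

    def __init__(self, n_words: int = WORD):
        self.words = [0] * n_words   # bit b of words[w] set => fd w*WORD+b in use
        self.full = 0                # bit w set => words[w] has no free bits

    @staticmethod
    def _lowest_zero_bit(x: int) -> int:
        return (~x & (x + 1)).bit_length() - 1

    def alloc(self) -> int:
        w = self._lowest_zero_bit(self.full)      # first word with room
        if w >= len(self.words):
            raise OSError("too many open files")
        b = self._lowest_zero_bit(self.words[w])  # first free bit in that word
        self.words[w] |= 1 << b
        if self.words[w] == (1 << WORD) - 1:
            self.full |= 1 << w
        return w * WORD + b

    def free(self, fd: int) -> None:
        w, b = divmod(fd, WORD)
        self.words[w] &= ~(1 << b)
        self.full &= ~(1 << w)


class HintedSelect:
    """select() that rescans only sockets with a state-change hint."""

    def __init__(self):
        self.readable = {}        # fd -> current readability (socket state)
        self.hints = set()        # fds changed since the last scan
        self.last_ready = set()   # fds reported ready last time: recheck them
        self.last_interest = set()
        self.scanned = 0          # fds examined by the most recent call

    def wakeup(self, fd: int, readable: bool) -> None:
        self.readable[fd] = readable
        self.hints.add(fd)

    def select(self, interest: set) -> set:
        candidates = ((self.hints | self.last_ready) & interest) | (interest - self.last_interest)
        self.scanned = len(candidates)
        ready = {fd for fd in candidates if self.readable.get(fd, False)}
        self.hints.clear()
        self.last_ready, self.last_interest = ready, set(interest)
        return ready


def alloc_demo():
    a = FdAllocator()
    first = [a.alloc() for _ in range(40)]
    a.free(5)
    a.free(33)
    return first[:3], a.alloc(), a.alloc(), a.alloc()


def select_demo(n: int = 1000):
    s = HintedSelect()
    interest = set(range(n))
    s.select(interest)               # first call must scan every fd
    first_scan = s.scanned
    s.wakeup(7, True)
    s.wakeup(42, True)
    ready = s.select(interest)       # only the two hinted fds are examined
    return first_scan, ready, s.scanned
```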

Experiments show that these changes improve the performance of Web servers and proxies on realistic benchmarks and on a live proxy, without harming performance on naive benchmarks. (This is a good solution to the problem that event-driven servers perform poorly under real conditions.)

Ying and Avneesh


Lecture of 2/22:

Connection Scheduling in Web Servers

This paper answers several questions about connection scheduling. The authors try to determine whether there are benefits to a scheduling policy that gives preferential treatment to short connections for a web server serving static files (where the size of the file to be transferred is known). In particular, they attempt to answer several questions. Do current web servers give preferential treatment to short connections? How is mean response time affected by a server that gives preferential treatment to short connections? How is overall performance affected? And finally, does unfairness occur with regard to large connections?

The authors began by designing their own web server, which gives them more control over which connections are served first. Although their control is not complete, it is greater than that of typical Web servers. They explored two policies: size-independent (FIFO) scheduling of requests, and shortest-connection-first scheduling, in which each device serves only the shortest connection at any point in time.

Scheduling in their experimental server is done through three queues (protocol, disk, and network) and a listen thread. Each queue has an associated pool of threads. The listen thread blocks on the accept() call, waiting for new connections. When a new connection arrives, it creates a connection descriptor whose state includes two file descriptors, a memory buffer, and a progress indicator. Protocol threads determine the name and size of the file to be transferred. Disk threads read a block of data from the file system and pass it to the network queue. Network threads then call write() on the associated socket to transfer the contents of the connection's buffer to the kernel's socket buffer. The disk and network threads dequeue the connection with the fewest bytes remaining to be served (shortest-connection-first), while the protocol queue is always FIFO (since its job is to determine the file size). Their server also has the performance benefits of using a single process with a fixed number of threads (no process context switches, no IPC, etc.). The authors' main limitation is their lack of control over the order of events inside the OS. To gain more control over that order, they sacrifice throughput by limiting the number of threads in each queue's pool.
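The shortest-connection-first rule used by the disk and network queues can be sketched in Python with a heap keyed on bytes remaining. The names, the 4096-byte block size, and the single-queue simplification are assumptions. Each dequeue serves one block and re-queues the connection with its updated remaining size, so a newly arrived short connection overtakes a long one between blocks.

```python
import heapq


class Connection:
    def __init__(self, cid: int, remaining: int):
        self.cid = cid
        self.remaining = remaining      # bytes still to be sent


class SCFQueue:
    """A disk/network queue that always dequeues the connection with the fewest bytes left."""

    def __init__(self):
        self._heap = []
        self._seq = 0                   # FIFO tie-break among equal sizes

    def put(self, conn: Connection) -> None:
        heapq.heappush(self._heap, (conn.remaining, self._seq, conn))
        self._seq += 1

    def get(self) -> Connection:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def completion_order(sizes, block=4096):
    """Serve one block per dequeue; return connection ids in completion order."""
    q = SCFQueue()
    for cid, size in enumerate(sizes):
        q.put(Connection(cid, size))
    done = []
    while q:
        conn = q.get()
        conn.remaining -= min(block, conn.remaining)
        if conn.remaining:
            q.put(conn)                 # re-queue with its smaller remaining size
        else:
            done.append(conn.cid)
    return done
```

Because all connections here arrive at once, the example shows only the ordering; the starvation concern raised below appears when short requests keep arriving faster than they can be served.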

Their server, implementing the two policies above, was compared in several ways with the Apache server. Their experiments used the SURGE workload generator and were based on the heavy-tailed distribution of web task sizes (a tiny fraction of the very largest files makes up most of the load on a web server).

Their experiments yielded the following results. The response time of small files is independent of file size, while for larger files the response time increases linearly with file size. Apache provided worse response times for small files than their experimental server, penalizing short connections. The shortest-connection-first (SCF) policy improved mean response time by a factor of 4-5 compared with the size-independent policy for 1000+ UEs (an even larger disparity compared with Apache). For 1200-1600 UEs, the SCF policy did not cause large jobs to perform worse than under the size-independent policy (at 1800 UEs, SCF did hurt large jobs). This was attributed to the fact that large jobs do not suffer as badly under a heavy-tailed distribution as under an exponential one: in a heavy-tailed distribution the largest 1% of jobs are interrupted by less than 50% of the total arriving work, whereas in an exponential distribution they are interrupted by 95% of it. Varying the size of the thread pools produced different results. The performance advantage of SCF over size-independent scheduling stays the same up to 35 threads (SCF being much better), but it decreases from 35 to 60 threads. At 60 threads, there is no buildup in the network queues (all jobs are served very quickly, so the scheduling policy does not matter). This suggests that SCF could be very advantageous in a system that offers a great degree of control over kernel scheduling. The same experiment also showed that byte throughput increased with the number of threads used by the queues; throughput was sacrificed to gain some control over kernel scheduling.
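The heavy-tailed versus exponential argument can be checked with a short calculation. Under shortest-first scheduling, a job is delayed roughly by the work of jobs smaller than itself, so the relevant quantity is the fraction of total work contained in jobs smaller than the largest 1%. The Python sketch below computes it in closed form for exponential sizes and for Pareto sizes with tail index α > 1; the choice α = 1.1 in the example is a hypothetical heavy tail, not a value taken from the paper.

```python
import math


def smaller_work_fraction_exponential(p: float) -> float:
    """Share of all work in jobs smaller than the largest fraction p of jobs,
    for exponentially distributed sizes (the answer does not depend on the mean)."""
    t = -math.log(p)                    # size threshold of the top p, taking mean = 1
    top_share = (1 + t) * math.exp(-t)  # E[X; X > t] / E[X]
    return 1 - top_share


def smaller_work_fraction_pareto(p: float, alpha: float) -> float:
    """Same quantity for Pareto sizes with tail index alpha > 1 (finite mean)."""
    top_share = p ** (1 - 1 / alpha)    # load carried by the largest fraction p
    return 1 - top_share
```

smaller_work_fraction_exponential(0.01) is about 0.944 and smaller_work_fraction_pareto(0.01, 1.1) is about 0.342, in line with the 95% versus under-50% contrast described above; lighter tails (larger α) push the Pareto figure up.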

The authors did a great job of setting up the experiment and showing the benefits of SCF. I thought their comparisons with both Apache and size-independent scheduling (used in most web servers today) showed how effective SCF could be. Their suggestions for architectural improvements were also valuable. However, I believe there are several glaring holes in this paper. The first is that this scheduling applies only to web servers that serve static files. What about dynamic content? They imply that their work is not very applicable to dynamic content, yet many of the more popular web sites today are filled with dynamic content. Second, as the authors themselves state, the SCF policy they explored does not prevent starvation when the server is permanently overloaded (large jobs will never be served). Another major issue concerns the effectiveness of SCF: does the server architecture have to allow a great deal of control over scheduling? Will providing this control come at some cost? To gain it, the number of threads in the queue pools is reduced; is this a real benefit, given that throughput will be hurt? The validity of their experimental results is somewhat questionable, too. The test used two clients, one of which had a malfunctioning port. This must have had an effect on the results.

Tim Bellaire et al.


Resource Management Policies for E-commerce Servers

The focus of this paper is quality of service for electronic commerce web sites. The authors point out that previous research into resource management for e-commerce web sites has focused on improving performance in terms of conventional metrics. They argue that these metrics are not appropriate because e-commerce sites are business oriented and are therefore measured by profit.

The concept of a Customer Behavior Model Graph (CBMG) is introduced. This graph represents the different states a user can be in while navigating the web site, and the transitions (with probabilities) from one state to another. Different classes of users are defined (e.g., occasional buyers and heavy buyers), and a CBMG is created for each class.
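A CBMG is essentially a Markov chain over session states, so a session can be generated as a random walk. In this Python sketch the states, transition probabilities, and mean think time are invented for illustration. Think times are drawn independently from an exponential distribution, which is exactly the assumption questioned in the next paragraph, since each draw ignores how long the user thought before.

```python
import random

# Hypothetical CBMG for one customer class; states and probabilities are made up.
CBMG = {
    "Entry":  [("Browse", 0.7), ("Search", 0.3)],
    "Browse": [("Browse", 0.4), ("Search", 0.2), ("Add", 0.1), ("Exit", 0.3)],
    "Search": [("Browse", 0.5), ("Add", 0.2), ("Exit", 0.3)],
    "Add":    [("Pay", 0.5), ("Browse", 0.3), ("Exit", 0.2)],
    "Pay":    [("Exit", 1.0)],
}


def simulate_session(graph, rng, mean_think=5.0):
    """Random walk from Entry to Exit with i.i.d. exponential think times."""
    state, path, think = "Entry", ["Entry"], []
    while state != "Exit":
        next_states, probs = zip(*graph[state])
        state = rng.choices(next_states, weights=probs)[0]
        path.append(state)
        think.append(rng.expovariate(1.0 / mean_think))  # memoryless draw
    return path, think
```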

One comment on the CBMG is that little description is given of 'user think time'. The authors indicate that think time is drawn from an exponential distribution, but do not really justify this choice. If each think time is drawn independently from an exponential distribution, it is memoryless and unrelated to the user's previous think times, but successive think times of one user might not behave this way. For example, if I want to buy a gift for someone, I might take a long time to figure out what to buy, but my next purchase might be wrapping paper, which would take much less think time to put in my cart. In that case, the think time of the second purchase depends on the first.

The goal of their system is to assign priorities to incoming users based upon their usage profiles, and other factors. Every user is assigned a high priority upon visiting the web site or upon placing an item in their shopping cart. Users who have been around for a long time, but have not placed items in their shopping cart are downgraded to medium, and eventually low priorities.

The authors define m1 as the time limit for going from high to medium priority, and m2 as the limit for going from medium to low. However, they do not give any actual values for these parameters. Depending on the values chosen, factors such as network delay (over slow WANs) may cause users with slower network connections to be assigned lower priorities, even though they may be heavy buyers.
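The priority rule can be sketched as a small Python function. The values of m1 and m2 are hypothetical (the paper does not give any), m2 is assumed to be measured from the moment of demotion to medium, and a non-empty cart is assumed to keep a session at high priority indefinitely.

```python
def session_priority(now: float, entered_at: float, cart_items: int = 0,
                     m1: float = 60.0, m2: float = 120.0) -> str:
    """Priority of a session under the scheme described above (a sketch).

    m1: seconds at high priority before demotion to medium (hypothetical).
    m2: further seconds at medium before demotion to low (hypothetical).
    """
    if cart_items > 0:
        return "high"               # assumption: a cart keeps high priority, no timeout
    elapsed = now - entered_at
    if elapsed < m1:
        return "high"
    if elapsed < m1 + m2:
        return "medium"
    return "low"
```

A buyer on a slow WAN link accumulates elapsed time just like an idle browser, so with these values they would be demoted to medium after 60 seconds unless something is already in the cart.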

This priority scheme allows users who will most likely make a purchase to obtain better response times from the server. This prevents potential buyers from experiencing delays and consequently canceling a purchase. The goal is to maximize revenue and to minimize angry customers (those who leave as a result of slow access times) and lost revenue (from people who would have made a purchase but did not because of slow access times).

The authors created a simulation of an electronic bookstore and used SURGE to place loads on the server. The results showed the following:

1. Revenue/sec increases for heavy buyers as load increases, and is higher with priorities than without.
2. Revenue/sec increases for occasional buyers up to a certain arrival rate and then begins to decrease, because under heavy load the occasional buyer has a lower priority than the heavy buyer.
3. For lightly loaded servers there are no angry customers. As the arrival rate increases, the number of angry customers increases, but these customers do not decrease total revenue (so they are most likely occasional buyers, not heavy buyers).
4. With priorities, there is no lost revenue; without priorities, there is.

The idea presented in this paper seems novel, and hopefully future e-commerce web servers will incorporate it. However, a more carefully designed priority scheme may be necessary. For example, if users realized that they got better performance while browsing as long as they had at least one item in their cart, they might put an item in simply to get better performance. Perhaps there needs to be a timeout even for customers with items in their carts, though even this scheme might not work. For the most part, users will receive correct priorities and service, but there will always be a few users who receive higher performance than they should.

Perry Myers et al.


Lecture of 2/24:

Providing Differentiated Levels of Service in Web Content Hosting

The authors investigated approaches to providing differentiated quality of service by assigning priorities to requests based on the requested documents. They implemented priority-based scheduling at both the user and kernel levels. They found that simple strategies, such as controlling the number of processes, can notably improve the response time of high-priority requests while preserving system throughput. They also found that the kernel-level approach tends to penalize low-priority requests less than the user-level approach, while improving the performance of high-priority requests to a similar degree.

The study is only a preliminary step in investigating differentiated QoS mechanisms in Web servers. They studied only the case of two service levels. I believe real systems need more priority levels, and the work should refine the priority levels to deal with different conditions. Also, this work focused on a single-machine server rather than a cluster of web servers, and only on static files; these are open areas that should be investigated in future work.

Jinghui Chen et al.

In this paper, the authors present a priority-based approach to providing differentiated services. They made two different changes: one at the user level, by modifying the Apache Web server, and the other in the Linux kernel. The paper also identifies quite a few directions for future work, for example, how to derive priorities from the URL and how to schedule overall resources on the server side.

Jinze Liu et al.

In this paper, priority-based request scheduling, one way to provide differentiated quality of service, is investigated. Two approaches to modifying Web servers so that requests are differentiated by priority are considered: a user-level approach and a kernel-level approach. The differences between the two are analyzed, and their performance is measured.

Ying & Avneesh


Application Level Differentiated Services for Web Servers

The goal of this work was to implement two different levels of service for a web server: low priority and high priority. In this model, high priority requests can always preempt low priority requests.

To accomplish this, the authors "slowed down" the serving of background (low-priority) processes to allow more resources for "high-priority" processes.
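The preemption rule can be sketched as a slot-by-slot simulation in Python. The names and the one-unit chunk size are assumptions. Background (low-priority) work is served in small chunks, and a waiting high-priority request always takes the next slot, which is one way to make background requests interruptible at the application level.

```python
def preemptive_timeline(high_arrivals, low_jobs):
    """Return which request is served in each time slot.

    high_arrivals: list of (arrival_slot, name, units) for high-priority requests.
    low_jobs: list of (name, units) background requests present from slot 0.
    One unit of work is served per slot; background work is split into
    one-unit chunks so it can be preempted between chunks.
    """
    arrivals = sorted(high_arrivals)
    high, low = [], [[name, units] for name, units in low_jobs]
    timeline, slot, i = [], 0, 0
    while i < len(arrivals) or high or low:
        while i < len(arrivals) and arrivals[i][0] <= slot:
            _, name, units = arrivals[i]
            high.append([name, units])
            i += 1
        queue = high if high else low       # high priority always wins
        if not queue:
            timeline.append(None)           # idle until the next arrival
        else:
            job = queue[0]
            timeline.append(job[0])
            job[1] -= 1
            if job[1] == 0:
                queue.pop(0)
        slot += 1
    return timeline
```

In the usage example below, L1's remaining work resumes only after H1 finishes; under sustained high-priority load, background requests could wait indefinitely, which matches the concern that they suffer too much.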

Ways to achieve background requests that are interruptible:

Note: their work does not change the OS. Their approach controlled service effectively. Robert's opinion: not fine-grained enough. People are going to want low-priority processes to have some intelligent scheduling algorithm. It seems that background processes in their model suffer too much.

Rob Schutt et al.


Lecture of 2/29:

Web Server Support for Tiered Services

In this paper the authors present an application-level server architecture for supporting QoS. Since overload affects all kinds of requests in the same manner in traditional servers, it is difficult for them to guarantee better response times to preferred clients. The proposed framework consists of the following components:

The authors present results which clearly show that, in the face of increasing overload (up to a certain limit), the architecture succeeds in providing better service to the preferred clients. However, they fail to mention that because theirs is an application-level mechanism, under severe overload the server's listen queue fills up and connections are dropped without regard to their priority, so little QoS can be provided.

In my view, the most significant contribution was the introduction of session-level semantics for scheduling and prioritizing requests. In typical web transactions, it is necessary to provide a much higher level of service to clients whose sessions look more likely to generate revenue.

Avneesh et al.

This paper addresses QoS issues from the server's point of view. It proposes three mechanisms needed to implement QoS: classification, admission control, and scheduling. The aim of this tiered QoS service is to support different performance levels for different classes of users and to maintain predictable performance, even during server overload.

As the paper points out, servers are currently a significant component of end-to-end delay, so improvements are needed. Several trends make this necessary: decreasing network delay, flash crowds, and new technologies. So it is important for the server to have some mechanism for guaranteeing service even during overload.

To achieve this, the paper proposes the WebQoS architecture, which includes several components: session control, measurement, and application control. Session control deals with request classification, admission control, session management, and request scheduling; application control deals with resource control and resource scheduling.
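The three session-control steps (classification, admission control, and scheduling) can be sketched in Python. The user classes, the queue-length admission trigger, and all names are hypothetical. Admission is done per session, so requests from already-admitted sessions are never refused, reflecting the session-level semantics discussed above, while the queue is served in class-priority order.

```python
import heapq

PRIORITY = {"premium": 0, "basic": 1}   # hypothetical classes; lower = served first


class TieredServer:
    def __init__(self, max_queue: int):
        self.max_queue = max_queue      # admission trigger: queue length (hypothetical)
        self.sessions = set()           # sessions already admitted
        self.queue = []                 # heap of (class priority, arrival seq, url)
        self.seq = 0

    def classify(self, request: dict) -> int:
        return PRIORITY.get(request["user_class"], PRIORITY["basic"])

    def admit(self, request: dict) -> bool:
        """Session-level admission: refuse new sessions under overload,
        but never refuse a request from a session already admitted."""
        if request["session"] in self.sessions:
            return True
        if len(self.queue) >= self.max_queue:
            return False
        self.sessions.add(request["session"])
        return True

    def submit(self, request: dict) -> bool:
        if not self.admit(request):
            return False
        heapq.heappush(self.queue, (self.classify(request), self.seq, request["url"]))
        self.seq += 1
        return True

    def next_request(self) -> str:
        return heapq.heappop(self.queue)[2]


def tiered_demo():
    server = TieredServer(max_queue=2)
    accepted = [
        server.submit({"session": "a", "user_class": "basic", "url": "/a1"}),
        server.submit({"session": "b", "user_class": "premium", "url": "/b1"}),
        server.submit({"session": "c", "user_class": "premium", "url": "/c1"}),  # new session, queue full
        server.submit({"session": "a", "user_class": "basic", "url": "/a2"}),    # admitted session continues
    ]
    order = [server.next_request() for _ in range(len(server.queue))]
    return accepted, order
```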

The prototype built in this paper is fairly flexible and supports a number of policies: user-class and target-class based classification, two admission control trigger parameters, and a number of scheduling policies. Since the HP platform has a built-in resource scheduler, the paper is not explicit on this point, though more detail here would be very important.

The issues discussed in this paper are quite practical, and its architecture is a very standard (typical) solution. However, we would have liked it to be more explicit about resource scheduling.

Haibin et al.


QoS Provisioning with qContracts in Web and Multimedia Servers

This paper addresses the use of middleware to provide QoS. The advantage of using middleware is portability, while the disadvantage is its insufficient control over OS resource allocation. Using the QoS provisioning approach, performance isolation and differentiation are achieved.

Ying & Avneesh


Lecture of 3/2:

Locality-Aware Request Distribution in Cluster-based Network Servers

This paper extends the concept of load distribution on server clusters. Current approaches use a round-robin scheme to distribute requests from the front-end server to the back-end servers. This distribution, however, cannot take advantage of the locality of requests, and the effective cache size of the cluster is only the size of an individual back-end cache. LARD (locality-aware request distribution) assigns requests to back-end servers so that requests for a given target are consistently sent to the same back-end node. Thus, the effective cache size of the cluster is the sum of all back-end caches. This improves performance, since the majority of requests are now served from memory instead of disk. A problem arises if a single target saturates a back-end node, so they devised the more robust LARD/R (locality-aware request distribution with replication), which allows multiple nodes to handle a single target under overload conditions. They used simulation to compare several load distribution methods and showed that their method provided substantial improvements in many scenarios (throughput up to 4.5 times that of the round-robin scheme for a 16-node cluster), and that it could at least match the others for workloads with small working sets (i.e., where cache size is not an issue).
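The basic LARD assignment rule, as we understand it from the paper's description, can be sketched in Python. A target's first request goes to the least-loaded back end, later requests stick to that node, and the target is moved only when its node is overloaded while some other node is lightly loaded, or when its node is severely overloaded. The thresholds t_low and t_high, the use of active-connection counts as load, and the function names are assumptions; a real front end would also decrement the load when a connection finishes, and LARD/R would roughly add nodes to a target's server set instead of moving the target.

```python
def lard_assign(target, server_of, load, t_low, t_high):
    """Pick a back end for `target` (basic LARD), updating `server_of` and `load`."""
    least = min(load, key=load.get)              # least-loaded back end
    node = server_of.get(target)
    if node is None:
        node = least                             # first request for this target
    elif (load[node] > t_high and load[least] < t_low) or load[node] >= 2 * t_high:
        node = least                             # move the target off an overloaded node
    server_of[target] = node
    load[node] += 1                              # one more active connection
    return node


def lard_demo():
    load = {"A": 0, "B": 0}
    server_of = {}
    picks = [lard_assign(t, server_of, load, t_low=2, t_high=4) for t in ["x", "y", "x", "x"]]
    return picks, load
```

Repeated requests for "x" keep hitting node A, whose cache therefore holds "x", while "y" is placed on B; this is how the cluster's effective cache becomes the sum of the back-end caches.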

The second major part of the paper deals with a TCP handoff mechanism that had to be implemented for LARD to work. The front end must actually accept the client's connection before it can pick a back end, since the choice depends on the requested target. The authors therefore built a mechanism for handing that connection off to the back-end server while keeping the cluster mechanics transparent to the client.

Their approach is appealing, but it would be interesting to see what happens when the front end itself becomes saturated, since it must do more work than simply assigning requests round-robin. Also, a major weakness of the paper is that the authors have not explored the modifications LARD would need in order to support HTTP 1.1.

Susan Bibeault et al.

This paper presents a new approach to improving performance in cluster-based network servers through content-based request distribution. It aims to achieve both locality and load balancing across all the servers by mapping requests to specific target servers.

Compared with the state-of-the-art round-robin approach, this one can clearly achieve better performance by exploiting server locality (I think they mean servers that specialize in a particular set of requests).

The following may be some problems:

Jinze & Pinchao


Providing Differentiated Service from an Internet Server

This paper presents a new prioritized Internet server system that provides fast response to high-priority tasks while minimizing the performance penalty on low-priority tasks. The system consists of an initiator that performs admission control on incoming requests (the percentage of requests of each type/priority that are admitted can be configured). Messages that pass the initiator are placed in the scheduler queue, where they wait to be assigned to one of several task servers. Responses are then sent back to the clients through a communication channel. The initiator receives feedback on system load, and the scheduler receives feedback on network load. The initiator rejects incoming requests when the access rate exceeds the system capacity, and the scheduler discards low-priority multimedia tasks when the network is overloaded.
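A minimal sketch of the initiator/scheduler pipeline, assuming a single `capacity` threshold stands in for the system-load feedback and a boolean flag stands in for the network-load feedback. `PrioritizedServer`, `capacity`, and the convention that priority 0 is highest are hypothetical, and the configurable per-class admission percentages are omitted.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(order=True)
class Task:
    priority: int                  # 0 = highest priority; larger = lower priority
    seq: int                       # arrival order, so equal priorities stay FIFO
    name: str = field(compare=False)
    multimedia: bool = field(default=False, compare=False)


class PrioritizedServer:
    """Hypothetical initiator + priority scheduler (not the paper's code)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity       # max waiting tasks before the initiator rejects
        self.queue: List[Task] = []    # scheduler queue, kept as a min-heap
        self._seq = itertools.count()

    def admit(self, name: str, priority: int, multimedia: bool = False) -> bool:
        """Initiator: reject the request when the system is at capacity."""
        if len(self.queue) >= self.capacity:
            return False
        heapq.heappush(self.queue, Task(priority, next(self._seq), name, multimedia))
        return True

    def next_task(self, network_overloaded: bool = False) -> Optional[Task]:
        """Scheduler: hand the highest-priority waiting task to a task server.

        Under network overload, low-priority multimedia tasks are discarded.
        """
        while self.queue:
            task = heapq.heappop(self.queue)
            if network_overloaded and task.multimedia and task.priority > 0:
                continue  # drop it and look at the next task
            return task
        return None
```

The heap orders tasks by `(priority, seq)`, so a high-priority arrival jumps ahead of every waiting lower-priority task, which is exactly what produces the response-time trade-off discussed next.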

A simulation using ClarkNet was performed to model different workloads. In their model, priority-based scheduling improved the response time of high-priority traffic; as expected, however, the response time of low-priority traffic increased. As the ratio of high-priority traffic was increased, the slowdown and response time of high-priority tasks increased (due to self-similar traffic, and because high-priority tasks were less likely to jump ahead of low-priority tasks, since fewer low-priority tasks were being served). The response time and slowdown of low-priority tasks changed little as the high-priority ratio increased.

The authors also discuss task assignment schemes. Enhanced shortest-queue-first (Enhanced_SQF) is a new scheme in which a new task is assigned to the server with the fewest waiting tasks of equal or higher priority than the new task. The authors also break a task's waiting time into three parts: the delay due to the task already in service at the server when it arrives, the delay due to tasks already queued when it arrives, and the delay due to higher-priority tasks that arrive after it.
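To make the difference from plain SQF concrete, here is a small sketch, assuming each server's queue is represented as a list of the integer priorities of its waiting tasks (smaller number = higher priority; this encoding is our assumption). Under priority scheduling, a new task is served ahead of lower-priority waiting tasks, so only equal- or higher-priority tasks actually delay it.

```python
from typing import List


def sqf(queues: List[List[int]]) -> int:
    """Plain shortest-queue-first: pick the server with the fewest waiting tasks."""
    return min(range(len(queues)), key=lambda s: len(queues[s]))


def enhanced_sqf(queues: List[List[int]], new_priority: int) -> int:
    """Enhanced_SQF: count only waiting tasks of equal or higher priority.

    A smaller number means higher priority, so a waiting task counts if its
    priority value is <= the new task's.
    """
    def tasks_ahead(s: int) -> int:
        return sum(1 for p in queues[s] if p <= new_priority)

    return min(range(len(queues)), key=tasks_ahead)


# Server 0 has one high-priority task waiting; server 1 has three low-priority ones.
queues = [[0], [1, 1, 1]]
print(sqf(queues))              # 0: shortest queue
print(enhanced_sqf(queues, 0))  # 1: a high-priority task would jump all three
```

Note that this count only reflects the second component of the waiting time; the task in service and later-arriving higher-priority tasks are not visible at assignment time.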

I thought this paper was very disorganized and offered nothing new. Obviously, priority-based scheduling yields better response times for higher-priority tasks. The authors discuss task assignment schemes and why a task waits, but their reasoning was unclear to me (they offered theories without statistical backing). Another open question is why they used ClarkNet instead of Surge to simulate the workload.

Timothy Bellaire et al.

This paper presents a web server model in which QoS is assured by keeping the system load within thresholds, which is achieved through admission control, scheduling, and efficient task assignment schemes.

A new metric is used to evaluate the system's performance: slowdown, the ratio of a task's response time to its service time. This is reasonable, because a user is often willing to wait longer for a large task.
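A small worked example of the slowdown metric, with made-up numbers (not from the paper), assuming response time = waiting time + service time. It shows why slowdown matches the intuition that the same wait hurts a small task more than a large one.

```python
from typing import List, Tuple


def slowdown(response_time: float, service_time: float) -> float:
    """Slowdown = response time / service time; 1.0 means no waiting at all."""
    if service_time <= 0:
        raise ValueError("service_time must be positive")
    return response_time / service_time


def mean_slowdown(tasks: List[Tuple[float, float]]) -> float:
    """Average slowdown over (response_time, service_time) pairs."""
    return sum(slowdown(r, s) for r, s in tasks) / len(tasks)


# The same 2 s of waiting is a slowdown of 3.0 for a 1 s task,
# but only 1.2 for a 10 s task.
print(slowdown(3.0, 1.0))
print(slowdown(12.0, 10.0))
```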

The system's performance is then evaluated. With priority-based scheduling, high-priority requests are shown to incur low delay even when the system approaches full utilization. The paper also shows how the mean slowdown and mean response time of high-priority tasks change as the ratio of high-priority requests increases.

Finally, the task assignment schemes are compared: Enhanced_SQF is shown to perform better than SQF under normal load.

Ying and Avneesh


Lecture of 3/7:

Load Distribution among Replicated Web Servers: a QoS-based Approach

The authors propose a new architecture for using replicated Web servers to distribute client load. Their model is contrasted with the current "mirror" model (the user is given a list of mirror sites from which to choose manually) and the "DNS-based" model (the DNS server is used to apply a round-robin assignment of clients to replicated servers). In their proposed QoS approach, the client's browser determines which server best fits its QoS needs.

Their implementation required modifying the DNS service. The client browser queries the DNS server for all replicated servers, probes them (via broadcast UDP), and chooses the server with the best measured response time. The authors demonstrate that this technique distributes load more fairly among the servers and yields better user response times than the mirror and DNS-based methods. Since modifying DNS is not an immediate option, they also outline an alternative QoS-based implementation in which every replicated server keeps track of its peer servers. A client obtains the list of all replicas when it accesses (polls) the initial server. If the client judges the polled server's responsiveness to be inadequate, it polls the replicas (again using broadcast UDP) to find the best response time. This variant decides whether response time is adequate from a single measurement (the initial poll), which the authors themselves say is a bad idea because a single measurement can be skewed by the load distribution. Why not, then, always perform the UDP broadcast? I don't think the authors explained the single-measurement decision well enough. A more serious problem is the additional server load introduced by the initial (HTTP) poll that returns the replica addresses. This polling could add so many server requests that it makes the method undesirable under certain load distributions. This issue definitely needs to be assessed.
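A minimal sketch of the client-side selection logic in both variants, assuming a hypothetical `probe(host)` callable that returns a measured response time in seconds (or `None` if the replica does not answer); no real UDP probing is done here. `select_server` models the second variant, which trusts one poll of the initial server and only probes the replicas if that poll exceeds `threshold`. The `samples` parameter is our own addition, showing how averaging several probes would address the single-measurement concern.

```python
from typing import Callable, Iterable, Optional

Probe = Callable[[str], Optional[float]]  # host -> response time in s, or None


def choose_replica(replicas: Iterable[str], probe: Probe,
                   samples: int = 1) -> Optional[str]:
    """Return the replica with the lowest average measured response time."""
    best_host, best_time = None, float("inf")
    for host in replicas:
        times = [t for t in (probe(host) for _ in range(samples)) if t is not None]
        if not times:
            continue  # replica did not answer; skip it
        avg = sum(times) / len(times)
        if avg < best_time:
            best_host, best_time = host, avg
    return best_host


def select_server(initial: str, replicas: Iterable[str], probe: Probe,
                  threshold: float, samples: int = 1) -> Optional[str]:
    """Second variant: keep the initial server if a single poll looks adequate."""
    first = probe(initial)  # the single measurement the review questions
    if first is not None and first <= threshold:
        return initial
    return choose_replica(replicas, probe, samples)
```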

Susan Bibeault et al.

This paper is valuable in that it offers an alternative to the popular implementations of distributed web hosting: round-robin and locality of reference. As its various test scenarios show, it outperforms both of these algorithms in response time and availability.

These two load distribution algorithms do not take the actual load at each server into account, so some servers may become overloaded while others remain underloaded. The QoS-based algorithm addresses this by letting each browser make its own load distribution decision. The paper proposes two implementations of this strategy: the first requires changes to DNS, the second does not. Both, however, require changes on the browser side and the server side, which could be a major obstacle to actual deployment.

Another issue is propagating actual load information across the network in real time. This could be a very difficult problem, and the paper does not give a persuasive answer, though it points out possible research directions. In the end, a simple load distribution algorithm reduces to a classical network propagation problem, for which no optimal solution is known. Still, the paper does point in a good direction.

Haibin et al.


A Scheduling Framework for Web Server Clusters with Intensive Dynamic Content Processing

The authors propose a scheduling framework with a master/slave (M/S) architecture for clustering Web servers, to relieve the server bottleneck caused by dynamic content requests. The master level accepts and processes both dynamic and static content requests, while the slave level only processes dynamic content at the masters' request. Two common web server clustering solutions today are DNS-based round-robin clusters and switch-based clusters. Although both are in use, neither delivers satisfactory performance for Web sites with intensive dynamic content processing; a master/slave architecture that uses some data replication is much more effective. The setup (as discussed in the paper) is as follows: master nodes sit at level 1, and they can either be linked to a load-balancing switch with a single IP address, or be attached to hot-standby nodes for fault tolerance, with requests distributed by DNS. Static requests are processed locally at the master nodes. CGI requests are processed at a master node or redirected to another master or to a slave. Slaves only handle CGI requests.
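A hypothetical sketch of the per-request decision a master makes. The `/cgi-bin/` prefix test, the `Node` class, the `cgi_limit` threshold (standing in for the paper's reservation of part of each master's resources for static content) and the caller-supplied `est_cost` (standing in for its sampling-based estimate of CPU and I/O demand) are all assumptions, not the paper's actual policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    cgi_load: float = 0.0  # hypothetical: estimated outstanding CGI work (seconds)


def route(path: str, local: Node, others: List[Node],
          cgi_limit: float, est_cost: float) -> Node:
    """Pick the node that should process `path`, as seen from master `local`.

    `others` holds the other masters and the slaves that may run CGI work.
    """
    if not path.startswith("/cgi-bin/"):  # assumed static/dynamic test
        return local                      # static content: always served locally
    # Reservation: run CGI locally only while it stays under cgi_limit, leaving
    # the rest of the master's capacity for static requests.
    if local.cgi_load + est_cost <= cgi_limit or not others:
        chosen = local
    else:
        chosen = min(others, key=lambda n: n.cgi_load)  # least-loaded remote node
    chosen.cgi_load += est_cost
    return chosen
```

A real scheduler would also decrement `cgi_load` when a CGI request completes; that bookkeeping is omitted here.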

M/S is feasible. Having fewer nodes process static content does not cause problems, since studies have shown that most web servers can deliver static content faster than their outgoing link can carry it. M/S is also better for several reasons. It offers better expandability (non-dedicated nodes can be recruited) and better efficiency (separating dynamic and static content processing keeps long-running CGI scripts from slowing down static content processing). M/S also offers better availability (fault tolerance is easily implemented by masking failures).

Because few nodes were available for experimentation, much of the analysis of the M/S architecture was done through simulation. M/S was compared with two other architectures: Flat-R (user requests are evenly distributed among nodes) and Flat-C (requests are scheduled to the node with the fewest outstanding connections). In their experiments, M/S improved on Flat-C by 23% and on Flat-R by 36%. It was also shown that as the average ratio of CGI processing rate to static request rate decreases, CGI activity becomes more intensive, and the M/S architecture handles this better. This is because as the average processing time of CGI requests increases, optimizing CGI performance becomes more critical; each node also sees more CGI requests, which increases the waiting time of static requests. The M/S architecture also benefited greatly from its ability to recruit non-dedicated resources. M/S was further compared with M/S-ns (no sampling is used to assess I/O and CPU demands), M/S-nr (no reservation is used to keep a portion of master resources available for static content processing), and M/S-1 (all nodes are treated as master nodes). M/S significantly outperformed M/S-nr, while M/S-1 showed significant performance degradation. The improvement of M/S over M/S-ns averaged 14%. A sensitivity study was used to find the optimal number of master nodes: 6 for a 32-node system and 25 for a 128-node system.

Tim Bellaire et al.

This paper addresses the difference between processing dynamic and static content in web server clusters, and proposes a cluster architecture that processes dynamic and static content on different machines.

The authors observe that static and dynamic content differ in processing time, and that treating them the same way often unfairly prolongs the response time of static content. The paper therefore proposes separating them and processing each request according to whether it is dynamic or static.

The cluster architecture is composed of master nodes and slave nodes. Master nodes handle both dynamic and static content processing, and also forward part of the dynamic content processing to slave nodes. Slave nodes process only dynamic content.

By varying the number of nodes in the cluster, the ratio of master nodes, and the ratio of dynamic load shifted to slave nodes, the paper tries to achieve a better stretch factor for request processing. The stretch factor is defined as the ratio of the response times of a sequence of requests to the service demands of those requests. Intuitively, it measures how much waiting inflates a request's response time relative to its processing time. It is a more reasonable performance indicator since it also accounts for the server's load.
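Assuming a request's response time is its waiting (queueing) time plus its service demand, the stretch factor of a single request can be rewritten to show how it relates waiting to processing:

```latex
S = \frac{T_{\mathrm{resp}}}{T_{\mathrm{svc}}}
  = \frac{T_{\mathrm{wait}} + T_{\mathrm{svc}}}{T_{\mathrm{svc}}}
  = 1 + \frac{T_{\mathrm{wait}}}{T_{\mathrm{svc}}}
```

So a stretch factor of 1 means the request did not wait at all, and a stretch factor of 2 means it waited exactly as long as it was processed.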

The whole paper rests on its analysis of a complex mathematical inequality. It makes a number of assumptions, not all of which are valid, such as the assumption that requests arrive according to a Poisson process. Still, it offers a useful way of thinking about the problem.

Another new element of the paper is its RCGI technique. Although the authors argue that RCGI performs better than CGI on a busy server, I don't understand why they did not simply use FastCGI, since it eliminates the overhead of forking a remote process and should therefore perform better.

The paper is valuable for addressing how the processing of dynamic content differs from that of static content.

Haibin et al.