
Experiment 4: Measuring the tree repair time


In this experiment, we measure the time $T_{repair}$ required to repair a tree after a link failure, and we study the behavior of the MPLS multicast Fast Reroute mechanism when a failed link is physically repaired. For each experiment, we present typical graphs to illustrate our discussion.

Experiment 4.1: Measuring the service interruption time due to a link failure

Figure 6.10: Switchover and switchback in the multicast routing tree set up on our testbed. PC1 sends traffic over the tree. When the link between PC2 and PC3 fails, PC4 reroutes traffic over the backup path between PC4 and PC3. When the link is repaired, PC4 stops forwarding traffic on the backup path.
\includegraphics[width=\textwidth]{figures/exp_switchover_setup}
In this experiment, we determine the distribution of the service interruption time due to a link failure. The interruption time is the time to repair the tree, including propagation delays (see Section 2.1). We keep the setup from Section 6.4 and do not use Monitor (see Figure 6.10). We set up a multicast LSP of six nodes. PC1, PC5 and PC6 are the LERs of the tree; PC2, PC3 and PC4 are LSRs. The link between PC3 and PC4 is the unique link of the backup path, and PC3 and PC4 are the PSLs. The links between PC3 and PC2, and between PC2 and PC4, form the protected path. PC1 is a source and sends UDP packets of 8192 bytes at 40 Mbits/s on the tree. The receivers are PC5 and PC6. We simulate the failure and repair of the link between PC2 and PC3 by bringing interface eth3 of PC2 down and up, using the additional MulTreeLDP thread introduced in Section 6.3. This thread brings eth3 down and up at instants chosen by the random number generator of the machine. After the interface has been brought down and up 100 times, we stop and restart MulTreeLDP manually on all six machines. We repeat the experiment 25 times to collect 2500 values of the repair time.
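As a back-of-the-envelope consistency check (our own calculation, not an additional measurement), the nominal packet spacing at the source is

$T_{packet} = \frac{8192 \times 8 \mbox{ bits}}{40 \mbox{ Mbits/s}} \approx 1.6 \mbox{ ms},$

which matches the packet interarrival time of roughly 1.5 ms observed at PC5 before a failure.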

When we simulate the link failure, PC5 stops receiving traffic. PC2 detects the link failure and notifies PC4. When PC4 is notified of the failure, it switches traffic over the backup path and PC5 resumes receiving the traffic sent by PC1 (see Figure 6.10). The repair time is the time during which PC5 receives no packet. We measure it as follows. On PC5, we record the arrival time of each packet sent by PC1. PC1 sends one packet roughly every 1.5 ms, therefore the packet interarrival time at PC5 is 1.5 ms before a link failure. We compute the interarrival time for each pair of packets successively received at PC5. According to Section 6.3, the minimum time to detect a link failure is 10 ms, which is much larger than the packet interarrival time before a link failure. Therefore, we consider every interarrival time longer than 10 ms to be a service interruption time due to a link failure. Since we cannot distinguish whether an interarrival time exceeds 10 ms because of a link failure or because of an external phenomenon unrelated to our experiments, we collect more than 2500 values for the repair time: 2600 in total. We present in Figure 6.11 the distribution for all collected samples.
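The gap-detection step described above can be sketched as follows (a minimal illustration of the post-processing, assuming arrival timestamps in seconds; `extract_repair_times` is a hypothetical helper, not part of MulTreeLDP):

```python
# Every interarrival gap longer than the 10 ms minimum failure detection
# time is counted as a service interruption (repair time) sample.

DETECTION_THRESHOLD = 0.010  # 10 ms, from Section 6.3

def extract_repair_times(arrival_times, threshold=DETECTION_THRESHOLD):
    """Return the interarrival gaps longer than `threshold`, in seconds."""
    gaps = [t2 - t1 for t1, t2 in zip(arrival_times, arrival_times[1:])]
    return [g for g in gaps if g > threshold]

# Example: steady 1.5 ms arrivals with one 29 ms interruption.
times = [0.0, 0.0015, 0.0030, 0.0320, 0.0335]
print(extract_repair_times(times))  # a single gap of about 29 ms
```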

Figure 6.11: Experimental distribution of the repair time.
\includegraphics[width=\textwidth]{figures/exp_switchover_distrib}

The average over all samples is $\overline{T}_{repair}$=29.4 ms, with a minimum of 10 ms (by construction of the sample set) and a maximum of 49.6 ms. The standard deviation is 7.1 ms. The average $\overline{T}_{repair}$ is close to the sum $\overline{T}_{fdetect}+1\times\overline{T}_{nnotif}$=25.4+1.2=26.6 ms: the experimental and analytical averages for the repair time differ by less than 3 ms. This difference accounts for the propagation delays and the time for the PCs to modify their MPLS tables. The experimental results show that MPLS multicast Fast Reroute can repair a network in less than 50 ms on average. The main component of the repair time is the detection time. In larger trees, only the notification time increases. Extrapolating from our data, our implementation of MPLS multicast Fast Reroute can repair multicast routing trees with a protected path of up to 20 links in less than 50 ms: with a protected path of 20 links, the expected average repair time is $25.4+20\times1.2=49.4$ ms.
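The analytical model above can be written as a short sketch (using the measured averages from this chapter; the function name is ours):

```python
# Analytical repair-time model: detection time plus one notification
# time per link of the protected path.

T_FDETECT_MS = 25.4  # measured average failure detection time (ms)
T_NNOTIF_MS = 1.2    # measured average per-link notification time (ms)

def analytical_repair_time_ms(protected_path_links):
    """Expected average repair time (ms) for a protected path length."""
    return round(T_FDETECT_MS + protected_path_links * T_NNOTIF_MS, 1)

print(analytical_repair_time_ms(1))   # 26.6 ms, close to the measured 29.4 ms
print(analytical_repair_time_ms(20))  # 49.4 ms, still under 50 ms
```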

Experiment 4.2: Observing duplicate packets on the tree when a failed link is repaired

We perform additional experiments to show the duplicate packets that flow on the tree during switchback. We do not change the experiment setup. PC1 starts sending UDP packets of 8192 bytes at 85 Mbits/s on the tree at time $t=0$. We simulate the failure of the link between PC2 and PC3 at time $t \approx 2.3$ s by bringing down interface eth3 of PC2. We simulate the repair of the link at time $t \approx 4.6$ s by bringing up interface eth3 of PC2, and we record the reception time of each packet on PC5 and PC6.

Figures 6.12 and 6.13 depict the evolution of the amount of data received by PC5 and PC6 when PC1 sends data over the multicast routing tree. Figures 6.14, 6.15, 6.16 and 6.17 are enlarged views of Figures 6.12 and 6.13. We observe that PC5 receives no traffic for 29 ms starting at time $t=2.321$ s (Figure 6.14). PC6 receives traffic during the full duration of the experiment.

The link failure occurs at time $t \approx 2.321$ s. Until PC3 and PC4 have performed switchover, PC5 receives no traffic. At time $t=2.350$ s, switchover is complete and PC5 receives the traffic from PC1 via the backup path. The repair time is 29 ms. Since the failure is not located between PC1 and PC6, PC6 keeps receiving traffic during the link failure. We expect to observe a slope increase in Figures 6.16 and 6.17 due to packet duplication at switchback time (see Section 4.4); however, the slope of the curves slightly decreases for a short time when the link is repaired. The traffic increase is not visible in this experiment because of the size of the UDP packets we used. PC1 sends packets of 8192 bytes, but Ethernet can carry frames with a payload of at most 1500 bytes, so on PC1 the IP layer must fragment the UDP packets before passing them to the Ethernet layer. On PC5 and PC6, the IP layer reassembles the fragments of the packets received in the Ethernet frames. During reassembly, the IP layer detects the duplicate fragments and discards them. Therefore, although there is a traffic increase on two links of the network, this increase is hidden by the IP layer of the receivers.
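A back-of-the-envelope fragment count (assuming a standard 1500-byte Ethernet MTU, a 20-byte IPv4 header without options, and an 8-byte UDP header) illustrates why the 8192-byte datagrams are reassembled, and hence deduplicated, at the receivers, while the 1024-byte datagrams of the next experiment are not:

```python
# Count the IPv4 fragments needed to carry one UDP datagram over
# Ethernet. Fragment payloads must be multiples of 8 bytes.

import math

MTU = 1500          # Ethernet payload limit (bytes)
IP_HEADER = 20      # IPv4 header without options (bytes)
UDP_HEADER = 8      # UDP header (bytes)

def ip_fragments(udp_payload):
    """Number of IP fragments for a UDP datagram of the given payload size."""
    datagram = udp_payload + UDP_HEADER          # IP payload to carry
    per_fragment = (MTU - IP_HEADER) // 8 * 8    # 1480 bytes per fragment
    return math.ceil(datagram / per_fragment)

print(ip_fragments(8192))  # 6 fragments: duplicates dropped at reassembly
print(ip_fragments(1024))  # 1: no reassembly, duplicates reach the receiver
```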

To make the duplicate packets on switchback apparent, we conduct an additional experiment. The experiment setup is kept unchanged except that PC1 sends UDP packets of 1024 bytes at 40 Mbits/s. We simulate the failure of the link at $t \approx 2.3$ s and the repair at time $t \approx 4.50$ s. We show the traffic received by PC5 and PC6 during the total length of the experiment in Figures 6.18 and 6.19. Figures 6.20 and 6.21 are enlarged views of Figures 6.18 and 6.19.

In Figures 6.20 and 6.21, the slope change due to switchback is visible at time $t \approx 4.50$ s. After the link is repaired and before switchback is complete, PC5 and PC6 receive duplicate packets, hence the slope change during approximately 5 ms on each receiver. Because the slope change lasts only a few milliseconds, it cannot be distinguished reliably from other irregularities of the curves, which prevents us from measuring $T_{repairback}$ experimentally.

Figure 6.12: Traffic received by PC5 when the tree sustains a failure and a recovery (UDP packets of 8192 bytes). The failure occurs at time $t \approx 2.3$ s and the link is repaired at $t \approx 4.6$ s.
\includegraphics[width=\textwidth]{figures/exp_overall_PC5-1}

Figure 6.13: Traffic received by PC6 when the tree sustains a failure and a recovery (UDP packets of 8192 bytes). The failure occurs at time $t \approx 2.3$ s and the link is repaired at $t \approx 4.6$ s.
\includegraphics[width=\textwidth]{figures/exp_overall_PC6-1}

Figure 6.14: Switchover on PC5 (packets of 8192 bytes). PC5 receives no traffic between $t=2.321$ s and $t=2.350$ s. The interruption of service seen by PC5 is 29 ms.
\includegraphics[width=\textwidth]{figures/exp_switchover_PC5-1}

Figure 6.15: Switchover on PC6 (packets of 8192 bytes). The slope of the curve slightly decreases upon the link failure.
\includegraphics[width=\textwidth]{figures/exp_switchover_PC6-1}

Figure 6.16: Switchback on PC5 (packets of 8192 bytes). Instead of increasing, the slope of the curve slightly decreases when the link is repaired.
\includegraphics[width=\textwidth]{figures/exp_switchback_PC5-1}

Figure 6.17: Switchback on PC6 (packets of 8192 bytes). PC6 is not affected by the link recovery.
\includegraphics[width=\textwidth]{figures/exp_switchback_PC6-1}

Figure 6.18: Traffic received by PC5 when the tree sustains a failure and a recovery (UDP packets of 1024 bytes). The failure occurs at time $t \approx 2.3$ s and the link is repaired at $t \approx 4.50$ s.
\includegraphics[width=\textwidth]{figures/exp_overall_PC5-3}

Figure 6.19: Traffic received by PC6 when the tree sustains a failure and a recovery (UDP packets of 1024 bytes). The failure occurs at time $t \approx 2.3$ s and the link is repaired at $t \approx 4.50$ s.
\includegraphics[width=\textwidth]{figures/exp_overall_PC6-3}

Figure 6.20: Switchback on PC5 (packets of 1024 bytes). A traffic increase during a short period is visible at time $t \approx 4.50$ s.
\includegraphics[width=\textwidth]{figures/exp_switchback_PC5-3}

Figure 6.21: Switchback on PC6 (packets of 1024 bytes). A traffic increase during a short period is visible at time $t \approx 4.50$ s.
\includegraphics[width=\textwidth]{figures/exp_switchback_PC6-3}

Yvan Pointurier 2002-08-11