We modify the code of MulTreeLDP on PC2 to measure the link failure detection time and the link repair detection time. We add a thread to MulTreeLDP that automatically brings eth3 down and up. The thread records in a timestamp the instant at which it brings eth3 down. When MulTreeLDP detects the link failure, it computes the difference between the instant of the detection and this timestamp. This difference is the link failure detection time. The thread then draws a random duration from the internal random number generator of the PC and sleeps for that duration. When the thread wakes up, it brings eth3 up and records that instant in a second timestamp. When the link repair is detected, PC2 computes the difference between the instant of the repair detection and this second timestamp. This difference is the link repair detection time. The thread brings eth3 down and up 100 times in succession and then quits. We record the 100 values of each detection time, stop the MulTreeLDP program, and restart it on all four machines used in this experiment. We collect 25 series of 100 values, that is, 2500 values for the failure detection time and 2500 values for the repair detection time. According to our model presented in Section 4.2, the time to detect a link failure depends on two factors. The first factor is the length of the time interval between the instant at which PC3 sends the last probe before the failure occurs and the instant at which the failure occurs. We randomize this interval, which is defined in Section 4.2, by bringing interface eth3 up and down at random times. The second factor is the synchronization offset between the timers of PC2 and PC3, also defined in Section 4.2. We assume that manually stopping and restarting MulTreeLDP on all machines randomizes this offset.
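The measurement procedure described above can be sketched as follows. This is a minimal illustration, not the code added to MulTreeLDP: the function name, the four callbacks, the `max_sleep_s` bound, and the random pause before bringing the interface down are all assumptions made for the example. The callbacks stand in for the actions on eth3 and for the detection events reported by MulTreeLDP.

```python
import random
import time


def measure_detection_times(bring_down, bring_up,
                            wait_failure_detected, wait_repair_detected,
                            cycles=100, max_sleep_s=1.0, rng=None,
                            sleep=time.sleep, clock=time.monotonic):
    """Run `cycles` down/up cycles and return two lists of durations.

    All parameters are hypothetical stand-ins:
    - bring_down / bring_up toggle the interface (eth3 in the experiment);
    - wait_failure_detected / wait_repair_detected block until the event
      is detected and return the clock value at detection time.
    """
    rng = rng or random.Random()
    failure_times, repair_times = [], []
    for _ in range(cycles):
        # Assumption: a random pause also precedes the failure, so that
        # the failure instant is random relative to the probe timers.
        sleep(rng.uniform(0.0, max_sleep_s))
        t_down = clock()                 # timestamp taken at link down
        bring_down()
        failure_times.append(wait_failure_detected() - t_down)

        # Random pause before the repair, as described in the text.
        sleep(rng.uniform(0.0, max_sleep_s))
        t_up = clock()                   # timestamp taken at link up
        bring_up()
        repair_times.append(wait_repair_detected() - t_up)
    return failure_times, repair_times
```

Passing the clock and sleep functions as parameters keeps the sketch testable without touching a real interface. One call corresponds to one series of 100 values; the experiment repeats such a series 25 times, restarting MulTreeLDP between series.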
In Figure 6.5, we show the distribution of the 2500 samples of the link failure detection time in 2 ms wide intervals, and compare this experimental distribution with the expected distribution derived from the analytical model in Section 4.2. The average of the 2500 samples is 25.4 ms. With the model parameters set to 10 ms, the theoretical average is 25 ms. Although the model we discuss in Section 4.2 is simple, our experimental results match the theoretical values determined with the model.
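As an illustration of how such a distribution can be built from raw samples, the sketch below groups detection times into 2 ms wide intervals and computes their average. The function names and the sample values in the usage note are hypothetical and are not taken from the experiment.

```python
from collections import Counter


def bin_samples(samples_ms, bin_width_ms=2.0):
    """Return {interval lower bound in ms: fraction of samples}."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    # Each sample falls in the interval [k * width, (k + 1) * width).
    counts = Counter(int(s // bin_width_ms) for s in samples_ms)
    total = len(samples_ms)
    return {k * bin_width_ms: c / total for k, c in sorted(counts.items())}


def average(samples_ms):
    """Arithmetic mean of the samples, in ms."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    return sum(samples_ms) / len(samples_ms)
```

For example, `bin_samples([0.5, 1.9, 2.0, 25.4])` places two samples in the interval starting at 0 ms, one in the interval starting at 2 ms, and one in the interval starting at 24 ms.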
In Figure 6.6, we show the distribution of the 2500 samples of the link repair detection time in 2 ms wide intervals and compare this distribution with the expected distribution derived from our model. Here, the experimental results do not match the model well. We expect 10 % of the repair detection times to lie between 0 and 1 ms and 0 % above 10 ms, but only 6.1 % of the samples lie between 0 and 1 ms and more than 3 % of the values are higher than 10 ms. In fact, the experimental repair detection times do not lie between 0 and 10 ms but between 0.4 and 10.4 ms, as shown in Figure 6.7. The average of the 2500 samples is 5.48 ms, which is close to the theoretical average (5 ms).
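The observed shift can be checked with a short calculation. As an assumption made only for this check, let the repair detection time $T$ (a symbol introduced here, not the notation of Section 4.2) be uniformly distributed over the shifted interval of 0.4 ms to 10.4 ms:

```latex
\[
\begin{aligned}
P(T \le 1\ \text{ms})  &= \frac{1 - 0.4}{10.4 - 0.4} = 6\,\%, \\
P(T > 10\ \text{ms})   &= \frac{10.4 - 10}{10.4 - 0.4} = 4\,\%, \\
E[T]                   &= \frac{0.4 + 10.4}{2} = 5.4\ \text{ms}.
\end{aligned}
\]
```

These values are consistent with the measurements: 6.1 % of the samples fall below 1 ms, more than 3 % exceed 10 ms, and the measured average is 5.48 ms.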
We conduct additional experiments to assess the behavior of the link failure detection mechanism under high link utilization. We modify the experimental setup so that PC1 sends traffic to PC5 over the multicast LSP. The traffic consists of UDP packets of 8192 bytes. When we set the sending rate to 93 Mbit/s or more, we observe that PC2 and PC3 make false detections, i.e., they detect that the link between PC2 and PC6 fails and is repaired several times per second. The PC routers are not fast enough to forward the packets and, at the same time, send the probes or check their reception. As discussed earlier, Linux is not a real-time operating system; therefore, under high system load there is no guarantee that probes are sent exactly at the configured sending interval or that probe reception is checked exactly at the configured checking interval. Solutions to this issue include increasing the probe sending interval or the probe checking interval (at the cost of longer link failure and repair detection times), using a real-time operating system, using faster routers, or sending traffic at only a fraction of the maximum throughput achievable with MPLS multicast. In the remaining experiments, we send traffic at lower rates to avoid false detections.