File IO: Initial Modifications


Michael Walker and Brian White

September 4, 2000

 

Brian and I set up three test nets, to measure the IO read performance under three different circumstances:

    1. Legion 1.6.6 unmodified, with optimization off;
    2. Legion unmodified with O3 optimization turned on;
    3. Legion with Andrew’s IO modifications and O3 optimization on.

We found that the numbers for the first two cases were nearly identical, differing by a negligible 0.0027 MB/sec. Since compiler optimization made no measurable difference, this suggests that read performance is indeed IO bound rather than CPU bound.

The IO modifications yielded a consistent speedup of 21.1 milliseconds over the average unmodified run. This is a 4.7% improvement over the original execution time, slightly lower than expected but close to the originally predicted range.
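The two figures above also pin down the approximate run times. The following is a minimal sketch of that arithmetic, assuming the 4.7% is measured against the average unmodified run and using the 1 MB transfer size stated at the end of this report. The variable names are illustrative only, and because 4.7% is a rounded figure the results are approximate.

```python
# Figures taken from the report
speedup_ms = 21.1      # average time saved per run
improvement = 0.047    # 4.7% of the original execution time (rounded)
transfer_mb = 1.0      # every transfer was 1 MB

# Implied average run times (approximate, since 4.7% is rounded)
baseline_ms = speedup_ms / improvement     # unmodified run, about 449 ms
modified_ms = baseline_ms - speedup_ms     # modified run, about 428 ms

# Implied read throughput in MB/sec
baseline_rate = transfer_mb / (baseline_ms / 1000.0)
modified_rate = transfer_mb / (modified_ms / 1000.0)

print(f"baseline: {baseline_ms:.1f} ms, {baseline_rate:.2f} MB/sec")
print(f"modified: {modified_ms:.1f} ms, {modified_rate:.2f} MB/sec")
```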

Optimizations

You optimized away 1 byte copy and 1 block copy, NOT 2 byte copies. Here are the details:

Original Critical Path:

BasicFileObject::read does 1 block copy

(creates a BasicFileTransferBlock and copies from the LegionBuffer container of the file)

BasicFileObject.trans.c:Legion_read does 1 block copy

(packs BasicFileTransferBlock into LegionBuffer)

Legion_libBasicFiles.c: 1 block copy, 1 byte copy

(creates a BasicFileTransferBlock from LegionBuffer, using LegionBuffer unpack; 1 byte copy into LegionBuffer. NB: You’ve saved byte copies in BasicFileTransferBlock.c in the copy constructor and assignment, neither of which is relevant in the LegionBuffer constructor.)

Modified Critical Path:

BasicFileObject::read does 1 block copy

BasicFileObject.trans.c: does 0 block copies

Legion_libBasicFiles.c: does 1 block copy into user buffer

Total savings: 1 block copy, 1 byte copy
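The per-stage counts above can be tallied mechanically to confirm the stated savings. This is only a bookkeeping sketch: the dictionaries restate the copy counts listed for each stage, and the helper name `total` is made up for the illustration.

```python
from collections import Counter

# Copies on the original critical path, as listed above
original_path = {
    "BasicFileObject::read": Counter(block=1),
    "BasicFileObject.trans.c:Legion_read": Counter(block=1),
    "Legion_libBasicFiles.c": Counter(block=1, byte=1),
}

# Copies on the modified critical path, as listed above
modified_path = {
    "BasicFileObject::read": Counter(block=1),
    "BasicFileObject.trans.c": Counter(block=0),
    "Legion_libBasicFiles.c": Counter(block=1),
}

def total(path):
    """Sum the copy counts over every stage of a path."""
    tally = Counter()
    for copies in path.values():
        tally.update(copies)
    return tally

# Counter subtraction keeps only the positive differences
saved = total(original_path) - total(modified_path)
print(dict(saved))
```

The tally gives 3 block copies and 1 byte copy on the original path against 2 block copies on the modified path, which matches the savings of 1 block copy and 1 byte copy.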

= 1 * (1 MB / ~200 MB/sec) + 1 * (1 MB / ~33 MB/sec)

= 5 ms + 30 ms = 35 ms
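The estimate can be reproduced with a few lines of arithmetic. This sketch treats the two rates as copy throughputs of roughly 200 and 33 MB/sec, which is the reading that reproduces the 5 ms and 30 ms figures. The function name `copy_time_ms` is hypothetical, and the observed value is the 21.1 ms average speedup reported earlier.

```python
def copy_time_ms(size_mb, rate_mb_per_sec):
    """Time in milliseconds to copy size_mb at the given throughput."""
    return size_mb / rate_mb_per_sec * 1000.0

transfer_mb = 1.0                             # every transfer was 1 MB

# Assumed throughputs: about 200 MB/sec for a block copy,
# about 33 MB/sec for a byte copy
block_ms = copy_time_ms(transfer_mb, 200.0)   # about 5 ms
byte_ms = copy_time_ms(transfer_mb, 33.0)     # about 30 ms

predicted_ms = 1 * block_ms + 1 * byte_ms     # about 35 ms
observed_ms = 21.1                            # measured average speedup

print(f"predicted savings: {predicted_ms:.1f} ms")
print(f"observed / predicted: {observed_ms / predicted_ms:.2f}")
```

On these assumptions the measured speedup is roughly 60% of the predicted one.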

Empirically, we see a savings of ~20 ms.

We were NOT able to verify that c254 and c255 were on the same switch. The machine that would appear to be c255 (and on the same switch) was neither labeled nor responsive on the terminal (?).

All tests yielded highly consistent results; the widest spread in any of our tests was just 2.4% (11 ms) between the highest and lowest measurements. All tests were on a "closed" net on c254/255, and all transfers were 1 MB. Graphs of the collected data follow.