File IO: Initial Modifications
Michael Walker and Brian White
September 4, 2000
Brian and I set up three test nets to measure IO read performance under three different circumstances:
We found that the numbers for the first two cases were nearly identical, differing by a negligible 0.0027 MB/sec. This suggests that our IO performance is indeed IO bound.
The IO modifications yielded a consistent speedup of 21.1 milliseconds over the average unmodified run. This is a 4.7% improvement over the original execution time, slightly lower than expected but close to the originally predicted range.
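As a rough cross-check, the two figures quoted above (21.1 ms and 4.7%) imply an average run time for the unmodified and modified cases. The sketch below only rearranges those two numbers; the function name is hypothetical and the resulting times are derived values, not measurements from our tests.

```python
# Figures quoted in this report.
SPEEDUP_MS = 21.1     # average time saved per run, in milliseconds
IMPROVEMENT = 0.047   # 4.7% of the original execution time


def implied_times_ms(speedup_ms, improvement):
    """Return (original_ms, modified_ms) implied by a speedup and its fraction.

    Hypothetical helper: improvement = speedup / original, so
    original = speedup / improvement.
    """
    original_ms = speedup_ms / improvement
    modified_ms = original_ms - speedup_ms
    return original_ms, modified_ms


original_ms, modified_ms = implied_times_ms(SPEEDUP_MS, IMPROVEMENT)
print(f"implied unmodified run: {original_ms:.0f} ms")
print(f"implied modified run:   {modified_ms:.0f} ms")
```

Under these assumptions the unmodified run comes out at roughly 449 ms for a single transfer.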
Optimizations
You optimized away 1 byte copy and 1 block copy, NOT 2 byte copies. Here are the details:
Original Critical Path:
BasicFileObject::read does 1 block copy
(creates a BasicFileTransferBlock and copies from the LegionBuffer container of
the file)
BasicFileObject.trans.c:Legion_read does 1 block copy
(packs BasicFileTransferBlock into LegionBuffer)
Legion_libBasicFiles.c: 1 block copy, 1 byte copy
(creates a BasicFileTransferBlock from LegionBuffer – using LegionBuffer
unpack; 1 byte copy into LegionBuffer. NB: You’ve saved byte copies in
BasicFileTransferBlock.c in the copy constructor and assignment, neither of
which is relevant in the LegionBuffer constructor.)
Modified Critical Path:
BasicFileObject::read does 1 block copy
BasicFileObject.trans.c: does 0 block copies
Legion_libBasicFiles.c: does 1 block copy into user buffer
Total savings: 1 block copy, 1 byte copy
= 1 * (1 MB / ~200 MB/sec) + 1 * (1 MB / ~33 MB/sec)
= 5 ms + 30 ms = 35 ms
Empirically, we see a savings of ~20 ms.
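The estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch that reads the ~200 and ~33 figures as effective copy rates in MB/sec for a block copy and a byte copy respectively; that reading is an assumption, chosen because it reproduces the 5 ms and 30 ms terms. The helper name is hypothetical.

```python
TRANSFER_MB = 1.0  # all transfers in these tests were 1 MB


def copy_time_ms(size_mb, rate_mb_per_sec):
    """Time in milliseconds to copy size_mb at the given rate (hypothetical helper)."""
    return size_mb / rate_mb_per_sec * 1000.0


# Assumed effective rates, taken from the estimate above.
block_copy_ms = copy_time_ms(TRANSFER_MB, 200.0)  # one block copy removed
byte_copy_ms = copy_time_ms(TRANSFER_MB, 33.0)    # one byte copy removed

estimated_savings_ms = block_copy_ms + byte_copy_ms
observed_savings_ms = 20.0  # approximate empirical figure quoted above

print(f"block copy: {block_copy_ms:.1f} ms")
print(f"byte copy:  {byte_copy_ms:.1f} ms")
print(f"estimated:  {estimated_savings_ms:.1f} ms, observed: ~{observed_savings_ms:.0f} ms")
```

The byte copy dominates the estimate, so the gap between the estimated ~35 ms and the observed ~20 ms is most sensitive to the assumed byte-copy rate.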
We were NOT able to verify that c254 and c255 were on the same switch. The machine that would appear to be c255 (and on the same switch) was neither labeled nor responsive on the terminal (?).
All tests yielded highly consistent results; the widest spread in any of our tests was just 2.4% (11 ms) between the highest and lowest measurements. All tests were run on a "closed" net on c254/255, and all transfers were 1 MB. Graphs of the collected data follow.
