Memory Bandwidth Experiments
Michael Walker and Brian White
August 29, 2000
Brian and I tested the memory bandwidth of two memory-copying schemes: memcpy and the character-by-character copy scheme used in Adam's Legion code. We ran the tests on Intel, Alpha, and Solaris machines and generally found that memcpy outperformed the char-by-char copy by a factor of five or more. The gap was especially evident as the size of the copied region increased.
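The two schemes can be sketched in C as follows. This is a minimal sketch that assumes the Legion routine is a plain loop moving one byte per iteration. It is not Adam's actual code, and the names `char_copy` and `block_copy` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the char-by-char scheme: one byte per
 * loop iteration. (Assumed shape; not the actual Legion source.) */
void char_copy(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* The library alternative, wrapped so both schemes share a signature.
 * memcpy is free to move data in word-sized or larger chunks,
 * depending on the C library implementation. */
void block_copy(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}
```

Both routines produce identical results for non-overlapping buffers; only their speed differs. A modern optimizing compiler may recognize the loop in `char_copy` and replace it with a call to memcpy (GCC does this under `-ftree-loop-distribute-patterns` at higher optimization levels), so reproducing the comparison today may require disabling that transformation.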
The char-by-char method maintained a roughly constant bandwidth regardless of transfer size, averaging 30.35 MB/sec on Intel, 14.83 MB/sec on Alpha, and 9.9 MB/sec on Solaris.
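One way to obtain MB/sec figures like these is to time many repetitions of a copy and divide the bytes moved by the elapsed time. The sketch below is our illustration, not the harness used in these experiments. The function name `char_copy_mb_per_sec`, the byte loop `char_copy`, the use of the C library's `clock()` (which measures processor time rather than wall-clock time), and the convention 1 MB = 10^6 bytes are all assumptions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical byte-at-a-time copy standing in for the Legion scheme. */
static void char_copy(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

static volatile char sink; /* keeps the copies from being optimized away */

/* Bandwidth of char_copy on `bytes`-byte buffers in MB/sec, with
 * 1 MB = 10^6 bytes (assumed). The repetition count doubles until at
 * least `min_seconds` of processor time (per clock()) has elapsed, so
 * small copies are not swamped by timer resolution.
 * Returns -1.0 if `bytes` is 0, `min_seconds` is not positive, or
 * allocation fails. */
double char_copy_mb_per_sec(size_t bytes, double min_seconds)
{
    char *src = bytes ? malloc(bytes) : NULL;
    char *dst = bytes ? malloc(bytes) : NULL;
    double result = -1.0;
    if (src != NULL && dst != NULL && min_seconds > 0.0) {
        memset(src, 0x5a, bytes);
        for (size_t reps = 1;; reps *= 2) {
            clock_t start = clock();
            for (size_t r = 0; r < reps; r++) {
                char_copy(dst, src, bytes);
                sink = dst[r % bytes];
            }
            double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
            if (elapsed >= min_seconds) {
                result = (double)bytes * (double)reps / elapsed / 1e6;
                break;
            }
        }
    }
    free(src);
    free(dst);
    return result;
}
```

If a byte loop behaves as it did in these experiments, calling this function with sizes such as 4 KB, 64 KB, and 1 MB should return roughly the same figure on a given machine.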
The bandwidth of memcpy varied with transfer size, but memcpy almost always outperformed the char-by-char method. For 1 MB copies, memcpy achieved 198.7 MB/sec on Intel, 279.5 MB/sec on Alpha, and 121.9 MB/sec on Solaris. We believe the Intel machines have a 64-bit-wide, 66 MHz memory bus, which implies a theoretical peak of 8 bytes × 66 MHz = 528 MB/sec.
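The arithmetic behind these figures can be made explicit. The sketch below recomputes the Intel peak from the assumed bus parameters (64 bits wide, one transfer per 66 MHz clock) and compares the reported memcpy figures at 1 MB with the char-by-char averages; since the char-by-char bandwidth did not change with size, comparing its average against a single memcpy size seems reasonable. The struct and function names are hypothetical.

```c
#include <stdio.h>

/* Reported bandwidths in MB/sec, copied from the text above:
 * char-by-char averages and memcpy at 1 MB. */
struct reported {
    const char *platform;
    double char_by_char;
    double memcpy_1mb;
};

static const struct reported results[] = {
    { "Intel",   30.35, 198.7 },
    { "Alpha",   14.83, 279.5 },
    { "Solaris",  9.9,  121.9 },
};

/* Peak bandwidth in MB/sec for a bus `width_bits` wide clocked at
 * `clock_mhz` MHz, assuming one transfer per clock:
 * bytes per transfer * 10^6 transfers/sec = MB/sec. */
double peak_mb_per_sec(double width_bits, double clock_mhz)
{
    return (width_bits / 8.0) * clock_mhz;
}

/* Print each platform's memcpy speedup over the byte loop, plus the
 * fraction of the assumed Intel peak that memcpy reached at 1 MB. */
void print_summary(void)
{
    for (size_t i = 0; i < sizeof results / sizeof results[0]; i++)
        printf("%-8s memcpy/char-by-char = %.1fx\n", results[i].platform,
               results[i].memcpy_1mb / results[i].char_by_char);
    double peak = peak_mb_per_sec(64.0, 66.0);
    printf("Intel peak %.0f MB/sec; memcpy at 1 MB reached %.1f%%\n",
           peak, 100.0 * results[0].memcpy_1mb / peak);
}
```

Running `print_summary` gives speedups of about 6.5x (Intel), 18.8x (Alpha), and 12.3x (Solaris) at 1 MB, and shows the Intel memcpy figure at about 37.6% of the assumed 528 MB/sec peak.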
For small copy sizes, we believe the CPU caches greatly increased the effective bandwidth of memcpy, whereas the char-by-char approach showed no comparable benefit from the L1 and L2 caches. Even when we raised the copy size to 1 MB, clearly larger than the available cache, memcpy still delivered far greater bandwidth.
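The cache effect described above can be checked by sweeping the copy size across the cache boundary. In this sketch, which is our illustration rather than the original test program, the function names, the 4 KB to 4 MB size range (chosen to bracket the 1 MB size treated above as larger than cache), the byte loop `char_copy`, and the use of `clock()` are all assumptions.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(char *dst, const char *src, size_t n);

/* Hypothetical byte-at-a-time copy (assumed shape of the Legion loop). */
static void char_copy(char *dst, const char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* memcpy wrapped so both schemes share one signature. */
static void block_copy(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
}

static volatile char sink; /* keeps the copies from being optimized away */

/* MB/sec (1 MB = 10^6 bytes) for `copy` on `bytes`-byte buffers, doubling
 * the repetition count until at least `min_seconds` (> 0) of CPU time
 * pass. The same buffers are reused, so small sizes stay in cache. */
static double mb_per_sec(copy_fn copy, char *dst, const char *src,
                         size_t bytes, double min_seconds)
{
    for (size_t reps = 1;; reps *= 2) {
        clock_t start = clock();
        for (size_t r = 0; r < reps; r++) {
            copy(dst, src, bytes);
            sink = dst[r % bytes];
        }
        double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
        if (elapsed >= min_seconds)
            return (double)bytes * (double)reps / elapsed / 1e6;
    }
}

/* Print both bandwidths for sizes 4 KB, 16 KB, ... up to `max_bytes`.
 * The working set is 2 * size bytes (source plus destination).
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int sweep(size_t max_bytes, double min_seconds)
{
    if (max_bytes < 4096 || min_seconds <= 0.0)
        return -1;
    char *src = malloc(max_bytes);
    char *dst = malloc(max_bytes);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }
    memset(src, 0x5a, max_bytes);
    printf("%10s %14s %14s\n", "bytes", "char MB/sec", "memcpy MB/sec");
    for (size_t n = 4096; n <= max_bytes; n *= 4)
        printf("%10zu %14.1f %14.1f\n", n,
               mb_per_sec(char_copy, dst, src, n, min_seconds),
               mb_per_sec(block_copy, dst, src, n, min_seconds));
    free(src);
    free(dst);
    return 0;
}
```

Calling `sweep(4u << 20, 0.1)` prints one row per size from 4 KB to 4 MB. Rows whose source and destination both fit in cache measure cache bandwidth, while the larger rows measure main memory. As with any byte loop, it is worth checking that the compiler has not turned `char_copy` into a memcpy call.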
We concluded that memcpy is clearly the better choice for the data copies. The numbers for each architecture follow.