This memcpy() code is optimized for all AMD Athlon and Duron family processors. This includes Athlon XP, Athlon MP, Athlon 4 (mobile), and Duron.

The code uses MMX and PREFETCHNTA instructions, and employs non-temporal memory writes (MOVNTQ) on large blocks, which bypass the cache for better efficiency. For large blocks, it also uses the Block Prefetch technique.

This code typically provides significantly improved performance. The gain depends on the particular system: CPU speed, CPU type, chipset, main memory type, and main memory speed. The data block size and alignment are also factors. Developers should test their applications to determine the actual performance benefit.

The application code should make sure that it is running on an AMD Athlon, Duron, or other appropriate processor before executing this optimized memcpy(). The CPU must support MMX, PREFETCHNTA, and MOVNTQ. On other processors, the application should call the standard library memcpy().

The optimized memcpy is called memcpy_amd() to avoid naming conflicts with the standard memcpy().

There are two versions of the code included:

memcpy_amd.cpp
    This code is written using inline assembly language, for Microsoft Visual C++ 6.0 with the Processor Pack, or a later edition (such as Visual Studio .NET) that supports the instruction set extensions.

memcpy_amd.asm
    This is a pure assembly language version, using MASM syntax.
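
The processor check described above could be sketched as follows. This is only an illustration, not code from the package: the names supports_memcpy_amd, select_memcpy, std_memcpy_fallback, and memcpy_amd_placeholder are hypothetical, and the placeholder merely stands in for the real memcpy_amd() so that the sketch builds by itself. It assumes the caller has already executed CPUID and passes in the EDX feature words from function 1 and from extended function 0x80000001. It also assumes that either the SSE bit or the AMD "extensions to MMX" bit is enough to indicate PREFETCHNTA and MOVNTQ support.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// CPUID feature bits, all reported in EDX.
constexpr std::uint32_t kStdMmx    = 1u << 23; // function 1: MMX
constexpr std::uint32_t kStdSse    = 1u << 25; // function 1: SSE
constexpr std::uint32_t kExtMmxExt = 1u << 22; // function 0x80000001: AMD extensions to MMX

using copy_fn = void* (*)(void*, const void*, std::size_t);

// Hypothetical stand-in for the real memcpy_amd(); it only forwards to the
// standard routine so that this sketch is self-contained.
void* memcpy_amd_placeholder(void* dst, const void* src, std::size_t n) {
    return std::memcpy(dst, src, n);
}

// Plain wrapper around the standard library memcpy(), used as the fallback.
void* std_memcpy_fallback(void* dst, const void* src, std::size_t n) {
    return std::memcpy(dst, src, n);
}

// Returns true when the feature words indicate MMX plus the non-temporal
// instructions (assumed here to be signalled by SSE or by the AMD MMX
// extensions bit).
bool supports_memcpy_amd(std::uint32_t std_edx, std::uint32_t ext_edx) {
    const bool has_mmx = (std_edx & kStdMmx) != 0;
    const bool has_nt  = (std_edx & kStdSse) != 0 || (ext_edx & kExtMmxExt) != 0;
    return has_mmx && has_nt;
}

// Picks the optimized routine only when the CPU qualifies; otherwise the
// standard library copy is used.
copy_fn select_memcpy(std::uint32_t std_edx, std::uint32_t ext_edx, copy_fn optimized) {
    assert(optimized != nullptr);
    return supports_memcpy_amd(std_edx, ext_edx) ? optimized : std_memcpy_fallback;
}
```

An application would typically call select_memcpy() once at startup, store the returned pointer, and route its large copies through it. A stricter check could also compare the CPUID vendor string, which this sketch leaves out.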