CUDA Support/Choosing a GPU

From CS Support Wiki


Running your code on a machine with multiple GPUs may result in your code executing on an older and slower GPU.


If you know the device number of the GPU you want to use, call cudaSetDevice(N). For a more robust solution, include the code shown below at the beginning of your program to automatically select the best GPU on any machine.


When running a CUDA program on a machine with multiple GPUs, CUDA kernels execute by default on whichever GPU is installed in the primary graphics card slot. On some machines, the default GPU may be significantly slower and may lack advanced features such as atomic instructions or double-precision floating point. If you know that you want to execute on the Nth GPU, calling cudaSetDevice(N) will make your code execute on that GPU. However, this is not a robust solution: the cards may be swapped in the future, and the Nth card on a different machine might not be the card you want to use.
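As a minimal sketch of the explicit approach, the helper below checks that device N exists before calling cudaSetDevice(N) and reports any runtime error. The name select_device is hypothetical (it is not part of the CUDA API); the runtime calls themselves are standard.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): try to make device `n`
// the current device. Returns true on success, false otherwise.
bool select_device(int n) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return false;
    }
    // Reject indices that do not correspond to an installed device.
    if (n < 0 || n >= count) {
        std::fprintf(stderr, "device %d out of range (%d device(s) found)\n",
                     n, count);
        return false;
    }
    err = cudaSetDevice(n);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaSetDevice(%d) failed: %s\n",
                     n, cudaGetErrorString(err));
        return false;
    }
    // Report which card was selected so a wrong choice is easy to spot.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, n) == cudaSuccess) {
        std::printf("Using device %d: %s (compute capability %d.%d)\n",
                    n, prop.name, prop.major, prop.minor);
    }
    return true;
}
```

Calling select_device(1) at the top of main, before any other CUDA call, would request the second GPU and fail cleanly on a machine that has only one.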

For a more robust solution, include the following code somewhere at the beginning of your program:

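int num_devices, device;
// Ask the runtime how many CUDA devices are installed.
cudaGetDeviceCount(&num_devices);
if (num_devices > 1) {
      int max_multiprocessors = 0, max_device = 0;
      for (device = 0; device < num_devices; device++) {
              cudaDeviceProp properties;
              cudaGetDeviceProperties(&properties, device);
              // Remember the device with the most multiprocessors seen so far.
              if (max_multiprocessors < properties.multiProcessorCount) {
                      max_multiprocessors = properties.multiProcessorCount;
                      max_device = device;
              }
      }
      // Make the chosen device current for all subsequent CUDA calls.
      cudaSetDevice(max_device);
}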

This code automatically selects the card with the largest number of multiprocessors, which should pick the "best" card in most cases. A few notes about this code:
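Because the multiprocessor count says nothing about supported features, one possible refinement is to consider only devices that meet a minimum compute capability (double precision, for example, requires compute capability 1.3 or higher). The sketch below is an illustration rather than the page's original method; pick_device and its parameters are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: return the index of the device with the most
// multiprocessors among those whose compute capability is at least
// min_major.min_minor, or -1 if no device qualifies.
int pick_device(int min_major, int min_minor) {
    int num_devices = 0;
    if (cudaGetDeviceCount(&num_devices) != cudaSuccess) {
        return -1;  // no driver, no devices, or another runtime error
    }
    int best_device = -1;
    int best_multiprocessors = 0;
    for (int device = 0; device < num_devices; ++device) {
        cudaDeviceProp properties;
        if (cudaGetDeviceProperties(&properties, device) != cudaSuccess) {
            continue;  // skip devices that cannot be queried
        }
        bool capable = properties.major > min_major ||
                       (properties.major == min_major &&
                        properties.minor >= min_minor);
        if (capable && properties.multiProcessorCount > best_multiprocessors) {
            best_multiprocessors = properties.multiProcessorCount;
            best_device = device;
        }
    }
    return best_device;
}
```

A caller might write `int d = pick_device(1, 3); if (d >= 0) cudaSetDevice(d);` and fall back to the default device, or exit with a message, when -1 is returned.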

  1. The template SDK project which many people start working with includes a call to CUDA_DEVICE_INIT. This function (actually a macro) will automatically set the device back to the default device. The most obvious solution is to simply remove the CUDA_DEVICE_INIT call. Alternatively, adding a call to cudaFree(0) after the code above but before the CUDA_DEVICE_INIT call will also solve the problem. A third approach is to move the code above to after the CUDA_DEVICE_INIT call. These approaches work because the device your code executes on is decided by the last call to cudaSetDevice before the first interaction with the GPU (for example, calling cudaFree). For this reason, you must ensure that the code above is added before any CUDA-related function calls, such as cudaMalloc or cudaMemcpy.
  2. Compiling this code with GCC has been problematic; there probably is a workaround, but the simplest approach is to include this code only in a .cu file, not a .c file (this probably will not apply to most CUDA developers).
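The ordering rule in note 1 can be sketched as follows: set the device, then immediately touch the GPU with cudaFree(0) so the choice is fixed before any later initialization code runs, and finally confirm the result with cudaGetDevice. The helper name bind_device is hypothetical, and the behavior described is the one stated in the note above, not something this sketch verifies independently.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: select `device`, force the first interaction with
// the GPU, and return the device that is actually current (-1 on error).
int bind_device(int device) {
    if (cudaSetDevice(device) != cudaSuccess) {
        return -1;
    }
    // cudaFree(0) frees nothing; it is used here only as a harmless first
    // interaction with the GPU, as suggested in note 1.
    if (cudaFree(0) != cudaSuccess) {
        return -1;
    }
    int current = -1;
    if (cudaGetDevice(&current) != cudaSuccess) {
        return -1;
    }
    return current;
}
```

If bind_device(N) returns N, later calls such as cudaMalloc should run on that device; a return value of -1 means the selection failed and the program should not assume which GPU is in use.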
