Difference between revisions of "CUDA Support"

From CS Support Wiki

Latest revision as of 15:38, 15 April 2009

Getting Started

CUDA support includes three main components:

  • Nvidia driver with CUDA support
  • CUDA Toolkit
  • CUDA SDK with code samples

We now provide support for these tools for lab, research group and grad desktop machines, under both Windows and Linux. We administer CUDA differently depending on the machine - please find the case that matches yours below. If you have CUDA support questions that aren't addressed here, please email root.

002a Lab

Windows

You should find CUDA installed under C:\CUDA. If that directory doesn't exist, or it doesn't contain a subdirectory called NVIDIA_CUDA_SDK, check under C:\Program Files\CUDA and copy the appropriate files/directories to C:\. This is needed because you are not allowed to modify any files in Program Files.

Once you've passed that step, open up C:\CUDA\NVIDIA_CUDA_SDK\projects in Windows Explorer. Go into any of the project directories, and double-click the Visual Studio solution file, which should open up that project in Visual Studio. Try to compile the project. The output at the bottom of the screen will tell you whether the compile was successful or not.

Any of the SDK example projects should compile out of the box. If not, contact your TA; the machine is either not set up correctly, or you did something wrong along the way.

If you get a successful compile, try to run your binary in debug mode (which is the default for Visual Studio). For most projects, a window will pop up with some output (hopefully telling you that everything is fine) and prompting you to hit ENTER to exit. Do so.

Now you can start your own project. Close the example project that you've been working on. Create another directory with a new name under projects. Copy all the files from the example project directory into that new directory.

Open up the project in the new directory. You should probably rename the project inside Visual Studio to the same name you gave to the directory to avoid future confusion.

Now you can start modifying the code, and see if anything still works after you have done so. :-)

Linux

The Nvidia driver and CUDA toolkit are already installed; the toolkit is available at /usr/local/cuda. Follow the instructions for installing a 32-bit copy of the SDK in your home directory at #Installing the CUDA SDK and code samples - the same procedure applies to the 002a machines.

Grad desktops

To run CUDA programs on the GPU, you will need a CUDA-capable Nvidia graphics card - a G8X series or greater. These are still relatively rare within the department, so if you're not sure whether your machine is CUDA-ready, it probably isn't. When in doubt, feel free to send mail to root, and we'll take a look at your card and let you know whether it'll work.

Regardless of whether your card supports CUDA, you may install the CUDA Toolkit and SDK on your machine and do development work there. You can even run your programs on the emulator that comes with the CUDA Toolkit, but the emulator is incredibly slow by comparison, so we don't recommend it for anything but lightweight testing.

Windows

If you are an administrator on your machine, you can download and install the driver (if applicable; see note above about supported cards), Toolkit and SDK.

Linux

If you have a card that supports CUDA, send email to root with your hostname, and we'll give you sudo access for installing and configuring the NVIDIA driver and CUDA Toolkit. Otherwise, you can download the Toolkit and SDK and install them into /usr/local or your home directory.

Professor Skadron's research machines

Installation Instructions

To set up CUDA tools on a departmental Linux machine, first send mail to root@cs.virginia.edu requesting that the host be added to the CUDA Cfengine group and the main departmental /etc/sudoers file. If you are new to the group, you will also need to request to be added to the CUDA_USERS group of /etc/sudoers.

Once those changes are made on the server, the host will pick up CUDA-related configuration within an hour of being up on the network and booted into Linux. A /cuda directory will be automatically created on the host, which contains the installers for the driver, toolkit and SDK (Note: due to version changes, your /cuda directory may not be identical to this listing).

root@skadrondell3:~# ls /cuda
NVIDIA_CUDA_sdk_2.0beta2_linux.run
NVIDIA_CUDA_Toolkit_2.0beta2_Ubuntu7.10_x86_64.run
NVIDIA_CUDA_Toolkit_2.0_ubuntu7.10_x86.run
NVIDIA-Linux-x86-177.67-pkg1.run
NVIDIA-Linux-x86_64-177.13-pkg2.run
NVIDIA-Linux-x86_64-177.67-pkg2.run

Members of the CUDA_USERS sudo group have the permission to run everything in /cuda, along with several system utilities required for the driver installation.

Installing the Nvidia CUDA driver

First, shut down the X server; the Nvidia driver installer requires that X not be running.

sudo killall gdm

Next, find within the /cuda directory the driver installer that matches the architecture of your host - they start with "NVIDIA-Linux". Run "uname -a" if you're unsure of the architecture; x86 machines will list "i686" and x86_64 machines will list "x86_64". As of September 2008, the current version of the Nvidia driver is 177.67. Via sudo, run the installer that matches the host architecture.

i386 example:

sudo /cuda/NVIDIA-Linux-x86-177.67-pkg1.run  

x86_64 example:

sudo /cuda/NVIDIA-Linux-x86_64-177.67-pkg2.run
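
The architecture check and installer choice described above can also be scripted. The following is only a minimal sketch, not part of the supported procedure: the function names driver_arch_tag and newest_driver are made up for illustration, and the version ordering assumes GNU sort.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of the /cuda setup).

# Map a machine type, as printed by `uname -m`, to the tag used in the
# driver installer file names.
driver_arch_tag() {
    case "$1" in
        x86_64) echo "x86_64" ;;
        i?86)   echo "x86" ;;
        *)      echo "unsupported machine type: $1" >&2; return 1 ;;
    esac
}

# Read installer file names on stdin and print the one with the highest
# driver version for the given architecture tag.
# Assumes GNU sort, which provides version ordering (the V key modifier).
newest_driver() {
    grep "^NVIDIA-Linux-$1-[0-9]" | sort -t- -k4,4V | tail -n 1
}
```

For example, ls /cuda | newest_driver "$(driver_arch_tag "$(uname -m)")" prints the file name of the newest matching installer, which you can then run from /cuda via sudo as shown above.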

Select the defaults, and answer yes to the question "Install NVIDIA's 32-bit compatibility OpenGL libraries?" At the end of the installer, you will be asked whether to overwrite your X server settings file (xorg.conf); do so.

Once the driver installer completes, restart the X server:

sudo /etc/init.d/gdm start
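
The three driver steps (stop X, run the installer, restart X) can be chained so that a failed step stops the sequence. This is a sketch under stated assumptions rather than a supported script: the names run, install_driver and DRY_RUN are hypothetical. Run it from a text console or an SSH session, because stopping gdm ends the graphical session and any terminal running inside it.

```shell
#!/bin/sh
# Hypothetical wrapper; DRY_RUN and the function names are illustrative.

# Run a command, or only print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop X, run the given driver installer, then start X again.
# Each step runs only if the previous one succeeded.
install_driver() {
    run sudo killall gdm &&
    run sudo "$1" &&
    run sudo /etc/init.d/gdm start
}
```

With DRY_RUN set to 1, install_driver /cuda/NVIDIA-Linux-x86-177.67-pkg1.run only prints the three commands, which is a cheap way to check the sequence before running it for real. Note that if gdm is not running, killall reports failure and the remaining steps are skipped.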

Installing the CUDA Toolkit

Within the /cuda directory, find the toolkit installer that matches the architecture of your host, and invoke it with sudo. Versions 2.0 and earlier start with "NVIDIA_CUDA_Toolkit"; later versions start with "cudatoolkit".

i386 example:

sudo /cuda/cudatoolkit_2.1_linux32_ubuntu8.04.run

x86_64 example:

sudo /cuda/cudatoolkit_2.1_linux64_ubuntu8.04.run

Accept the defaults. At the end of the installation, you'll see the following notices:

* Please make sure your PATH includes /usr/local/cuda/bin
* Please make sure your LD_LIBRARY_PATH includes /usr/local/cuda/lib

Do so - to change them permanently, edit your .profile to include the following lines:

PATH=/usr/local/cuda/bin:$PATH
LD_LIBRARY_PATH=/usr/local/cuda/lib:$LD_LIBRARY_PATH
export PATH LD_LIBRARY_PATH
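
One caveat with the lines above: if LD_LIBRARY_PATH was empty beforehand, the result ends in a colon, and the dynamic loader treats an empty element as the current directory. Sourcing .profile repeatedly also adds the same directory again each time. A possible variant that avoids both is sketched here; the helper name path_prepend is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; the name path_prepend is illustrative.

# Print the colon-separated list $2 with directory $1 placed in front,
# unless $1 is already an element. An empty list yields just the directory,
# so no empty element (which the dynamic loader treats as the current
# directory) is created.
path_prepend() {
    case ":$2:" in
        *":$1:"*) echo "$2" ;;
        "::")     echo "$1" ;;
        *)        echo "$1:$2" ;;
    esac
}

PATH=$(path_prepend /usr/local/cuda/bin "$PATH")
LD_LIBRARY_PATH=$(path_prepend /usr/local/cuda/lib "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```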

Installing the CUDA SDK and code samples

In order to give each user the ability to build into and modify code samples provided by the SDK, we recommend installing a copy in each user's home directory for each architecture used for CUDA work. Run it as yourself (without using sudo):

cd ~
/cuda/NVIDIA_CUDA_sdk_2.0beta2_linux.run

The installer will prompt for the toolkit location: /usr/local/cuda. It will also ask where you want to install the SDK; if you'll be using both 32-bit and 64-bit CUDA machines, you'll need to maintain two copies of the SDK, so we recommend that you name them accordingly (e.g. ~/NVIDIA_CUDA_SDK-x86 and ~/NVIDIA_CUDA_SDK-x86_64).
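
If your home directory is shared between 32-bit and 64-bit machines, a small helper can pick the matching SDK copy on whichever host you are logged in to. This is a sketch only; the function name sdk_dir is hypothetical, and it assumes you used the two directory names recommended above.

```shell
#!/bin/sh
# Hypothetical helper; the function name is illustrative.

# Print the per-architecture SDK directory for a machine type (as printed
# by `uname -m`) under the given home directory, using the directory names
# recommended above.
sdk_dir() {
    case "$1" in
        x86_64) echo "$2/NVIDIA_CUDA_SDK-x86_64" ;;
        i?86)   echo "$2/NVIDIA_CUDA_SDK-x86" ;;
        *)      echo "unsupported machine type: $1" >&2; return 1 ;;
    esac
}
```

For example, cd "$(sdk_dir "$(uname -m)" "$HOME")" changes into the SDK copy built for the current host.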

Run "make" within the "common" directory of the SDK(s) you install in your home directory. Among other things, this builds libcutil.a into the "lib" directory where you installed the SDK and will allow you to build code examples.

: /af13/jlg9n/NVIDIA_CUDA_SDK ; cd common
: /af13/jlg9n/NVIDIA_CUDA_SDK/common ; make

To test the SDK, build the "scan" utility:

: /af13/jlg9n/NVIDIA_CUDA_SDK ; cd projects/scan
: /af13/jlg9n/NVIDIA_CUDA_SDK/projects/scan ; make

Then, cd into the root of the SDK(s) you've installed and run the scan test:

: /af13/jlg9n/NVIDIA_CUDA_SDK ; bin/linux/release/scan

You should see all three scan tests pass:

scan_naive: Test PASSED
scan_workefficient: Test PASSED
scan_best: Test PASSED

If not, send email to root with the output of scan and the path to the copy of the SDK you're using.
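
If you want to check the result from a script, something along these lines may help. It is a sketch that assumes the output format shown above; the function name scan_ok is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; the function name is illustrative.

# Read the output of the scan test on stdin and succeed only if all three
# tests named above report "Test PASSED".
scan_ok() {
    out=$(cat)
    for t in scan_naive scan_workefficient scan_best; do
        printf '%s\n' "$out" | grep -q "^$t: Test PASSED" || return 1
    done
}
```

One way to use it is to save the output of bin/linux/release/scan to a file (scan.log, say) and then run scan_ok < scan.log; if that fails, the saved file is what you would include in your mail to root.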

Notes

As new versions of the driver, toolkit and SDK become available, systems staff can easily add them to the /cuda directory on all of the boxes - just send email to root with the pointer to the new installers, and we'll copy them into the master directory.

You may need to tweak your Xorg configuration to set up the display settings you prefer; the CUDA_USERS sudo group is authorized to run "/usr/bin/nvidia-xsettings", "/usr/bin/nvidia-settings", and "/usr/bin/sudoedit /etc/X11/xorg.conf" via sudo.

For your convenience, the CUDA_USERS sudo group is also authorized to run "/sbin/shutdown" and "/sbin/reboot" for remote boot management.

Troubleshooting

Libcudart errors

Problem: When executing a CUDA program under Linux, you get the following error:

error while loading shared libraries: libcudart.so.2: cannot open shared object file: No such file or directory

Solution: You need to add the path to the CUDA libraries to your $LD_LIBRARY_PATH environment variable.

Explanation: When a CUDA program is executed, it needs to dynamically link to the CUDA runtime libraries. By default, these libraries are located in the /usr/local/cuda/lib directory. When searching for these libraries, the operating system looks in directories specified in the $LD_LIBRARY_PATH environment variable. If the CUDA library directory is not specified here, the program will fail with the error shown above.

There are two solutions (these assume that you are using the bash shell, which is the default CS Department shell):

1. Run the following command:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib

This change is not persistent and will need to be re-run each time you log in.

2. Edit your .profile file (located at ~/.profile). Find the line that sets the $LD_LIBRARY_PATH variable, which should look similar to the following:

LD_LIBRARY_PATH="/usr/lib:/usr/openwin/lib:/usr/dt/lib:/X11.6/lib:/X11.5/lib:/uva/lib:/gnu/lib"

Modify that line to add the path to the CUDA libraries:

LD_LIBRARY_PATH="/usr/lib:/usr/openwin/lib:/usr/dt/lib:/X11.6/lib:/X11.5/lib:/uva/lib:/gnu/lib:/usr/local/cuda/lib"

After editing the file, you either need to log out and log back in or run the following command:

source ~/.profile

This solution is persistent and only needs to be performed once.
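
To confirm which libraries a program cannot find, before or after changing $LD_LIBRARY_PATH, you can filter the output of ldd. The helper below is a minimal sketch; the name missing_libs is hypothetical, and it assumes the usual "name => not found" lines that ldd prints on Linux.

```shell
#!/bin/sh
# Hypothetical helper; the function name is illustrative.

# Read `ldd` output on stdin and print the name of every shared library
# that the dynamic loader could not find.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

For example, ldd ./my_program | missing_libs (where my_program stands for your own binary) prints libcudart.so.2 if the CUDA library directory is still missing from the search path, and prints nothing once every library resolves.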

Advanced Topics

Enabling double-precision

By default, CUDA only supports single-precision floating point arithmetic. You must explicitly enable double-precision support.

Choosing among multiple GPUs

When running a CUDA program on a computer with multiple GPUs, your program may not be executed on the most powerful GPU. To maximize performance, you should modify your program to automatically choose the best GPU.

Measuring kernel runtime

A naive approach to measuring the runtime of a CUDA kernel may produce wildly inaccurate results. Make sure that you measure runtime correctly.

Links