Automated Dynamic Analysis of CUDA Programs

Recent increases in the programmability and performance of GPUs have led to a surge of interest in utilizing them for general-purpose computations. Tools such as NVIDIA's CUDA allow programmers to use a C-like language to code algorithms for execution on the GPU. Unfortunately, parallel programs are prone to subtle correctness and performance bugs, and CUDA tool support for detecting and fixing such bugs remains a work in progress.

As a first step towards addressing these problems, we have developed a tool for finding two specific classes of bugs in CUDA programs: race conditions, which impact program correctness, and shared memory bank conflicts, which impact program performance. Our tool automatically instruments a program's source code in two ways: to keep track of the memory locations accessed by different threads, and to use this data to determine whether bugs exist in the program. The instrumented code can be run directly in CUDA's device emulation mode, and any potential errors discovered will be automatically reported to the user.
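To make the two bug classes concrete, here is a small illustrative kernel (a hypothetical example, not taken from the tool's distribution) that contains both: an unsynchronized inter-thread dependence on shared memory (a race condition) and a strided shared memory access pattern (a bank conflict). An instrumented build of a kernel like this, run in device emulation mode, is the kind of input on which the tool reports potential errors.

    // Illustrative CUDA kernel with both bug classes.
    // Assumes a single block of 32 threads.
    __global__ void buggy_kernel(int *out)
    {
        __shared__ int buf[64];
        int tid = threadIdx.x;

        // Shared memory bank conflict: the stride-2 write maps pairs of
        // threads in a half-warp onto the same bank, serializing the stores.
        buf[2 * tid] = tid;

        // Race condition: each thread reads an element written by its
        // neighbor, but there is no __syncthreads() between the write above
        // and this read, so the value observed is timing-dependent.
        out[tid] = buf[2 * ((tid + 1) % 32)];
    }

Inserting a __syncthreads() barrier between the write and the read removes the race; padding or reordering the shared array indexing removes the bank conflict.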

For more information, see our STMCS paper.

The tool is available for download here.

Please see the readme and license.