CUDA notes

Here are some of my notes on writing and debugging CUDA code.

 

  • getting error messages in CUDA
  • accessing global variables in device functions
  • set which GPU devices to run on with CUDA_VISIBLE_DEVICES

  • “dynamic initialization not supported for __device__ variable” error
  • device variable of class with data member class
  • debugging kernel launch errors with cuda-gdb

 

Credits to my office mates Ajay and Changwan, who taught me many things about GPU programming.

 

  • getting error messages in CUDA

When a kernel function (__global__) fails to launch from the host side, the program won’t crash or report anything. The failure stays silent unless you explicitly check for it with cudaGetLastError(). It is good practice to check the error code after each launch.
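A minimal sketch of what checking after each launch could look like. The CUDA_CHECK macro and the fill kernel are hypothetical names made up for this example, not part of CUDA; only cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString come from the runtime API.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of CUDA): print the error and exit
// if a runtime API call does not return cudaSuccess.
#define CUDA_CHECK(call)                                             \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",             \
                    __FILE__, __LINE__, cudaGetErrorString(err_));   \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Example kernel: each thread writes its own index.
__global__ void fill(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main() {
    const int n = 1024;
    int *d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_out, n * sizeof(int)));

    fill<<<(n + 255) / 256, 256>>>(d_out, n);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel runs

    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

The call to cudaDeviceSynchronize() is there because a launch returns before the kernel finishes, so errors that happen during execution only show up at a later synchronizing call.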

  • accessing global variables in device functions

To access a global variable inside a __device__ function, we need to declare it as a device variable (or device pointer). We can use cudaMemcpyToSymbol to update its value from a host-side variable before launching the kernel that calls the __device__ function.

Note that cudaMemcpyToSymbol requires the right size for the variable. If the symbol is a 4-byte integer, the size passed must be 4 bytes; the call returns an error if 8 bytes is supplied. The same call is also how you assign to “__constant__” variables from the host.
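Here is a small sketch of that pattern, assuming one __device__ int and one __constant__ float. The names d_offset, d_scale, transform and apply are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int d_offset;      // global variable living in device memory
__constant__ float d_scale;   // constant-memory variable, set the same way

// A __device__ function that reads both global variables.
__device__ float transform(float x) {
    return d_scale * x + d_offset;
}

__global__ void apply(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = transform(data[i]);
}

int main() {
    int h_offset = 3;
    float h_scale = 2.0f;
    // The byte count must match the size of the device symbol.
    cudaMemcpyToSymbol(d_offset, &h_offset, sizeof(int));
    cudaMemcpyToSymbol(d_scale, &h_scale, sizeof(float));

    const int n = 4;
    float h_data[n] = {0.f, 1.f, 2.f, 3.f};
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    apply<<<1, n>>>(d_data, n);

    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%g\n", h_data[i]);

    cudaFree(d_data);
    return 0;
}
```

Using sizeof on the host variable's type, rather than a hard-coded byte count, is one way to avoid the size mismatch described above.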

  • set which GPU devices to run on with CUDA_VISIBLE_DEVICES

Set the CUDA_VISIBLE_DEVICES=device_num environment variable on the command line when launching the program.

For example

CUDA_VISIBLE_DEVICES=1 ./cuda_executable

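The program then only sees the listed devices, renumbered starting from 0, so any device selection call in the source code (such as cudaSetDevice) can only choose among them.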

https://devblogs.nvidia.com/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/
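To see the effect of the variable, a small program like the following sketch can list the devices the process is allowed to see. It only uses cudaGetDeviceCount and cudaGetDeviceProperties; the output format is my own choice.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    // Indices here are relative to the visible set, not the physical machine.
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess)
            printf("visible device %d: %s\n", i, prop.name);
    }
    return 0;
}
```

On a multi-GPU machine, running it with CUDA_VISIBLE_DEVICES=1 should list a single device, reported at index 0.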

 

  • “dynamic initialization not supported for __device__ variable” error

Global __device__ variables cannot have a constructor that performs any work (such as assigning values to members of the class). There is no way to run that initialization automatically on the GPU side.
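One possible way around this, sketched below under the assumption that the values are known on the host: keep the constructor empty and copy an initialized host object into the device variable with cudaMemcpyToSymbol. The Params struct and its fields are hypothetical names for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

struct Params {
    int steps;
    float rate;
    // An empty constructor is accepted for a __device__ variable.
    // A body such as { steps = 10; rate = 0.5f; } is the kind of
    // constructor that triggers the "dynamic initialization" error.
    __host__ __device__ Params() {}
};

__device__ Params d_params;  // nothing is initialized on the GPU side here

__global__ void show() {
    printf("steps=%d rate=%f\n", d_params.steps, d_params.rate);
}

int main() {
    // Fill in the values on the host, then copy the whole object over.
    Params h_params;
    h_params.steps = 10;
    h_params.rate = 0.5f;
    cudaMemcpyToSymbol(d_params, &h_params, sizeof(Params));

    show<<<1, 1>>>();
    cudaDeviceSynchronize();  // wait for the kernel so its printf is flushed
    return 0;
}
```

An alternative with the same effect would be a small initialization kernel that assigns the members on the device before the real kernels run.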

  • device variable of class with data member class

Sometimes we get the “dynamic initialization” error for a global __device__ variable whose class has an empty constructor. The problem was that the compiler thinks it needs to invoke a constructor to initialize a data member that is itself a class, even though the member class has no constructor either.

We got around it by explicitly assigning the member a value ({0}) to make sure the constructor is not invoked:

class1 {

class2 = {0};

}

 

  • debugging kernel launch errors with cuda-gdb

You can use cuda-gdb to locate the specific kernels that are failing

cuda-gdb --args ./executable

At the gdb prompt, type the following

set cuda api_failures stop

And you can use

bt

to locate the specific kernel launch that is failing.

A more complete tutorial is here

https://docs.nvidia.com/cuda/cuda-gdb/index.html
