
These built-in variables are defined in the CUDA documentation:

- threadIdx.x,y,z gives the ID of a thread within its block, and it starts from 0.
- blockIdx.x,y,z gives the ID of a block within the grid, and it starts from 0.
- blockDim.x,y,z gives the number of threads in a block in each direction. Every component is at least 1, and all thread blocks have the same dimensions.
- gridDim.x,y,z gives the number of blocks in a grid in each direction. Every component is at least 1.
- blockDim.x * gridDim.x gives the number of threads in a grid (in the x direction, in this case).

Block and grid variables can be 1, 2, or 3 dimensional. It's common practice when handling 1-D data to create only 1-D blocks and grids. In particular, when the total number of threads in the x-dimension (gridDim.x * blockDim.x) is less than the size of the array being processed, it's common practice to create a loop and have the grid of threads move through the entire array.

With the help of these four variables, we can calculate a unique global index for each thread. The global index lets us address an individual thread among the millions of threads that are dispatched to the GPU. Let us look into a couple of scenarios and how the global index is calculated in each case.
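As one such scenario, here is a minimal sketch of the 1-D case, combining the global index with the loop that walks a small grid through a larger array (often called a grid-stride loop). The names globalIndex1D and scaleArray, the array size of 1000, and the 2 x 32 launch are assumptions chosen for illustration, not part of the original article.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Unique global index of a thread in a 1-D launch (hypothetical helper).
__host__ __device__ inline int globalIndex1D(int blockId, int threadsPerBlock, int threadId) {
    return blockId * threadsPerBlock + threadId;
}

// Hypothetical kernel: multiplies every element of data by factor.
__global__ void scaleArray(float* data, int n, float factor) {
    int first = globalIndex1D(blockIdx.x, blockDim.x, threadIdx.x);
    int stride = gridDim.x * blockDim.x;   // total threads in the x direction
    // Each thread handles elements first, first + stride, first + 2 * stride, ...
    for (int i = first; i < n; i += stride) {
        data[i] *= factor;
    }
}

int main() {
    const int n = 1000;                    // assumed size: more elements than threads
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float* dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    scaleArray<<<2, 32>>>(dev, n, 2.0f);   // 2 blocks * 32 threads = 64 threads

    // cudaMemcpy waits for the kernel to finish before copying the data back.
    cudaError_t err = cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("host[0] = %.1f, host[%d] = %.1f\n", host[0], n - 1, host[n - 1]);
    return 0;
}
```

Because the loop bound is the array length rather than the thread count, the same kernel stays correct for any grid size, including one with more threads than elements.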


For example, if the number of threads in the x dimension of a block is 32, then threadIdx.x ranges from 0 to 31 in each block.


From a software perspective, the block and grid variables are three-dimensional (they have the type dim3). For the sake of simplicity, we can refer to the total number of threads in a block as block and the total number of blocks in a grid as grid. The number of threads is the same in every block of a grid.
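The following sketch shows how the three-dimensional block and grid variables might be set up with dim3. The helper name volume, the kernel name reportShape, and the particular extents (8 x 4 threads, 2 x 3 blocks) are assumptions made for the example. Components that are not specified in the dim3 constructor default to 1.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Product of the three extents of a dim3 (hypothetical helper).
__host__ __device__ inline unsigned int volume(dim3 d) {
    return d.x * d.y * d.z;
}

// Hypothetical kernel: one thread reports the launch shape seen on the device.
__global__ void reportShape() {
    bool firstBlock  = blockIdx.x == 0 && blockIdx.y == 0 && blockIdx.z == 0;
    bool firstThread = threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0;
    if (firstBlock && firstThread) {
        printf("blockDim = (%u, %u, %u), gridDim = (%u, %u, %u)\n",
               blockDim.x, blockDim.y, blockDim.z,
               gridDim.x, gridDim.y, gridDim.z);
    }
}

int main() {
    dim3 block(8, 4);   // 8 x 4 x 1 threads per block; z defaults to 1
    dim3 grid(2, 3);    // 2 x 3 x 1 blocks per grid; z defaults to 1

    printf("threads per block = %u, blocks per grid = %u\n",
           volume(block), volume(grid));

    reportShape<<<grid, block>>>();
    cudaError_t err = cudaDeviceSynchronize();  // wait so the device printf is flushed
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```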
In order to launch a CUDA kernel, we need to specify the block dimension and the grid dimension from the host code. I'll consider the same Hello World! code considered in the previous article.

    // Pre-processor directives
    #include
    #include "cuda_runtime.h"
    #include "device_launch_parameters.h"

    // Device code
    __global__ void cuda_kernel()

In that code, the CUDA kernel is launched with two 1's between the triple angle brackets (<<<1, 1>>>). The first parameter indicates the total number of blocks in a grid, and the second parameter indicates the total number of threads in a block. Thus the total number of threads in a block is 1, and there is 1 such block in the grid.
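For reference, a complete launch of this kind could look like the sketch below. The kernel body, the stdio.h include, the main function, and the constant names kBlocksPerGrid and kThreadsPerBlock are assumptions filled in for illustration, since they are not shown in the listing above.

```cuda
#include <stdio.h>                      // assumed: needed for printf
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

// Launch configuration, named to make the parameter order explicit (assumed names).
const int kBlocksPerGrid   = 1;         // first launch parameter: blocks in the grid
const int kThreadsPerBlock = 1;         // second launch parameter: threads in a block

// Device code (the body is an assumption)
__global__ void cuda_kernel() {
    printf("Hello World!\n");
}

// Host code
int main() {
    // Equivalent to cuda_kernel<<<1, 1>>>(): one block containing one thread.
    cuda_kernel<<<kBlocksPerGrid, kThreadsPerBlock>>>();

    // Wait for the kernel to finish so its output is flushed before exit.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Raising either constant runs the kernel body once per thread, so the total number of executions is the product of the two launch parameters.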
