// Copyright (c) 2020-2021 NVIDIA Corporation
//
// SPDX-License-Identifier: CC-BY-4.0

[[cudadispatch]]
== Dispatch Command for CUDA PTX Kernels

Compute kernels can: be provided in SPIR-V or PTX code.
When using PTX kernels, the dispatch mechanism is different from the one
used with regular compute pipelines.

The way to create a PTX assembly file is beyond the scope of this
documentation.
For more information, please refer to the CUDA toolkit documentation at
https://docs.nvidia.com/cuda/.

Prior to using this command, you must: initialize a CUDA module and create
a function handle that will serve as the entry point of the kernel to
dispatch.
See <<cuda-modules, CUDA Modules>>.

The dispatching of a CUDA kernel is recorded into a command buffer, and when
executed by a queue submit, will produce work which executes according to
the bound compute pipeline.

[open,refpage='vkCmdCudaLaunchKernelNV',desc='Dispatch compute work items',type='protos']
--
:refpage: vkCmdCudaLaunchKernelNV

To record a CUDA kernel launch, call:

include::{generated}/api/protos/vkCmdCudaLaunchKernelNV.adoc[]

  * pname:commandBuffer is the command buffer into which the command will be
    recorded.
  * pname:pLaunchInfo is a pointer to a slink:VkCudaLaunchInfoNV structure
    in which the grid (similar to workgroup) dimension, function handle, and
    related arguments are defined.

When the command is executed, a global workgroup consisting of
[eq]#pname:gridDimX {times} pname:gridDimY {times} pname:gridDimZ# local
workgroups is assembled.

include::{generated}/validity/protos/vkCmdCudaLaunchKernelNV.adoc[]
--


[[cudadispatch_info]]
=== Passing Dispatch Parameters and Arguments

[open,refpage='VkCudaLaunchInfoNV',desc='Structure specifying the parameters to launch a CUDA kernel',type='structs']
--
The sname:VkCudaLaunchInfoNV structure is very close to the parameters of
the CUDA-Driver function
link:++https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15++[cuLaunchKernel]
documented in section
link:++https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC++[6.19
Execution Control] of the CUDA Driver API.

The structure is defined as:

include::{generated}/api/structs/VkCudaLaunchInfoNV.adoc[]

  * pname:sType is a elink:VkStructureType value identifying this structure.
  * pname:pNext is `NULL` or a pointer to a structure extending this
    structure.
  * pname:function is the CUDA-Driver handle to the function being launched.
  * pname:gridDimX is the number of local workgroups to dispatch in the X
    dimension.
    It must: be less than or equal to
    sname:VkPhysicalDeviceLimits::pname:maxComputeWorkGroupCount[0].
  * pname:gridDimY is the number of local workgroups to dispatch in the Y
    dimension.
    It must: be less than or equal to
    sname:VkPhysicalDeviceLimits::pname:maxComputeWorkGroupCount[1].
  * pname:gridDimZ is the number of local workgroups to dispatch in the Z
    dimension.
    It must: be less than or equal to
    sname:VkPhysicalDeviceLimits::pname:maxComputeWorkGroupCount[2].
  * pname:blockDimX is the block size in the X dimension.
  * pname:blockDimY is the block size in the Y dimension.
  * pname:blockDimZ is the block size in the Z dimension.
  * pname:sharedMemBytes is the dynamic shared-memory size per thread block
    in bytes.
  * pname:paramCount is the length of the pname:pParams table.
  * pname:pParams is a pointer to an array of pname:paramCount pointers,
    corresponding to the arguments of pname:function.
  * pname:extraCount is reserved for future use.
  * pname:pExtras is reserved for future use.

.Valid Usage
****
  * [[VUID-VkCudaLaunchInfoNV-gridDimX-09406]]
    pname:gridDimX must: be less than or equal to
    sname:VkPhysicalDeviceLimits::pname:maxComputeWorkGroupCount[0]
  * [[VUID-VkCudaLaunchInfoNV-gridDimY-09407]]
    pname:gridDimY must: be less than or equal to
    sname:VkPhysicalDeviceLimits::pname:maxComputeWorkGroupCount[1]
  * [[VUID-VkCudaLaunchInfoNV-gridDimZ-09408]]
    pname:gridDimZ must: be less than or equal to
    sname:VkPhysicalDeviceLimits::pname:maxComputeWorkGroupCount[2]
  * [[VUID-VkCudaLaunchInfoNV-paramCount-09409]]
    pname:paramCount must: be the total number of parameters listed in the
    pname:pParams table
  * [[VUID-VkCudaLaunchInfoNV-pParams-09410]]
    pname:pParams must: be a pointer to a table of pname:paramCount
    parameters, corresponding to the arguments of pname:function
  * [[VUID-VkCudaLaunchInfoNV-extraCount-09411]]
    pname:extraCount must: be 0
  * [[VUID-VkCudaLaunchInfoNV-pExtras-09412]]
    pname:pExtras must: be `NULL`
****

include::{generated}/validity/structs/VkCudaLaunchInfoNV.adoc[]
--

Kernel parameters of pname:function are specified via pname:pParams, very
much the same way as described in
link:++https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15++[cuLaunchKernel].

If pname:function has N parameters, then pname:pParams must: be an array of
N pointers and pname:paramCount must: be set to N. Each of
pname:pParams[0] through pname:pParams[N-1] must: point to a region of
memory from which the actual kernel parameter will be copied.
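
To make this pointer indirection explicit, here is a minimal, self-contained
C++ sketch, independent of the Vulkan API. The helper `buildParams` and the
two argument types (a float time step and a 64-bit device address) are
illustrative assumptions, not part of the extension:

[source,c++]
----
#include <cassert>
#include <cstdint>

// Hypothetical argument marshaling for a kernel with two parameters:
// a float time step and a 64-bit device address.
static uint32_t buildParams(const void** outParams,
                            const float* dt,
                            const uint64_t* address)
{
    // Each entry points at application-owned storage; the driver copies
    // the argument values from these locations when the command is recorded.
    outParams[0] = dt;
    outParams[1] = address;
    return 2; // paramCount must equal the number of kernel parameters (N)
}

int main()
{
    float dt = 1.0f / 60.0f;
    uint64_t address = 0x1000; // placeholder for a device address

    const void* params[2];
    uint32_t paramCount = buildParams(params, &dt, &address);

    assert(paramCount == 2);
    // Each entry dereferences to the argument value to be copied.
    assert(*static_cast<const float*>(params[0]) == dt);
    assert(*static_cast<const uint64_t*>(params[1]) == address);
    return 0;
}
----
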
The number of kernel parameters and their offsets and sizes are not
specified here, as that information is stored in the slink:VkCudaFunctionNV
object.

The application-owned memory pointed to by pname:pParams and
pname:pParams[0] through pname:pParams[N-1] is consumed immediately, and
may: be altered or freed after flink:vkCmdCudaLaunchKernelNV has returned.

[[cudadispatch_sharing_resources]]
=== Resource Sharing from Vulkan to the CUDA Kernel

Given that one key limitation of this extension is that Vulkan cannot:
access or bind any global resource of CUDA modules, the only way to
exchange data with the kernel is to __pass resources via the arguments of
the function__.

ifdef::VK_KHR_buffer_device_address[]
You can use apiext:VK_KHR_buffer_device_address to read from and write to a
slink:VkBuffer object.
<<VK_KHR_buffer_device_address>> allows you to get the device address of the
buffer and pass it as an argument in pname:pParams.
Application-side pointer arithmetic on the device address is legal, but will
not be bounds-checked on the device.

The corresponding argument of the CUDA function should: be declared as a
pointer of the same type as the referenced buffer, and the CUDA code may:
simply read from or write to this buffer in the typical C way.
endif::VK_KHR_buffer_device_address[]

ifdef::VK_NVX_image_view_handle[]
You may: also use apiext:VK_NVX_image_view_handle as another convenient way
to read from and write to a slink:VkImage.

The corresponding argument of the CUDA function must: be typed as
`cudaSurfaceObject_t`.

  * You may: then read from it by using CUDA surface-read functions such as
    `surf3Dread`/`surf2Dread`/`surf1Dread`.
  * You may: then write to it by using CUDA surface-write functions such as
    `surf3Dwrite`/`surf2Dwrite`/`surf1Dwrite`.

Please refer to the CUDA
link:https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#surface-object-api-appendix[surface
object] documentation for more details.

On the Vulkan side, here is an example of how to set up
slink:VkImageViewHandleInfoNVX to query the handles used as
`cudaSurfaceObject_t` arguments:

[source,c++]
----
VkImageViewHandleInfoNVX imageViewHandleInfo = {VK_STRUCTURE_TYPE_IMAGE_VIEW_HANDLE_INFO_NVX};
imageViewHandleInfo.sampler = VK_NULL_HANDLE;
imageViewHandleInfo.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE;
imageViewHandleInfo.imageView = imageViewIn; // the VkImageView we want to read from
uint32_t myViewHandleIn = vkGetImageViewHandleNVX(m_device, &imageViewHandleInfo);
imageViewHandleInfo.imageView = imageViewOut; // the VkImageView we want to write to
uint32_t myViewHandleOut = vkGetImageViewHandleNVX(m_device, &imageViewHandleInfo);
----

Here is an example of how to declare the parameters for pname:pParams:

[source,c++]
----
VkCudaLaunchInfoNV launchInfo = { VK_STRUCTURE_TYPE_CUDA_LAUNCH_INFO_NV };

int block_size = 8;
float dt = 1.0f / 60.0f;

const void* params[] =
{
    &dt,
    &myViewHandleIn,
    &myViewHandleOut
};

launchInfo.function = cudaFunction; // CUDA function previously created
launchInfo.gridDimX = (volumeTexDimensionNonBoundary / block_size);
launchInfo.gridDimY = (volumeTexDimensionNonBoundary / block_size);
launchInfo.gridDimZ = (volumeTexDimensionNonBoundary / block_size);
launchInfo.blockDimX = block_size;
launchInfo.blockDimY = block_size;
launchInfo.blockDimZ = block_size;
launchInfo.sharedMemBytes = 0;
launchInfo.paramCount = 3;
launchInfo.pParams = &params[0];
launchInfo.extraCount = 0;
launchInfo.pExtras = nullptr;

vkCmdCudaLaunchKernelNV(commandBuffer, &launchInfo);
----

In the CUDA kernel source code, here is an example of how the arguments
match pname:pParams and how the surface objects can: be used:

[source,c++]
----
extern "C" __global__ void cudaFunction(
    float dt,
    cudaSurfaceObject_t volumeTexIn,
    cudaSurfaceObject_t volumeTexOut
    )
{
    int i = 1 + blockIdx.x * blockDim.x + threadIdx.x;
    int j = 1 + blockIdx.y * blockDim.y + threadIdx.y;
    int k = 1 + blockIdx.z * blockDim.z + threadIdx.z;

    float val;
    surf3Dread(&val, volumeTexIn, i * sizeof(float), j, k);
    ...
    float result = ...;
    // write result
    surf3Dwrite(result, volumeTexOut, i * sizeof(float), j, k);
}
----

endif::VK_NVX_image_view_handle[]
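
ifdef::VK_KHR_buffer_device_address[]
As a complement, here is a hedged sketch of passing a slink:VkBuffer to the
kernel through its device address.
The buffer `myBuffer` and its creation (which requires
ename:VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT) are assumed and not shown:

[source,c++]
----
// Query the device address of a buffer created with
// VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT (creation not shown).
VkBufferDeviceAddressInfo addressInfo = { VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO };
addressInfo.buffer = myBuffer; // hypothetical VkBuffer

VkDeviceAddress bufferAddress = vkGetBufferDeviceAddress(device, &addressInfo);

// The address is then passed by pointer, like any other kernel argument:
const void* params[] = { &bufferAddress };
----

On the CUDA side, the matching parameter would simply be declared as a
pointer of the buffer's element type, e.g. `extern "C" __global__ void
kernel(float* data)`.
endif::VK_KHR_buffer_device_address[]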