// Copyright (c) 2020-2021 NVIDIA Corporation
//
// SPDX-License-Identifier: CC-BY-4.0

[[cuda-modules]]
== CUDA Modules


[[cuda-modules-creation]]
=== Creating a CUDA Module

[open,refpage='VkCudaModuleNV',desc='Opaque handle to a CUDA module object',type='handles']
--
CUDA modules must: contain some kernel code and must: expose at least one
function entry point.

CUDA modules are represented by sname:VkCudaModuleNV handles:

include::{generated}/api/handles/VkCudaModuleNV.adoc[]
--

[open,refpage='vkCreateCudaModuleNV',desc='Creates a new CUDA module object',type='protos']
--
To create a CUDA module, call:

include::{generated}/api/protos/vkCreateCudaModuleNV.adoc[]

  * pname:device is the logical device that creates the CUDA module.
  * pname:pCreateInfo is a pointer to a slink:VkCudaModuleCreateInfoNV
    structure.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.
  * pname:pModule is a pointer to a slink:VkCudaModuleNV handle in which the
    resulting CUDA module object is returned.

Once a CUDA module has been created, the application may: create function
entry points; each entry point must: refer to one function in the module.

include::{generated}/validity/protos/vkCreateCudaModuleNV.adoc[]

--

[open,refpage='VkCudaModuleCreateInfoNV',desc='Structure specifying the parameters to create a CUDA Module',type='structs']
--
The sname:VkCudaModuleCreateInfoNV structure is defined as:

include::{generated}/api/structs/VkCudaModuleCreateInfoNV.adoc[]

  * pname:sType is a elink:VkStructureType value identifying this structure.
  * pname:pNext may: be `NULL` or may: be a pointer to a structure extending
    this structure.
  * pname:dataSize is the length of the pname:pData array in bytes.
  * pname:pData is a pointer to the CUDA code.

.Valid Usage
****
  * [[VUID-VkCudaModuleCreateInfoNV-dataSize-09413]]
    pname:dataSize must: be the total size in bytes of the PTX files or
    binary cache passed to pname:pData.
****

include::{generated}/validity/structs/VkCudaModuleCreateInfoNV.adoc[]
--


[[cuda-function-creation]]
=== Creating a CUDA Function Handle

[open,refpage='VkCudaFunctionNV',desc='Opaque handle to a CUDA function object',type='handles']
--
CUDA functions are represented by sname:VkCudaFunctionNV handles.
Handles to `__global__` functions may: then be used to issue a kernel
launch (i.e. a dispatch) from a command buffer.
See <<cudadispatch, Dispatching Command for CUDA PTX kernel>>.

include::{generated}/api/handles/VkCudaFunctionNV.adoc[]
--

[open,refpage='vkCreateCudaFunctionNV',desc='Creates a new CUDA function object',type='protos']
--
To create a CUDA function, call:

include::{generated}/api/protos/vkCreateCudaFunctionNV.adoc[]

  * pname:device is the logical device that creates the CUDA function.
  * pname:pCreateInfo is a pointer to a slink:VkCudaFunctionCreateInfoNV
    structure.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.
  * pname:pFunction is a pointer to a slink:VkCudaFunctionNV handle in which
    the resulting CUDA function object is returned.

include::{generated}/validity/protos/vkCreateCudaFunctionNV.adoc[]
--

[open,refpage='VkCudaFunctionCreateInfoNV',desc='Structure specifying the parameters to create a CUDA Function',type='structs']
--
The sname:VkCudaFunctionCreateInfoNV structure is defined as:

include::{generated}/api/structs/VkCudaFunctionCreateInfoNV.adoc[]

  * pname:sType is a elink:VkStructureType value identifying this structure.
  * pname:pNext may: be `NULL` or may: be a pointer to a structure extending
    this structure.
  * pname:module must: be the slink:VkCudaModuleNV module in which the
    function resides.
  * pname:pName is a null-terminated UTF-8 string containing the name of
    the function entry point in the module.

include::{generated}/validity/structs/VkCudaFunctionCreateInfoNV.adoc[]
--


[[cuda-function-destruction]]
=== Destroying a CUDA Function

[open,refpage='vkDestroyCudaFunctionNV',desc='Destroy a CUDA function',type='protos']
--
To destroy a CUDA function handle, call:

include::{generated}/api/protos/vkDestroyCudaFunctionNV.adoc[]

  * pname:device is the logical device that destroys the CUDA function.
  * pname:function is the handle of the CUDA function to destroy.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.

include::{generated}/validity/protos/vkDestroyCudaFunctionNV.adoc[]
--


[[cuda-modules-destruction]]
=== Destroying a CUDA Module

[open,refpage='vkDestroyCudaModuleNV',desc='Destroy a CUDA module',type='protos']
--
To destroy a CUDA module, call:

include::{generated}/api/protos/vkDestroyCudaModuleNV.adoc[]

  * pname:device is the logical device that destroys the CUDA module.
  * pname:module is the handle of the CUDA module to destroy.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.

include::{generated}/validity/protos/vkDestroyCudaModuleNV.adoc[]
--


[[cuda-modules-getcache]]
=== Reading back CUDA Module Cache

After the PTX kernel code has been uploaded, the implementation compiles it
into a binary cache containing all the information the device needs to
execute it.
It is possible to read back this cache for later use, such as accelerating
the initialization of subsequent executions.
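The creation flow described in the preceding sections can be sketched as
follows.
This is a minimal sketch, not a normative example: it assumes a sname:VkDevice
created with the `VK_NV_cuda_kernel_launch` extension enabled, PTX source in
`ptxCode`/`ptxSize`, and a placeholder entry-point name `"myKernel"` that
must match a `__global__` function in the PTX.

[source,c]
----
#include <vulkan/vulkan.h>

// Sketch: create a CUDA module from PTX, then a function handle into it.
// Error handling is reduced to early returns for brevity.
VkResult createKernel(VkDevice device,
                      const void* ptxCode, size_t ptxSize,
                      VkCudaModuleNV* outModule,
                      VkCudaFunctionNV* outFunction)
{
    VkCudaModuleCreateInfoNV moduleInfo = {
        .sType    = VK_STRUCTURE_TYPE_CUDA_MODULE_CREATE_INFO_NV,
        .dataSize = ptxSize,  // total size in bytes of the PTX in pData
        .pData    = ptxCode,
    };
    VkResult result = vkCreateCudaModuleNV(device, &moduleInfo, NULL,
                                           outModule);
    if (result != VK_SUCCESS)
        return result;

    // "myKernel" is a placeholder; it must name a function in the module.
    VkCudaFunctionCreateInfoNV functionInfo = {
        .sType  = VK_STRUCTURE_TYPE_CUDA_FUNCTION_CREATE_INFO_NV,
        .module = *outModule,
        .pName  = "myKernel",
    };
    result = vkCreateCudaFunctionNV(device, &functionInfo, NULL, outFunction);
    if (result != VK_SUCCESS)
        vkDestroyCudaModuleNV(device, *outModule, NULL);
    return result;
}
----

The function handle borrows from the module, so the module must stay alive
for as long as the function is in use; destruction mirrors creation, using
flink:vkDestroyCudaFunctionNV then flink:vkDestroyCudaModuleNV.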
[open,refpage='vkGetCudaModuleCacheNV',desc='Get CUDA module cache',type='protos']
--
To get the CUDA module cache, call:

include::{generated}/api/protos/vkGetCudaModuleCacheNV.adoc[]

  * pname:device is the logical device that owns the CUDA module.
  * pname:module is the CUDA module.
  * pname:pCacheSize is a pointer to the number of bytes to be copied into
    pname:pCacheData.
  * pname:pCacheData is a pointer to a buffer into which the binary cache
    is copied.

.Valid Usage
****
  * [[VUID-vkGetCudaModuleCacheNV-pCacheSize-09414]]
    pname:pCacheSize must: be a pointer containing the number of bytes to
    be copied into pname:pCacheData.
    If pname:pCacheData is `NULL`, the total number of bytes required to
    later perform the copy into pname:pCacheData is returned in
    pname:pCacheSize.
  * [[VUID-vkGetCudaModuleCacheNV-pCacheData-09415]]
    pname:pCacheData may: be a pointer to a buffer into which the binary
    cache is copied.
    The number of bytes copied is defined by the value in pname:pCacheSize.
    This pointer may: be `NULL`.
    In this case, the total number of bytes required is written to
    pname:pCacheSize.
****

include::{generated}/validity/protos/vkGetCudaModuleCacheNV.adoc[]
--

A typical use of flink:vkGetCudaModuleCacheNV happens in two steps:

  * First, call it with pname:pCacheData set to `NULL` and a valid pointer
    in pname:pCacheSize, which receives the required cache size.
  * Then call it again with pname:pCacheData pointing to a buffer of the
    size previously returned in pname:pCacheSize, and with pname:pCacheSize
    containing the number of bytes to copy.

The returned cache may: then be used for later initialization of the CUDA
module, by passing this cache _instead_ of the PTX code to
flink:vkCreateCudaModuleNV.
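The two-step pattern above can be sketched as follows.
This is an illustrative sketch only, assuming a valid sname:VkDevice and a
CUDA module previously created from PTX; the helper name and ownership
convention (caller frees `*outData`) are this example's own.

[source,c]
----
#include <stdlib.h>
#include <vulkan/vulkan.h>

// Sketch: query the cache size, allocate, then copy the cache out.
VkResult readBackCache(VkDevice device, VkCudaModuleNV module,
                       void** outData, size_t* outSize)
{
    // Step 1: pCacheData is NULL, so the required size is written to
    // *outSize.
    VkResult result = vkGetCudaModuleCacheNV(device, module, outSize, NULL);
    if (result != VK_SUCCESS)
        return result;

    *outData = malloc(*outSize);
    if (*outData == NULL)
        return VK_ERROR_OUT_OF_HOST_MEMORY;

    // Step 2: *outSize now holds the number of bytes to copy into *outData.
    result = vkGetCudaModuleCacheNV(device, module, outSize, *outData);
    if (result != VK_SUCCESS) {
        free(*outData);
        *outData = NULL;
    }
    return result;
}
----

The returned buffer can then be persisted to disk and handed back to
flink:vkCreateCudaModuleNV on a later run.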
Using the binary cache instead of the original PTX code should:
significantly speed up initialization of the CUDA module, since the full
compilation and validation steps are no longer necessary.

As with slink:VkPipelineCache, the binary cache is
implementation-dependent.
The application must: therefore assume that uploading the cache can fail in
many circumstances, and be prepared to fall back to the original PTX code
if necessary.
Using the cache is most likely to succeed when the same device driver and
architecture are used both when generating the cache from PTX and when
reusing it.
In the event of a new driver version, or if using a different device
architecture, the cache may: become invalid.


[[cuda-modules-limitations]]
=== Limitations

CUDA and Vulkan do not use the device in the same configuration; therefore,
a few limitations must be taken into account:

  * It is not possible to read or write global parameters from Vulkan.
    The only way to share resources or pass values to the PTX kernel is to
    pass them as arguments of the function.
    See <<cudadispatch_sharing_resources, Resources sharing between CUDA
    Kernel and Vulkan>> for more details.
  * Calls to functions external to the module PTX are not supported.
  * Vulkan disables some shader/kernel exceptions, which can break CUDA
    kernels that rely on exceptions.
  * CUDA kernels submitted to Vulkan are limited to the amount of shared
    memory reported in the physical device capabilities, which may be less
    than what CUDA can offer.
  * CUDA instruction-level preemption (CILP) does not work.
  * CUDA Unified Memory does not work with this extension.
  * CUDA dynamic parallelism is not supported.
  * Indirect dispatch is not available.
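The cache-with-fallback strategy described above can be sketched as follows.
This is a sketch under assumptions, not a prescribed pattern: it reuses the
same slink:VkCudaModuleCreateInfoNV fields for both paths, and treats any
failure of the cache path as a signal to recompile from the original PTX.

[source,c]
----
#include <vulkan/vulkan.h>

// Sketch: try a previously saved binary cache first; if the implementation
// rejects it (e.g. after a driver update or on different hardware), fall
// back to compiling the original PTX.
VkResult createModuleWithFallback(VkDevice device,
                                  const void* cacheData, size_t cacheSize,
                                  const void* ptxCode, size_t ptxSize,
                                  VkCudaModuleNV* outModule)
{
    VkCudaModuleCreateInfoNV info = {
        .sType = VK_STRUCTURE_TYPE_CUDA_MODULE_CREATE_INFO_NV,
    };

    if (cacheData != NULL) {
        // The cache goes through the same pData/dataSize members,
        // in place of the PTX code.
        info.pData    = cacheData;
        info.dataSize = cacheSize;
        if (vkCreateCudaModuleNV(device, &info, NULL, outModule) ==
            VK_SUCCESS)
            return VK_SUCCESS;
    }

    // Cache missing or invalid: recompile from the original PTX.
    info.pData    = ptxCode;
    info.dataSize = ptxSize;
    return vkCreateCudaModuleNV(device, &info, NULL, outModule);
}
----

After a successful fallback compilation, the application can read the fresh
cache back with flink:vkGetCudaModuleCacheNV and replace the stale one.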