// Copyright (c) 2020-2021 NVIDIA Corporation
//
// SPDX-License-Identifier: CC-BY-4.0

[[cuda-modules]]
== CUDA Modules


[[cuda-modules-creation]]
=== Creating a CUDA Module

[open,refpage='VkCudaModuleNV',desc='Opaque handle to a CUDA module object',type='handles']
--
CUDA modules must: contain some kernel code and must: expose at least one
function entry point.

CUDA modules are represented by sname:VkCudaModuleNV handles:

include::{generated}/api/handles/VkCudaModuleNV.adoc[]
--

[open,refpage='vkCreateCudaModuleNV',desc='Creates a new CUDA module object',type='protos']
--
To create a CUDA module, call:

include::{generated}/api/protos/vkCreateCudaModuleNV.adoc[]

  * pname:device is the logical device that creates the CUDA module.
  * pname:pCreateInfo is a pointer to a slink:VkCudaModuleCreateInfoNV
    structure.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.
  * pname:pModule is a pointer to a slink:VkCudaModuleNV handle in which the
    resulting CUDA module object is returned.

Once a CUDA module has been created, the application may: create function
handles, each of which must: refer to one function in the module.

include::{generated}/validity/protos/vkCreateCudaModuleNV.adoc[]

--
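As an informal illustration (not part of the normative text), creating a
module from a buffer of PTX source might look as follows.
The `createCudaModule` helper name, the `ptxCode`/`ptxSize` inputs, and the
use of flink:vkGetDeviceProcAddr to obtain the entry point are assumptions
of this sketch:

```c
// Informal sketch: create a VkCudaModuleNV from PTX source.
// Assumes `device` is a VkDevice with VK_NV_cuda_kernel_launch enabled.
#define VK_ENABLE_BETA_EXTENSIONS  // the extension may be provisional
#include <vulkan/vulkan.h>

VkResult createCudaModule(VkDevice device, const void* ptxCode,
                          size_t ptxSize, VkCudaModuleNV* outModule)
{
    // Extension entry points are retrieved through the device dispatch.
    PFN_vkCreateCudaModuleNV pfnCreateCudaModuleNV =
        (PFN_vkCreateCudaModuleNV)vkGetDeviceProcAddr(
            device, "vkCreateCudaModuleNV");

    VkCudaModuleCreateInfoNV createInfo = {0};
    createInfo.sType    = VK_STRUCTURE_TYPE_CUDA_MODULE_CREATE_INFO_NV;
    createInfo.pNext    = NULL;
    createInfo.dataSize = ptxSize;  // total size in bytes of the PTX code
    createInfo.pData    = ptxCode;

    // pAllocator is NULL: use the implementation's default allocator.
    return pfnCreateCudaModuleNV(device, &createInfo, NULL, outModule);
}
```

The same call is used later in this chapter with a binary cache in place of
the PTX code.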

[open,refpage='VkCudaModuleCreateInfoNV',desc='Structure specifying the parameters to create a CUDA Module',type='structs']
--
The sname:VkCudaModuleCreateInfoNV structure is defined as:

include::{generated}/api/structs/VkCudaModuleCreateInfoNV.adoc[]

  * pname:sType is a elink:VkStructureType value identifying this structure.
  * pname:pNext is `NULL` or a pointer to a structure extending this
    structure.
  * pname:dataSize is the length in bytes of the pname:pData array.
  * pname:pData is a pointer to the CUDA code.

.Valid Usage
****
  * [[VUID-VkCudaModuleCreateInfoNV-dataSize-09413]]
    pname:dataSize must: be the total size in bytes of the PTX files or
    binary cache passed to pname:pData.
****

include::{generated}/validity/structs/VkCudaModuleCreateInfoNV.adoc[]
--


[[cuda-function-creation]]
=== Creating a CUDA Function Handle

[open,refpage='VkCudaFunctionNV',desc='Opaque handle to a CUDA function object',type='handles']
--
CUDA functions are represented by sname:VkCudaFunctionNV handles.
Handles to `__global__` functions may: then be used to issue a kernel launch
(i.e. dispatch) from a command buffer.
See <<cudadispatch, Dispatching Command for CUDA PTX kernel>>.

include::{generated}/api/handles/VkCudaFunctionNV.adoc[]
--

[open,refpage='vkCreateCudaFunctionNV',desc='Creates a new CUDA function object',type='protos']
--
To create a CUDA function, call:

include::{generated}/api/protos/vkCreateCudaFunctionNV.adoc[]

  * pname:device is the logical device that creates the CUDA function.
  * pname:pCreateInfo is a pointer to a slink:VkCudaFunctionCreateInfoNV
    structure.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.
  * pname:pFunction is a pointer to a slink:VkCudaFunctionNV handle in which
    the resulting CUDA function object is returned.

include::{generated}/validity/protos/vkCreateCudaFunctionNV.adoc[]
--
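As an informal illustration, obtaining a function handle for a `__global__`
entry point might look as follows.
The kernel name `myKernel` and the `createCudaFunction` helper are
hypothetical; the structure members match the description above:

```c
// Informal sketch: obtain a VkCudaFunctionNV for a __global__ entry point.
// Assumes `device` and `cudaModule` are valid, and that the module's PTX
// defines a kernel named "myKernel" (hypothetical name).
#define VK_ENABLE_BETA_EXTENSIONS  // the extension may be provisional
#include <vulkan/vulkan.h>

VkResult createCudaFunction(VkDevice device, VkCudaModuleNV cudaModule,
                            VkCudaFunctionNV* outFunction)
{
    PFN_vkCreateCudaFunctionNV pfnCreateCudaFunctionNV =
        (PFN_vkCreateCudaFunctionNV)vkGetDeviceProcAddr(
            device, "vkCreateCudaFunctionNV");

    VkCudaFunctionCreateInfoNV createInfo = {0};
    createInfo.sType  = VK_STRUCTURE_TYPE_CUDA_FUNCTION_CREATE_INFO_NV;
    createInfo.pNext  = NULL;
    createInfo.module = cudaModule;  // module in which the function resides
    createInfo.pName  = "myKernel";  // hypothetical entry point name

    return pfnCreateCudaFunctionNV(device, &createInfo, NULL, outFunction);
}
```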

[open,refpage='VkCudaFunctionCreateInfoNV',desc='Structure specifying the parameters to create a CUDA Function',type='structs']
--
The sname:VkCudaFunctionCreateInfoNV structure is defined as:

include::{generated}/api/structs/VkCudaFunctionCreateInfoNV.adoc[]

  * pname:sType is a elink:VkStructureType value identifying this structure.
  * pname:pNext is `NULL` or a pointer to a structure extending this
    structure.
  * pname:module is the slink:VkCudaModuleNV module in which the function
    resides.
  * pname:pName is a null-terminated UTF-8 string containing the name of the
    function entry point to reference in the module.

include::{generated}/validity/structs/VkCudaFunctionCreateInfoNV.adoc[]
--


[[cuda-function-destruction]]
=== Destroying a CUDA Function

[open,refpage='vkDestroyCudaFunctionNV',desc='Destroy a CUDA function',type='protos']
--
To destroy a CUDA function handle, call:

include::{generated}/api/protos/vkDestroyCudaFunctionNV.adoc[]

  * pname:device is the logical device that destroys the function.
  * pname:function is the handle of the CUDA function to destroy.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.

include::{generated}/validity/protos/vkDestroyCudaFunctionNV.adoc[]
--


[[cuda-modules-destruction]]
=== Destroying a CUDA Module

[open,refpage='vkDestroyCudaModuleNV',desc='Destroy a CUDA module',type='protos']
--
To destroy a CUDA module, call:

include::{generated}/api/protos/vkDestroyCudaModuleNV.adoc[]

  * pname:device is the logical device that destroys the CUDA module.
  * pname:module is the handle of the CUDA module to destroy.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.

include::{generated}/validity/protos/vkDestroyCudaModuleNV.adoc[]
--


[[cuda-modules-getcache]]
=== Reading back CUDA Module Cache

After the PTX kernel code has been uploaded, the module compiles it into a
binary cache containing all the information the device needs to execute it.
This cache can: be read back for later use, for example to speed up the
initialization of subsequent runs.

[open,refpage='vkGetCudaModuleCacheNV',desc='Get CUDA module cache',type='protos']
--
To get the CUDA module cache, call:

include::{generated}/api/protos/vkGetCudaModuleCacheNV.adoc[]

  * pname:device is the logical device that owns the CUDA module.
  * pname:module is the CUDA module.
  * pname:pCacheSize is a pointer to a value containing the number of bytes
    to copy into pname:pCacheData.
  * pname:pCacheData is either `NULL` or a pointer to a buffer into which
    the binary cache is copied.

.Valid Usage
****
  * [[VUID-vkGetCudaModuleCacheNV-pCacheSize-09414]]
    pname:pCacheSize must: be a pointer containing the number of bytes to be
    copied into pname:pCacheData.
    If pname:pCacheData is `NULL`, the total number of bytes required for a
    later copy into pname:pCacheData is returned in pname:pCacheSize.
  * [[VUID-vkGetCudaModuleCacheNV-pCacheData-09415]]
    pname:pCacheData may: be a pointer to a buffer into which the binary
    cache is copied.
    The number of bytes copied is defined by the value in pname:pCacheSize.
    This pointer may: be `NULL`, in which case the total number of required
    bytes is written to pname:pCacheSize.
****

include::{generated}/validity/protos/vkGetCudaModuleCacheNV.adoc[]
--

A typical use of flink:vkGetCudaModuleCacheNV happens in two steps:

  * First, call it with pname:pCacheData set to `NULL` and a valid pointer
    in pname:pCacheSize, to retrieve the required cache size.
  * Then call it again with pname:pCacheData pointing to a buffer of the
    size returned in pname:pCacheSize, and pname:pCacheSize containing the
    number of bytes to copy.

The returned cache may: then be used later to initialize the CUDA module
again, by passing this cache _instead_ of the PTX code to
flink:vkCreateCudaModuleNV.

Using the binary cache instead of the original PTX code should:
significantly speed up initialization of the CUDA module, since the full
compilation and validation steps are skipped.

As with slink:VkPipelineCache, the binary cache is
implementation-dependent.
Therefore the application must: assume that loading the cache can: fail in
many circumstances, and should: be prepared to fall back to the original
PTX code if necessary.
Loading the cache is most likely to succeed when the same device driver and
device architecture are used for both generating the cache from PTX and
reusing it.
A new driver version, or a different device architecture, will typically
invalidate this cache.
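The fallback described above can be sketched as follows.
The `createCudaModule` wrapper is a hypothetical helper around
flink:vkCreateCudaModuleNV taking a data/size pair; the cache and PTX
buffers are assumed to be available:

```c
#define VK_ENABLE_BETA_EXTENSIONS  // the extension may be provisional
#include <vulkan/vulkan.h>

// Hypothetical helper wrapping vkCreateCudaModuleNV for a data/size pair.
extern VkResult createCudaModule(VkDevice device, const void* data,
                                 size_t size, VkCudaModuleNV* outModule);

// Informal sketch: prefer a previously saved binary cache, and fall back
// to the original PTX code when the cache is rejected (e.g. after a
// driver update or on a different device architecture).
VkResult createModuleCacheFirst(VkDevice device,
                                const void* cacheData, size_t cacheSize,
                                const void* ptxCode, size_t ptxSize,
                                VkCudaModuleNV* outModule)
{
    VkResult result =
        createCudaModule(device, cacheData, cacheSize, outModule);
    if (result != VK_SUCCESS) {
        // Cache invalid: recompile from the original PTX source.
        result = createCudaModule(device, ptxCode, ptxSize, outModule);
    }
    return result;
}
```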


[[cuda-modules-limitations]]
=== Limitations

CUDA and Vulkan do not use the device in the same configuration; therefore,
a few limitations must be taken into account:

  * It is not possible to read or write global parameters from Vulkan.
    The only way to share resources or pass values to the PTX kernel is to
    pass them as arguments of the function.
    See <<cudadispatch_sharing_resources, Resources sharing between CUDA
    Kernel and Vulkan>> for more details.
  * Calls to functions external to the module PTX are not supported.
  * Vulkan disables some shader/kernel exceptions, which could break CUDA
    kernels that rely on exceptions.
  * CUDA kernels submitted to Vulkan are limited to the amount of shared
    memory that can be queried from the physical device capabilities, which
    may be less than what CUDA can offer.
  * CUDA instruction-level preemption (CILP) does not work.
  * CUDA Unified Memory does not work with this extension.
  * CUDA dynamic parallelism is not supported.
  * Indirect dispatch is not available.