// Copyright (c) 2020-2021 NVIDIA Corporation
//
// SPDX-License-Identifier: CC-BY-4.0

[[cudadispatch]]
== Dispatch Command for CUDA PTX Kernels

Compute kernels can: be provided in SPIR-V or PTX code.
When using PTX kernels, the dispatch mechanism differs from that of regular
compute pipelines.

Creating a PTX assembly file is beyond the scope of this documentation.
For more information, please refer to the CUDA toolkit documentation at
https://docs.nvidia.com/cuda/.

Prior to using this command, you must: initialize a CUDA module and create
a function handle that will serve as the entry point of the kernel to
dispatch.
See <<cuda-modules, CUDA Modules>>.

The dispatch of a CUDA kernel is recorded into a command buffer and, when
executed by a queue submission, produces work which executes according to
the bound compute pipeline.

[open,refpage='vkCmdCudaLaunchKernelNV',desc='Dispatch compute work items',type='protos']
--
:refpage: vkCmdCudaLaunchKernelNV

To record a CUDA kernel launch, call:

include::{generated}/api/protos/vkCmdCudaLaunchKernelNV.adoc[]

  * pname:commandBuffer is the command buffer into which the command will be
    recorded.
  * pname:pLaunchInfo is a pointer to a slink:VkCudaLaunchInfoNV structure
    in which the grid (similar to workgroup) dimension, function handle and
    related arguments are defined.

When the command is executed, a global workgroup consisting of
[eq]#pname:gridDimX {times} pname:gridDimY {times} pname:gridDimZ# local
workgroups is assembled.

include::{generated}/validity/protos/vkCmdCudaLaunchKernelNV.adoc[]
--
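As an illustration of sizing such a grid, applications commonly derive each grid dimension from the problem size by rounding the division by the block size up. The following is a host-side sketch; the helper names `ceil_div` and `total_workgroups` are our own, not part of any API:

[source,c++]
----
#include <cassert>
#include <cstdint>

// Number of blocks needed to cover `size` elements with `block`
// threads in one dimension (integer division, rounded up).
static uint32_t ceil_div(uint32_t size, uint32_t block)
{
  return (size + block - 1) / block;
}

// Total number of local workgroups assembled by the dispatch:
// gridDimX * gridDimY * gridDimZ.
static uint64_t total_workgroups(uint32_t gx, uint32_t gy, uint32_t gz)
{
  return static_cast<uint64_t>(gx) * gy * gz;
}
----

For example, covering 100 elements with a block size of 8 requires `ceil_div(100, 8)`, i.e. 13 workgroups, in that dimension.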


[[cudadispatch_info]]
=== Passing Dispatch Parameters and Arguments

[open,refpage='VkCudaLaunchInfoNV',desc='Structure specifying the parameters to launch a CUDA kernel',type='structs']
--
The sname:VkCudaLaunchInfoNV structure closely mirrors the parameters of
the CUDA Driver function
link:++https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15++[cuLaunchKernel],
documented in section
link:++https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC++[6.19
Execution Control] of the CUDA Driver API.

The structure is defined as:

include::{generated}/api/structs/VkCudaLaunchInfoNV.adoc[]

  * pname:sType is a elink:VkStructureType value identifying this structure.
  * pname:pNext is `NULL` or a pointer to a structure extending this
    structure.
  * pname:function is the CUDA Driver handle to the function being launched.
  * pname:gridDimX is the number of local workgroups to dispatch in the X
    dimension.
    It must be less than or equal to
    sname:VkPhysicalDeviceLimits::pname:maxComputeWorkGroupCount[0].
  * pname:gridDimY is the number of local workgroups to dispatch in the Y
    dimension.
    It must be less than or equal to
    sname:VkPhysicalDeviceLimits::pname:maxComputeWorkGroupCount[1].
  * pname:gridDimZ is the number of local workgroups to dispatch in the Z
    dimension.
    It must be less than or equal to
    sname:VkPhysicalDeviceLimits::pname:maxComputeWorkGroupCount[2].
  * pname:blockDimX is the block size in the X dimension.
  * pname:blockDimY is the block size in the Y dimension.
  * pname:blockDimZ is the block size in the Z dimension.
  * pname:sharedMemBytes is the dynamic shared-memory size per thread block
    in bytes.
  * pname:paramCount is the number of entries in the pname:pParams array.
  * pname:pParams is a pointer to an array of pname:paramCount pointers,
    corresponding to the arguments of pname:function.
  * pname:extraCount is reserved for future use.
  * pname:pExtras is reserved for future use.

.Valid Usage
****
  * [[VUID-VkCudaLaunchInfoNV-gridDimX-09406]]
    pname:gridDimX must: be less than or equal to
    sname:VkPhysicalDeviceLimits::pname:maxComputeWorkGroupCount[0]
  * [[VUID-VkCudaLaunchInfoNV-gridDimY-09407]]
    pname:gridDimY must: be less than or equal to
    sname:VkPhysicalDeviceLimits::pname:maxComputeWorkGroupCount[1]
  * [[VUID-VkCudaLaunchInfoNV-gridDimZ-09408]]
    pname:gridDimZ must: be less than or equal to
    sname:VkPhysicalDeviceLimits::pname:maxComputeWorkGroupCount[2]
  * [[VUID-VkCudaLaunchInfoNV-paramCount-09409]]
    pname:paramCount must: be the total number of parameters listed in the
    pname:pParams array
  * [[VUID-VkCudaLaunchInfoNV-pParams-09410]]
    pname:pParams must: be a pointer to an array of pname:paramCount
    parameters, corresponding to the arguments of pname:function
  * [[VUID-VkCudaLaunchInfoNV-extraCount-09411]]
    pname:extraCount must: be 0
  * [[VUID-VkCudaLaunchInfoNV-pExtras-09412]]
    pname:pExtras must: be `NULL`
****

include::{generated}/validity/structs/VkCudaLaunchInfoNV.adoc[]
--

Kernel parameters of pname:function are specified via pname:pParams, much
the same way as described for
link:++https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15++[cuLaunchKernel].

If pname:function has N parameters, then pname:pParams must: be an array of
N pointers and pname:paramCount must: be set to N. Each of pname:pParams[0]
through pname:pParams[N-1] must: point to a region of memory from which the
actual kernel parameter will be copied.
The number of kernel parameters and their offsets and sizes are not
specified here, as that information is stored in the slink:VkCudaFunctionNV
object.
128
129The application-owned memory pointed to by pname:pParams and
130pname:kernelParams[0] through pname:kernelParams[N-1] are consumed
131immediately, and may: be altered or freed after
132flink:vkCmdCudaLaunchKernelNV has returned.
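These copy semantics can be illustrated with a small host-side sketch; `mock_record` below is a hypothetical stand-in for the record-time copy the implementation performs, not a real API:

[source,c++]
----
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the record-time behavior: each pParams[i]
// is dereferenced and its bytes are captured immediately.
static std::vector<std::vector<unsigned char>>
mock_record(const void* const* pParams, const size_t* sizes, size_t count)
{
  std::vector<std::vector<unsigned char>> copies(count);
  for (size_t i = 0; i < count; ++i) {
    const unsigned char* src = static_cast<const unsigned char*>(pParams[i]);
    copies[i].assign(src, src + sizes[i]);
  }
  return copies;
}
----

Because the values are captured at record time, altering or freeing the application's variables afterwards does not affect what the kernel receives.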

[[cudadispatch_sharing_resources]]
=== Sharing Resources from Vulkan to the CUDA Kernel

Because a key limitation of this extension is that Vulkan cannot: access or
bind any global resource of a CUDA module, the only way to exchange data
with the kernel is to __pass resources via the arguments of the function__.

ifdef::VK_KHR_buffer_device_address[]
You can use apiext:VK_KHR_buffer_device_address to read from and write to a
slink:VkBuffer object:
it allows you to query the device address of the buffer and pass it as an
argument in pname:pParams.
Application-side pointer arithmetic on the device address is legal, but will
not be bounds-checked on the device.

The corresponding argument of the CUDA function should: be declared as a
pointer of the same type as the referenced buffer, and the CUDA code may:
then simply read from or write to this buffer in the typical C way.
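For example, the device address can be queried on the Vulkan side and passed through pname:pParams. This is a sketch, in which `myBuffer` and `device` are illustrative handles, and the buffer is assumed to have been created with `VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT`:

[source,c++]
----
// Vulkan side: query the device address of the buffer.
VkBufferDeviceAddressInfo addressInfo = {VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO};
addressInfo.buffer = myBuffer; // illustrative VkBuffer handle
VkDeviceAddress bufferAddress = vkGetBufferDeviceAddress(device, &addressInfo);

// Pass the address through pParams; the CUDA function declares the
// matching parameter as an ordinary pointer, e.g. float* data.
const void* params[] = { &bufferAddress };
----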

endif::VK_KHR_buffer_device_address[]

ifdef::VK_NVX_image_view_handle[]
You may: also use apiext:VK_NVX_image_view_handle as another convenient way
to read from and write to a slink:VkImage.

The corresponding argument of the CUDA function must: be typed as
`cudaSurfaceObject_t`.

  * You may: then read from it using CUDA surface-read functions such as
    `surf3Dread`/`surf2Dread`/`surf1Dread`
  * You may: then write to it using CUDA surface-write functions such as
    `surf3Dwrite`/`surf2Dwrite`/`surf1Dwrite`

Please refer to the CUDA
link:https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#surface-object-api-appendix[surface
object] documentation for more details.

On the Vulkan side, here is an example of how to set up
slink:VkImageViewHandleInfoNVX to query the handles used as
`cudaSurfaceObject_t`:

[source,c++]
----
VkImageViewHandleInfoNVX imageViewHandleInfo = {VK_STRUCTURE_TYPE_IMAGE_VIEW_HANDLE_INFO_NVX};
imageViewHandleInfo.sampler = VK_NULL_HANDLE;
imageViewHandleInfo.descriptorType = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE;
imageViewHandleInfo.imageView = imageViewIn; // the VkImageView the kernel will read from
uint32_t myViewHandleIn = vkGetImageViewHandleNVX(m_device, &imageViewHandleInfo);
imageViewHandleInfo.imageView = imageViewOut; // the VkImageView the kernel will write to
uint32_t myViewHandleOut = vkGetImageViewHandleNVX(m_device, &imageViewHandleInfo);
----

Here is an example of how to declare the parameters for pname:pParams:

[source,c++]
----
VkCudaLaunchInfoNV launchInfo = { VK_STRUCTURE_TYPE_CUDA_LAUNCH_INFO_NV };

int block_size = 8;
float dt = 1.0f / 60.0f;

const void* params[] =
{
  &dt,
  &myViewHandleIn,  // uint32_t handle queried via vkGetImageViewHandleNVX
  &myViewHandleOut  // uint32_t handle queried via vkGetImageViewHandleNVX
};

launchInfo.function = cudaFunction; // CUDA function previously created
launchInfo.gridDimX = (volumeTexDimensionNonBoundary / block_size);
launchInfo.gridDimY = (volumeTexDimensionNonBoundary / block_size);
launchInfo.gridDimZ = (volumeTexDimensionNonBoundary / block_size);
launchInfo.blockDimX = block_size;
launchInfo.blockDimY = block_size;
launchInfo.blockDimZ = block_size;
launchInfo.sharedMemBytes = 0;
launchInfo.paramCount = 3;
launchInfo.pParams = &params[0];
launchInfo.extraCount = 0;
launchInfo.pExtras = nullptr;

vkCmdCudaLaunchKernelNV(commandBuffer, &launchInfo);
----

In the CUDA kernel source code, here is an example of how the arguments
match pname:pParams and how the surface objects are used:

[source,c++]
----
extern "C" __global__ void cudaFunction(
  float dt,
  cudaSurfaceObject_t volumeTexIn,
  cudaSurfaceObject_t volumeTexOut
  )
{
  int i = 1 + blockIdx.x * blockDim.x + threadIdx.x;
  int j = 1 + blockIdx.y * blockDim.y + threadIdx.y;
  int k = 1 + blockIdx.z * blockDim.z + threadIdx.z;

  float val;
  surf3Dread(&val, volumeTexIn, i * sizeof(float), j, k);
  ...
  float result = ...;
  // write result
  surf3Dwrite(result, volumeTexOut, i * sizeof(float), j, k);
}
----

endif::VK_NVX_image_view_handle[]
