// Copyright (c) 2020-2021 NVIDIA Corporation
//
// SPDX-License-Identifier: CC-BY-4.0

[[cuda-modules]]
== CUDA Modules


[[cuda-modules-creation]]
=== Creating a CUDA Module

[open,refpage='VkCudaModuleNV',desc='Opaque handle to a CUDA module object',type='handles']
--
CUDA modules must: contain some kernel code and must: expose at least one
function entry point.

CUDA modules are represented by sname:VkCudaModuleNV handles:

include::{generated}/api/handles/VkCudaModuleNV.adoc[]
--

[open,refpage='vkCreateCudaModuleNV',desc='Creates a new CUDA module object',type='protos']
--
To create a CUDA module, call:

include::{generated}/api/protos/vkCreateCudaModuleNV.adoc[]

  * pname:device is the logical device that creates the CUDA module.
  * pname:pCreateInfo is a pointer to a slink:VkCudaModuleCreateInfoNV
    structure.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.
  * pname:pModule is a pointer to a slink:VkCudaModuleNV handle in which the
    resulting CUDA module object is returned.

Once a CUDA module has been created, the application may: create function
handles, each of which must: refer to one function in the module.

include::{generated}/validity/protos/vkCreateCudaModuleNV.adoc[]

--
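As an informal illustration (not part of the normative text), creating a
module from a buffer of PTX source might look as follows.
The `createCudaModule` helper name, the `ptxCode`/`ptxSize` inputs, and the
use of flink:vkGetDeviceProcAddr to obtain the entry point are assumptions
of this sketch:

```c
// Informal sketch: create a VkCudaModuleNV from PTX source.
// Assumes `device` is a VkDevice with VK_NV_cuda_kernel_launch enabled.
#define VK_ENABLE_BETA_EXTENSIONS  // the extension may be provisional
#include <vulkan/vulkan.h>

VkResult createCudaModule(VkDevice device, const void* ptxCode,
                          size_t ptxSize, VkCudaModuleNV* outModule)
{
    // Extension entry points are retrieved through the device dispatch.
    PFN_vkCreateCudaModuleNV pfnCreateCudaModuleNV =
        (PFN_vkCreateCudaModuleNV)vkGetDeviceProcAddr(
            device, "vkCreateCudaModuleNV");

    VkCudaModuleCreateInfoNV createInfo = {0};
    createInfo.sType    = VK_STRUCTURE_TYPE_CUDA_MODULE_CREATE_INFO_NV;
    createInfo.pNext    = NULL;
    createInfo.dataSize = ptxSize;  // total size in bytes of the PTX code
    createInfo.pData    = ptxCode;

    // pAllocator is NULL: use the implementation's default allocator.
    return pfnCreateCudaModuleNV(device, &createInfo, NULL, outModule);
}
```

The same call is used later in this chapter with a binary cache in place of
the PTX code.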

[open,refpage='VkCudaModuleCreateInfoNV',desc='Structure specifying the parameters to create a CUDA Module',type='structs']
--
The sname:VkCudaModuleCreateInfoNV structure is defined as:

include::{generated}/api/structs/VkCudaModuleCreateInfoNV.adoc[]

  * pname:sType is a elink:VkStructureType value identifying this structure.
  * pname:pNext is `NULL` or a pointer to a structure extending this
    structure.
  * pname:dataSize is the length in bytes of the pname:pData array.
  * pname:pData is a pointer to the CUDA code.

.Valid Usage
****
  * [[VUID-VkCudaModuleCreateInfoNV-dataSize-09413]]
    pname:dataSize must: be the total size in bytes of the PTX files or
    binary cache passed to pname:pData.
****

include::{generated}/validity/structs/VkCudaModuleCreateInfoNV.adoc[]
--


[[cuda-function-creation]]
=== Creating a CUDA Function Handle

[open,refpage='VkCudaFunctionNV',desc='Opaque handle to a CUDA function object',type='handles']
--
CUDA functions are represented by sname:VkCudaFunctionNV handles.
Handles to `__global__` functions may: then be used to issue a kernel launch
(i.e. dispatch) from a command buffer.
See <<cudadispatch, Dispatching Command for CUDA PTX kernel>>.

include::{generated}/api/handles/VkCudaFunctionNV.adoc[]
--

[open,refpage='vkCreateCudaFunctionNV',desc='Creates a new CUDA function object',type='protos']
--
To create a CUDA function, call:

include::{generated}/api/protos/vkCreateCudaFunctionNV.adoc[]

  * pname:device is the logical device that creates the CUDA function.
  * pname:pCreateInfo is a pointer to a slink:VkCudaFunctionCreateInfoNV
    structure.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.
  * pname:pFunction is a pointer to a slink:VkCudaFunctionNV handle in which
    the resulting CUDA function object is returned.

include::{generated}/validity/protos/vkCreateCudaFunctionNV.adoc[]
--
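As an informal illustration, obtaining a function handle for a `__global__`
entry point might look as follows.
The kernel name `myKernel` and the `createCudaFunction` helper are
hypothetical; the structure members match the description above:

```c
// Informal sketch: obtain a VkCudaFunctionNV for a __global__ entry point.
// Assumes `device` and `cudaModule` are valid, and that the module's PTX
// defines a kernel named "myKernel" (hypothetical name).
#define VK_ENABLE_BETA_EXTENSIONS  // the extension may be provisional
#include <vulkan/vulkan.h>

VkResult createCudaFunction(VkDevice device, VkCudaModuleNV cudaModule,
                            VkCudaFunctionNV* outFunction)
{
    PFN_vkCreateCudaFunctionNV pfnCreateCudaFunctionNV =
        (PFN_vkCreateCudaFunctionNV)vkGetDeviceProcAddr(
            device, "vkCreateCudaFunctionNV");

    VkCudaFunctionCreateInfoNV createInfo = {0};
    createInfo.sType  = VK_STRUCTURE_TYPE_CUDA_FUNCTION_CREATE_INFO_NV;
    createInfo.pNext  = NULL;
    createInfo.module = cudaModule;  // module in which the function resides
    createInfo.pName  = "myKernel";  // hypothetical entry point name

    return pfnCreateCudaFunctionNV(device, &createInfo, NULL, outFunction);
}
```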

[open,refpage='VkCudaFunctionCreateInfoNV',desc='Structure specifying the parameters to create a CUDA Function',type='structs']
--
The sname:VkCudaFunctionCreateInfoNV structure is defined as:

include::{generated}/api/structs/VkCudaFunctionCreateInfoNV.adoc[]

  * pname:sType is a elink:VkStructureType value identifying this structure.
  * pname:pNext is `NULL` or a pointer to a structure extending this
    structure.
  * pname:module is the slink:VkCudaModuleNV module in which the function
    resides.
  * pname:pName is a null-terminated UTF-8 string containing the name of the
    function entry point to reference in the module.

include::{generated}/validity/structs/VkCudaFunctionCreateInfoNV.adoc[]
--


[[cuda-function-destruction]]
=== Destroying a CUDA Function

[open,refpage='vkDestroyCudaFunctionNV',desc='Destroy a CUDA function',type='protos']
--
To destroy a CUDA function handle, call:

include::{generated}/api/protos/vkDestroyCudaFunctionNV.adoc[]

  * pname:device is the logical device that destroys the function.
  * pname:function is the handle of the CUDA function to destroy.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.

include::{generated}/validity/protos/vkDestroyCudaFunctionNV.adoc[]
--


[[cuda-modules-destruction]]
=== Destroying a CUDA Module

[open,refpage='vkDestroyCudaModuleNV',desc='Destroy a CUDA module',type='protos']
--
To destroy a CUDA module, call:

include::{generated}/api/protos/vkDestroyCudaModuleNV.adoc[]

  * pname:device is the logical device that destroys the CUDA module.
  * pname:module is the handle of the CUDA module to destroy.
  * pname:pAllocator controls host memory allocation as described in the
    <<memory-allocation, Memory Allocation>> chapter.

include::{generated}/validity/protos/vkDestroyCudaModuleNV.adoc[]
--


[[cuda-modules-getcache]]
=== Reading back CUDA Module Cache

After the PTX kernel code has been uploaded, the module compiles it into a
binary cache containing all the information the device needs to execute it.
This cache can: be read back for later use, for example to speed up the
initialization of subsequent runs.

[open,refpage='vkGetCudaModuleCacheNV',desc='Get CUDA module cache',type='protos']
--
To get the CUDA module cache, call:

include::{generated}/api/protos/vkGetCudaModuleCacheNV.adoc[]

  * pname:device is the logical device that owns the CUDA module.
  * pname:module is the CUDA module.
  * pname:pCacheSize is a pointer to a value containing the number of bytes
    to copy into pname:pCacheData.
  * pname:pCacheData is either `NULL` or a pointer to a buffer into which
    the binary cache is copied.

.Valid Usage
****
  * [[VUID-vkGetCudaModuleCacheNV-pCacheSize-09414]]
    pname:pCacheSize must: be a pointer containing the number of bytes to be
    copied into pname:pCacheData.
    If pname:pCacheData is `NULL`, the total number of bytes required for a
    later copy into pname:pCacheData is returned in pname:pCacheSize.
  * [[VUID-vkGetCudaModuleCacheNV-pCacheData-09415]]
    pname:pCacheData may: be a pointer to a buffer into which the binary
    cache is copied.
    The number of bytes copied is defined by the value in pname:pCacheSize.
    This pointer may: be `NULL`, in which case the total number of required
    bytes is written to pname:pCacheSize.
****

include::{generated}/validity/protos/vkGetCudaModuleCacheNV.adoc[]
--

A typical use of flink:vkGetCudaModuleCacheNV happens in two steps:

  * First, call it with pname:pCacheData set to `NULL` and a valid pointer
    in pname:pCacheSize, to retrieve the required cache size.
  * Then call it again with pname:pCacheData pointing to a buffer of the
    size returned in pname:pCacheSize, and pname:pCacheSize containing the
    number of bytes to copy.

The returned cache may: then be used later to initialize the CUDA module
again, by passing this cache _instead_ of the PTX code to
flink:vkCreateCudaModuleNV.

Using the binary cache instead of the original PTX code should:
significantly speed up initialization of the CUDA module, since the full
compilation and validation steps are skipped.

As with slink:VkPipelineCache, the binary cache is
implementation-dependent.
Therefore the application must: assume that loading the cache can: fail in
many circumstances, and should: be prepared to fall back to the original
PTX code if necessary.
Loading the cache is most likely to succeed when the same device driver and
device architecture are used for both generating the cache from PTX and
reusing it.
A new driver version, or a different device architecture, will typically
invalidate this cache.
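The fallback described above can be sketched as follows.
The `createCudaModule` wrapper is a hypothetical helper around
flink:vkCreateCudaModuleNV taking a data/size pair; the cache and PTX
buffers are assumed to be available:

```c
#define VK_ENABLE_BETA_EXTENSIONS  // the extension may be provisional
#include <vulkan/vulkan.h>

// Hypothetical helper wrapping vkCreateCudaModuleNV for a data/size pair.
extern VkResult createCudaModule(VkDevice device, const void* data,
                                 size_t size, VkCudaModuleNV* outModule);

// Informal sketch: prefer a previously saved binary cache, and fall back
// to the original PTX code when the cache is rejected (e.g. after a
// driver update or on a different device architecture).
VkResult createModuleCacheFirst(VkDevice device,
                                const void* cacheData, size_t cacheSize,
                                const void* ptxCode, size_t ptxSize,
                                VkCudaModuleNV* outModule)
{
    VkResult result =
        createCudaModule(device, cacheData, cacheSize, outModule);
    if (result != VK_SUCCESS) {
        // Cache invalid: recompile from the original PTX source.
        result = createCudaModule(device, ptxCode, ptxSize, outModule);
    }
    return result;
}
```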


[[cuda-modules-limitations]]
=== Limitations

CUDA and Vulkan do not use the device in the same configuration; therefore,
a few limitations must be taken into account:

  * It is not possible to read or write global parameters from Vulkan.
    The only way to share resources or pass values to the PTX kernel is to
    pass them as arguments of the function.
    See <<cudadispatch_sharing_resources, Resources sharing between CUDA
    Kernel and Vulkan>> for more details.
  * Calls to functions external to the module PTX are not supported.
  * Vulkan disables some shader/kernel exceptions, which could break CUDA
    kernels that rely on exceptions.
  * CUDA kernels submitted to Vulkan are limited to the amount of shared
    memory that can be queried from the physical device capabilities, which
    may be less than what CUDA can offer.
  * CUDA instruction-level preemption (CILP) does not work.
  * CUDA Unified Memory does not work with this extension.
  * CUDA dynamic parallelism is not supported.
  * Indirect dispatch is not available.