// Copyright 2023 The Khronos Group, Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

= VK_EXT_shader_object
:toc: left
:refpage: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/
:sectnums:

This document describes the proposed design for a new extension which aims to comprehensively address problems the pipeline abstraction has created for both applications and implementations.

== Problem Statement

When Vulkan 1.0 and its precursor Mantle were originally developed, the then-existing shader and state binding models of earlier APIs were beginning to show worrying limitations, both in terms of draw call scaling and the driver complexity needed to support them. Application developers were being artificially constrained from accessing the full capabilities of GPUs, and many IHVs were forced to maintain rat's nests of driver code full of heavy-handed draw time state validation and hacky shader patching, all in the service of simplicity at the API level. IHVs were understandably highly motivated to move away from such API designs.

Enter the new low-level APIs like Mantle and ultimately Vulkan. These APIs set out to reduce driver overhead by exposing lower-level abstractions that would hopefully avoid the need for the draw time state validation and shader patching that was so problematic for IHVs, and so detrimental to performance for applications.

One of the most significant changes to this end was the new concept of pipelines, which promised to shift the burden of the shader state combinatorics out of drivers and into applications, ideally avoiding the need for driver-side draw time state validation and shader patching entirely. The thinking went that application developers would design or redesign their renderers with pipelines in mind, and in so doing they would naturally learn to accomplish their goals with fewer combinations of state.

Implicit in such a design was an assumption that applications would be able to know and provide nearly all of this state upfront. A very limited set of dynamic states was specified for the few pieces of state that had effectively unbounded ranges of values, but otherwise even state that could have been fully dynamic on all implementations was required to be baked into the static pipeline objects. This, the thinking went, would benefit even those implementations where the state was internally dynamic by enabling new possibilities for optimization during shader compilation.

Also implicit in the design of pipelines was an assumption that the driver overhead of the pipeline abstraction would either be negligible, or that it would at least always be outweighed by the performance savings at draw time when compared to earlier APIs. The possibility that either setting dozens of individual pieces of state each time a pipeline is bound, or tracking which of those pieces of state had changed since the previous pipeline bind, might cause some implementations to exhibit problematically high overhead at pipeline bind time does not seem to have been a central consideration.

Many of these assumptions have since proven to be unrealistic.

On the application side, many developers considering or implementing Vulkan and similar APIs found them unable to efficiently support important use cases which were easily supportable in earlier APIs. This has not been simply a matter of developers being stuck in an old way of thinking or unwilling to "rein in" an unnecessarily large number of state combinations, but a reflection of the reality that the natural design patterns of the most demanding class of applications which use graphics APIs -- video games -- are inherently and deeply dependent on the very "dynamism" that pipelines set out to constrain.

As a result, renderers with a choice of API have largely chosen to avoid Vulkan and its "pipelined" contemporaries, while those without a choice have largely just devised workarounds to make these new APIs behave like the old ones -- usually in the form of the now nearly ubiquitous hash-n-cache pattern. These applications set various pieces of "pipeline" state independently, then hash it all at draw time and use the hash as a key into an application-managed pipeline cache, reusing an existing pipeline if it exists or creating and caching a new one if it does not. In effect, the messy and inefficient parts of GL drivers that pipelines sought to eliminate have simply moved into applications, except without the benefits of implementation specific knowledge which might have reduced their complexity or improved their performance.
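
The hash-n-cache pattern can be sketched in outline. Everything below is illustrative: the state struct, cache shape, and helper names are hypothetical stand-ins for an application's own types, and a real renderer would use a stronger hash and handle collisions:

[source,c]
----
#include <stdint.h>
#include <stddef.h>

/* Hypothetical subset of "pipeline" state an application sets piecemeal. */
typedef struct {
    uint32_t topology;
    uint32_t cullMode;
    uint32_t blendEnable;
    uint64_t vertexShaderId;
    uint64_t fragmentShaderId;
} DrawState;

typedef struct {
    uint64_t key;
    int      valid;
    /* a driver pipeline handle would live here */
} CacheEntry;

#define CACHE_SIZE 1024
static CacheEntry cache[CACHE_SIZE];

/* FNV-1a over each field individually, avoiding struct padding bytes. */
static uint64_t fnv1a64(uint64_t h, uint64_t v) {
    for (int i = 0; i < 8; i++) {
        h ^= (v >> (8 * i)) & 0xff;
        h *= 1099511628211ull;
    }
    return h;
}

static uint64_t hashState(const DrawState* s) {
    uint64_t h = 1469598103934665603ull;
    h = fnv1a64(h, s->topology);
    h = fnv1a64(h, s->cullMode);
    h = fnv1a64(h, s->blendEnable);
    h = fnv1a64(h, s->vertexShaderId);
    h = fnv1a64(h, s->fragmentShaderId);
    return h;
}

/* Returns 1 on a cache hit (pipeline reused), 0 when a new pipeline
 * would have to be compiled and inserted. */
int bindPipelineForState(const DrawState* s) {
    uint64_t h = hashState(s);
    CacheEntry* e = &cache[h % CACHE_SIZE];
    if (e->valid && e->key == h)
        return 1;
    e->key = h;  /* compile-and-insert path */
    e->valid = 1;
    return 0;
}
----

Every draw that changes any hashed field risks a pipeline compile on this path, which is exactly the draw time hitching this pattern is notorious for.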

This is not just a problem of "legacy" application code where it might be viable for the API to wait it out until application codebases are rewritten or replaced. Applications need the features they need, and are unlikely to remove features they need just to satisfy what they know to be artificial limitations imposed by a graphics API's made-up abstraction. This is especially true for developers working on platforms where the pipeline API does not offer substantial performance benefits over other APIs that do not share the same limitations.

On the driver side, pipelines have provided some of their desired benefits for some implementations, but for others they have largely just shifted draw time overhead to pipeline bind time (while in some cases still not entirely eliminating the draw time overhead in the first place). Implementations where nearly all "pipeline" state is internally dynamic are forced to either redundantly re-bind all of this state each time a pipeline is bound, or to track what state has changed from pipeline to pipeline -- either of which creates considerable overhead on CPU-constrained platforms.

For certain implementations, the pipeline abstraction has also locked away a significant amount of the flexibility supported by their hardware, thereby paradoxically leaving many of their capabilities inaccessible in the newer and ostensibly "low level" API, though still accessible through older, high level ones. In effect, this is a return to the old problem of the graphics API artificially constraining applications from accessing the full capabilities of the GPU, only on a different axis.

Finally, on fixed hardware platforms like game consoles and embedded systems pipelines have created some additional and unique challenges. These platforms tend to have limited CPU performance, memory, and storage capacity all at the same time. Because of this it is generally not desirable for applications on these platforms to waste storage space shipping both uncompiled SPIR-V and precompiled pipeline caches, however it is also not desirable to compile the same shaders from scratch on each system (even if they could be cached for subsequent runs). Also, the hardware and even driver versions on these systems are typically known in advance, and drivers might only ever change in tandem with applications. Vulkan applications on these systems are forced to waste precious storage space on not only shipping both SPIR-V and pipeline cached versions of their shaders, but on their pipeline caches containing potentially large numbers of slightly differently optimized permutations of the same shader code, with only minor differences in pipeline state (arguably this last point is a compression problem, but opaque pipeline caches mostly leave applications at the mercy of the driver to solve it for them).

Fortunately, some of these problems have been acknowledged and various efforts have already begun to address several of them.

These existing efforts have mainly chosen to tackle problems through the lens of existing hash-n-cache type application architectures, and have focused on those problems which are most acute at pipeline compile time. Their goals have included things like reducing pipeline counts, improving the usability and efficiency of pipeline caches, and introducing more granularity to the pipeline compilation and caching process. The extensions they have produced have preferred a targeted, piecemeal, and minimally invasive "band-aid" approach over a more holistic "rip off the band-aid" redesign.

Such efforts have undoubtedly produced valuable improvements, but they have left the class of problems which manifest at bind time largely unaddressed. It might be possible to continue the existing piecemeal approach with a refocus onto bind time, but the solution space afforded by this kind of approach would necessarily remain constrained by the design decisions of the past.

== Solution Space

Several approaches are immediately apparent:

 . Extend the existing graphics pipeline library concept somehow, perhaps by adding optional new, more granular library types and/or making pipeline binaries directly bindable without needing to be explicitly linked into a pipeline object
 . Continue to expose more (maybe optional) dynamic state to minimize the number of pipeline objects needed
 . Abandon pipelines entirely and introduce new functionality to compile and bind shaders directly

Option 1 is a natural extension of recent efforts and requires relatively few API changes, but it adds even more complexity to the already very complex pipeline concept, while also failing to adequately address significant parts of the problem. While directly bindable pipeline libraries do reduce the dimensionality of pipeline combinatorics, they do not provide any meaningful absolute CPU performance improvement at pipeline bind time. The total overhead of binding N different pipeline libraries is still roughly on par with the overhead of binding a single (monolithic or linked) pipeline.

Option 2 also requires relatively few API changes and would do more to address bind time CPU performance than option 1, but this option is limited in both the class of issues it can address and its portability across implementations. Much of the universally supportable "low hanging fruit" dynamic state has already been exposed by the existing extended dynamic state extensions, and the remaining state is mostly not universally dynamic. Exposing states A and B as dynamic on one implementation and states B and C on another is still valuable, but it limits this approach's benefits for simplifying application architectures. Even though this option is not a complete solution, it can and should be pursued in parallel with other efforts -- both for its own sake and as a potential foundation for a more comprehensive solution.

Option 3 is more radical, but brings the API design more in line with developer expectations. The pipeline abstraction has been a consistent problem for many developers trying to use Vulkan since its inception, and this option can produce a cleaner, more user-friendly abstraction that bypasses the complexity of pipelines. With the benefit of years of hindsight and broader Working Group knowledge about the constraints of each others' implementations, it can aim to achieve a design which better balances API simplicity with adherence to the explicit design ethos of Vulkan.

This proposal focuses on option 3, for the reasons outlined above.

== Proposal

=== Shaders

This extension introduces a new object type `VkShaderEXT` which represents a single compiled shader stage. `VkShaderEXT` objects may be created either independently or linked with other `VkShaderEXT` objects created at the same time. To create `VkShaderEXT` objects, applications call `vkCreateShadersEXT()`:

[source,c]
----
VkResult vkCreateShadersEXT(
    VkDevice                                    device,
    uint32_t                                    createInfoCount,
    const VkShaderCreateInfoEXT*                pCreateInfos,
    const VkAllocationCallbacks*                pAllocator,
    VkShaderEXT*                                pShaders);
----

This function compiles the source code for one or more shader stages into `VkShaderEXT` objects. Whenever `createInfoCount` is greater than one, the shaders being created may optionally be linked together. Linking allows the implementation to perform cross-stage optimizations based on a promise by the application that the linked shaders will always be used together.

A set of linked shaders may perform anywhere from identically to substantially better than equivalent unlinked shaders; this tradeoff is left to the application, and linking is never mandatory.

[source,c]
----
typedef enum VkShaderCreateFlagBitsEXT {
    VK_SHADER_CREATE_LINK_STAGE_BIT_EXT = 0x00000001,
    VK_SHADER_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT_EXT = 0x00000002,
    VK_SHADER_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT = 0x00000004,
    VK_SHADER_CREATE_NO_TASK_SHADER_BIT_EXT = 0x00000008,
    VK_SHADER_CREATE_DISPATCH_BASE_BIT_EXT = 0x00000010,
    VK_SHADER_CREATE_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_EXT = 0x00000020,
    VK_SHADER_CREATE_FRAGMENT_DENSITY_MAP_ATTACHMENT_BIT_EXT = 0x00000040
} VkShaderCreateFlagBitsEXT;
typedef VkFlags VkShaderCreateFlagsEXT;

typedef enum VkShaderCodeTypeEXT {
    VK_SHADER_CODE_TYPE_BINARY_EXT = 0,
    VK_SHADER_CODE_TYPE_SPIRV_EXT = 1
} VkShaderCodeTypeEXT;

typedef struct VkShaderCreateInfoEXT {
    VkStructureType                             sType;
    const void*                                 pNext;
    VkShaderCreateFlagsEXT                      flags;
    VkShaderStageFlagBits                       stage;
    VkShaderStageFlags                          nextStage;
    VkShaderCodeTypeEXT                         codeType;
    size_t                                      codeSize;
    const void*                                 pCode;
    const char*                                 pName;
    uint32_t                                    setLayoutCount;
    const VkDescriptorSetLayout*                pSetLayouts;
    uint32_t                                    pushConstantRangeCount;
    const VkPushConstantRange*                  pPushConstantRanges;
    const VkSpecializationInfo*                 pSpecializationInfo;
} VkShaderCreateInfoEXT;
----

To specify that shaders should be linked, include the `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` flag in each of the `VkShaderCreateInfoEXT` structures passed to `vkCreateShadersEXT()`. The presence or absence of `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` must match across all `VkShaderCreateInfoEXT` structures passed to a single `vkCreateShadersEXT()` call: i.e., if any member of `pCreateInfos` includes `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` then all other members must include it too. `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` is ignored if `createInfoCount` is one, and a shader created this way is considered unlinked.
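
This all-or-nothing rule can be checked with a single pass over the create infos before calling `vkCreateShadersEXT()`. The sketch below uses a pared-down, hypothetical stand-in struct so it stays self-contained; only the flag value is taken from the real API:

[source,c]
----
#include <stdint.h>

/* Value of VK_SHADER_CREATE_LINK_STAGE_BIT_EXT. */
#define LINK_STAGE_BIT 0x00000001u

/* Pared-down, hypothetical stand-in for VkShaderCreateInfoEXT. */
typedef struct {
    uint32_t flags;
} CreateInfo;

/* Returns 1 when either every info sets the link flag or none do,
 * which is what a single vkCreateShadersEXT() call requires. */
int linkFlagsConsistent(const CreateInfo* infos, uint32_t count) {
    if (count == 0)
        return 1;
    uint32_t first = infos[0].flags & LINK_STAGE_BIT;
    for (uint32_t i = 1; i < count; i++)
        if ((infos[i].flags & LINK_STAGE_BIT) != first)
            return 0;
    return 1;
}
----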

The stage of the shader being compiled is specified by `stage`. Applications must also specify which stage types will be allowed to immediately follow the shader being created. For example, a vertex shader might specify a `nextStage` value of `VK_SHADER_STAGE_FRAGMENT_BIT` to indicate that the vertex shader being created will always be followed by a fragment shader (and never a geometry or tessellation shader). Applications that do not know this information at shader creation time or need the same shader to be compatible with multiple subsequent stages can specify a mask that includes as many valid next stages as they wish. For example, a vertex shader can specify a `nextStage` mask of `VK_SHADER_STAGE_GEOMETRY_BIT | VK_SHADER_STAGE_FRAGMENT_BIT` to indicate that the next stage could be either a geometry shader or fragment shader (but not a tessellation shader).

[NOTE]
====
Certain implementations may incur a compile time and/or memory usage penalty whenever more than one stage bit is set in `nextStage`, so applications should strive to set the minimum number of bits they are able to. However, applications should *not* interpret this advice to mean that they should create multiple `VkShaderEXT` objects that differ only by the value of `nextStage`, as this will incur unnecessary overhead on implementations where `nextStage` is ignored.
====

The shader code is pointed to by `pCode` and may be provided as SPIR-V, or in an opaque implementation-defined binary form specific to the physical device. The format of the shader code is specified by `codeType`.

The `codeType` of all `VkShaderCreateInfoEXT` structures passed to a `vkCreateShadersEXT()` call must match. This also means that only shaders created with the same `codeType` may be linked together.

Descriptor set layouts and push constant ranges used by each shader are specified directly (not via a `VkPipelineLayout`), though multiple stages can of course point to the same structures.

Any time after a `VkShaderEXT` object has been created, its binary shader code can be queried using `vkGetShaderBinaryDataEXT()`:

[source,c]
----
VkResult vkGetShaderBinaryDataEXT(
    VkDevice                                    device,
    VkShaderEXT                                 shader,
    size_t*                                     pDataSize,
    void*                                       pData);
----

When `pData` is `NULL`, the value pointed to by `pDataSize` is filled with the number of bytes needed to store the shader's binary code and `VK_SUCCESS` is returned.

When `pData` is non-`NULL`, `pDataSize` points to the application-provided size of `pData`. If the provided size is large enough, the location pointed to by `pData` is filled with the shader's binary code and `VK_SUCCESS` is returned; otherwise nothing is written to `pData` and `VK_INCOMPLETE` is returned.

The binary shader code returned in `pData` can be saved by the application and used in a future `vkCreateShadersEXT()` call (including on a different `VkInstance` and/or `VkDevice`) with a compatible physical device by setting `codeType` to `VK_SHADER_CODE_TYPE_BINARY_EXT`. This means that on fixed platforms like game consoles and embedded systems applications need not ship SPIR-V shader code at all. If the binary shader code in any `VkShaderCreateInfoEXT` passed to `vkCreateShadersEXT()` is not compatible with the physical device then the `vkCreateShadersEXT()` call returns `VK_ERROR_INCOMPATIBLE_SHADER_BINARY_EXT`.

Applications must pass the same values of `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` to a `vkCreateShadersEXT()` call with a `codeType` of `VK_SHADER_CODE_TYPE_BINARY_EXT` as were passed when those shaders were originally compiled from SPIR-V.
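
The query uses the same two-call size-query idiom as other Vulkan getters. The sketch below exercises that idiom against a hypothetical mock standing in for `vkGetShaderBinaryDataEXT()`, since the real entry point needs a live device; the application-side control flow is the part being illustrated:

[source,c]
----
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

#define MOCK_SUCCESS    0
#define MOCK_INCOMPLETE 5  /* plays the role of VK_INCOMPLETE */

static const unsigned char mockBinary[] = { 0xDE, 0xAD, 0xBE, 0xEF };

/* Hypothetical stand-in for vkGetShaderBinaryDataEXT(): reports the size
 * when pData is NULL, and copies the data only when the caller-provided
 * size is large enough. */
static int mockGetShaderBinaryData(size_t* pDataSize, void* pData) {
    if (pData == NULL) {
        *pDataSize = sizeof(mockBinary);
        return MOCK_SUCCESS;
    }
    if (*pDataSize < sizeof(mockBinary))
        return MOCK_INCOMPLETE;  /* nothing written */
    memcpy(pData, mockBinary, sizeof(mockBinary));
    return MOCK_SUCCESS;
}

/* Application side: query the size, allocate, then fetch. The returned
 * buffer is what an application would write to disk for a later
 * VK_SHADER_CODE_TYPE_BINARY_EXT create. */
unsigned char* fetchBinary(size_t* outSize) {
    size_t size = 0;
    if (mockGetShaderBinaryData(&size, NULL) != MOCK_SUCCESS)
        return NULL;
    unsigned char* data = (unsigned char*)malloc(size);
    if (data == NULL || mockGetShaderBinaryData(&size, data) != MOCK_SUCCESS) {
        free(data);
        return NULL;
    }
    *outSize = size;
    return data;
}
----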

`VkShaderEXT` objects can be bound on a command buffer using `vkCmdBindShadersEXT()`:

[source,c]
----
void vkCmdBindShadersEXT(
    VkCommandBuffer                             commandBuffer,
    uint32_t                                    stageCount,
    const VkShaderStageFlagBits*                pStages,
    const VkShaderEXT*                          pShaders);
----

It is possible to unbind shaders for a particular stage by calling `vkCmdBindShadersEXT()` with elements of `pShaders` set to `VK_NULL_HANDLE`. For example, an application may want to arbitrarily bind and unbind a known compatible passthrough geometry shader without knowing or caring what specific vertex and fragment shaders are bound at that time.

Regardless of whether the shaders were created with `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT`, the interfaces of all stages bound at `vkCmdDraw*()` time must be compatible. This means that the union of descriptor set layouts and push constant ranges across all bound shaders must not conflict, and that the inputs of each stage are compatible with the outputs of the previous stage. It is the application's responsibility to ensure that this is the case, and the implementation will not do any draw time state validation to guard against this kind of invalid usage.

If any of the shaders bound at `vkCmdDraw*()` time were created with `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` then all shaders that were linked to that shader must also be bound. It is the application's responsibility to ensure that this is the case, and the implementation will not do any draw time state validation to guard against this kind of invalid usage.
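
Since no draw time validation guards this rule, an application can assert it cheaply itself. The sketch below is purely illustrative: it assumes the application tags each shader with a hypothetical link-group ID, where shaders created in the same linked `vkCreateShadersEXT()` call share a nonzero group ID:

[source,c]
----
#include <stdint.h>

#define MAX_GROUPS 8  /* enough for the sketch */

/* Hypothetical application-side record of one bound shader. */
typedef struct {
    uint32_t stageBit;   /* a single VK_SHADER_STAGE_* bit, by convention */
    uint32_t linkGroup;  /* 0 = unlinked */
} BoundShader;

/* groupSize[g] is how many stages were linked together in group g.
 * Returns 1 when every link group that appears among the bound shaders
 * is bound in its entirety, as the rule above requires. */
int linkedGroupsFullyBound(const BoundShader* bound, int boundCount,
                           const int* groupSize) {
    int seen[MAX_GROUPS] = {0};
    for (int i = 0; i < boundCount; i++)
        if (bound[i].linkGroup != 0 && bound[i].linkGroup < MAX_GROUPS)
            seen[bound[i].linkGroup]++;
    for (int g = 1; g < MAX_GROUPS; g++)
        if (seen[g] != 0 && seen[g] != groupSize[g])
            return 0;
    return 1;
}
----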

When drawing with shaders bound with `vkCmdBindShadersEXT()` most state must be set dynamically. Specifically, the following existing commands must be used to set the corresponding state:

 * `vkCmdSetViewportWithCount()`
 * `vkCmdSetScissorWithCount()`
 * `vkCmdSetLineWidth()`
 * `vkCmdSetDepthBias()`
 * `vkCmdSetBlendConstants()`
 * `vkCmdSetDepthBounds()`
 * `vkCmdSetStencilCompareMask()`
 * `vkCmdSetStencilWriteMask()`
 * `vkCmdSetStencilReference()`
 * `vkCmdBindVertexBuffers2()`
 * `vkCmdSetCullMode()`
 * `vkCmdSetDepthBoundsTestEnable()`
 * `vkCmdSetDepthCompareOp()`
 * `vkCmdSetDepthTestEnable()`
 * `vkCmdSetDepthWriteEnable()`
 * `vkCmdSetFrontFace()`
 * `vkCmdSetPrimitiveTopology()`
 * `vkCmdSetStencilOp()`
 * `vkCmdSetStencilTestEnable()`
 * `vkCmdSetDepthBiasEnable()`
 * `vkCmdSetPrimitiveRestartEnable()`
 * `vkCmdSetRasterizerDiscardEnable()`
 * `vkCmdSetVertexInputEXT()`
 * `vkCmdSetLogicOpEXT()`
 * `vkCmdSetPatchControlPointsEXT()`
 * `vkCmdSetTessellationDomainOriginEXT()`
 * `vkCmdSetDepthClampEnableEXT()`
 * `vkCmdSetPolygonModeEXT()`
 * `vkCmdSetRasterizationSamplesEXT()`
 * `vkCmdSetSampleMaskEXT()`
 * `vkCmdSetAlphaToCoverageEnableEXT()`
 * `vkCmdSetAlphaToOneEnableEXT()`
 * `vkCmdSetLogicOpEnableEXT()`
 * `vkCmdSetColorBlendEnableEXT()`
 * `vkCmdSetColorBlendEquationEXT()`
 * `vkCmdSetColorWriteMaskEXT()`

If link:{refpage}VK_KHR_fragment_shading_rate.html[VK_KHR_fragment_shading_rate] is supported and enabled:

 * `vkCmdSetFragmentShadingRateKHR()`

If link:{refpage}VK_EXT_transform_feedback.html[VK_EXT_transform_feedback] is supported and enabled:

 * `vkCmdSetRasterizationStreamEXT()`

If link:{refpage}VK_EXT_discard_rectangle.html[VK_EXT_discard_rectangle] is supported and enabled:

 * `vkCmdSetDiscardRectangleEnableEXT()`
 * `vkCmdSetDiscardRectangleModeEXT()`
 * `vkCmdSetDiscardRectangleEXT()`

If link:{refpage}VK_EXT_conservative_rasterization.html[VK_EXT_conservative_rasterization] is supported and enabled:

 * `vkCmdSetConservativeRasterizationModeEXT()`
 * `vkCmdSetExtraPrimitiveOverestimationSizeEXT()`

If link:{refpage}VK_EXT_depth_clip_enable.html[VK_EXT_depth_clip_enable] is supported and enabled:

 * `vkCmdSetDepthClipEnableEXT()`

If link:{refpage}VK_EXT_sample_locations.html[VK_EXT_sample_locations] is supported and enabled:

 * `vkCmdSetSampleLocationsEnableEXT()`
 * `vkCmdSetSampleLocationsEXT()`

If link:{refpage}VK_EXT_blend_operation_advanced.html[VK_EXT_blend_operation_advanced] is supported and enabled:

 * `vkCmdSetColorBlendAdvancedEXT()`

If link:{refpage}VK_EXT_provoking_vertex.html[VK_EXT_provoking_vertex] is supported and enabled:

 * `vkCmdSetProvokingVertexModeEXT()`

If link:{refpage}VK_EXT_line_rasterization.html[VK_EXT_line_rasterization] is supported and enabled:

 * `vkCmdSetLineRasterizationModeEXT()`
 * `vkCmdSetLineStippleEnableEXT()`
 * `vkCmdSetLineStippleEXT()`

If link:{refpage}VK_EXT_depth_clip_control.html[VK_EXT_depth_clip_control] is supported and enabled:

 * `vkCmdSetDepthClipNegativeOneToOneEXT()`

If link:{refpage}VK_EXT_color_write_enable.html[VK_EXT_color_write_enable] is supported and enabled:

 * `vkCmdSetColorWriteEnableEXT()`

If link:{refpage}VK_NV_clip_space_w_scaling.html[VK_NV_clip_space_w_scaling] is supported and enabled:

 * `vkCmdSetViewportWScalingEnableNV()`
 * `vkCmdSetViewportWScalingNV()`

If link:{refpage}VK_NV_viewport_swizzle.html[VK_NV_viewport_swizzle] is supported and enabled:

 * `vkCmdSetViewportSwizzleNV()`

If link:{refpage}VK_NV_fragment_coverage_to_color.html[VK_NV_fragment_coverage_to_color] is supported and enabled:

 * `vkCmdSetCoverageToColorEnableNV()`
 * `vkCmdSetCoverageToColorLocationNV()`

If link:{refpage}VK_NV_framebuffer_mixed_samples.html[VK_NV_framebuffer_mixed_samples] is supported and enabled:

 * `vkCmdSetCoverageModulationModeNV()`
 * `vkCmdSetCoverageModulationTableEnableNV()`
 * `vkCmdSetCoverageModulationTableNV()`

If link:{refpage}VK_NV_coverage_reduction_mode.html[VK_NV_coverage_reduction_mode] is supported and enabled:

 * `vkCmdSetCoverageReductionModeNV()`

If link:{refpage}VK_NV_representative_fragment_test.html[VK_NV_representative_fragment_test] is supported and enabled:

 * `vkCmdSetRepresentativeFragmentTestEnableNV()`

If link:{refpage}VK_NV_shading_rate_image.html[VK_NV_shading_rate_image] is supported and enabled:

 * `vkCmdSetCoarseSampleOrderNV()`
 * `vkCmdSetShadingRateImageEnableNV()`
 * `vkCmdSetViewportShadingRatePaletteNV()`

If link:{refpage}VK_NV_scissor_exclusive.html[VK_NV_scissor_exclusive] is supported and enabled:

 * `vkCmdSetExclusiveScissorEnableNV()`
 * `vkCmdSetExclusiveScissorNV()`

If link:{refpage}VK_NV_fragment_shading_rate_enums.html[VK_NV_fragment_shading_rate_enums] is supported and enabled:

 * `vkCmdSetFragmentShadingRateEnumNV()`
Certain dynamic state setting commands have modified behavior from their original versions:

 * `vkCmdSetPrimitiveTopology()` does not have any constraints on the topology class (i.e., it behaves as if the `dynamicPrimitiveTopologyUnrestricted` property is `VK_TRUE` even when the actual property is `VK_FALSE`).
 * `vkCmdSetLogicOpEXT()` may be used on any implementation regardless of its support for the `extendedDynamicState2LogicOp` feature.
 * `vkCmdSetPatchControlPointsEXT()` may be used on any implementation regardless of its support for the `extendedDynamicState2PatchControlPoints` feature.
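
Because none of this state is validated at draw time, some applications layer a small tracker over their command recording to catch states they forgot to set. A minimal sketch, using hypothetical state indices rather than real Vulkan tokens:

[source,c]
----
#include <stdint.h>

/* Hypothetical indices for a few of the states listed above; a real
 * tracker would enumerate every state the application relies on. */
enum {
    STATE_VIEWPORT = 0,
    STATE_SCISSOR,
    STATE_CULL_MODE,
    STATE_TOPOLOGY,
    STATE_COUNT
};

typedef struct {
    uint64_t setMask;  /* bit i is set once state i has been recorded */
} StateTracker;

static void trackSet(StateTracker* t, int state) {
    t->setMask |= 1ull << state;
}

/* Returns 1 only when every tracked state has been set since the last
 * reset, mirroring the requirement that all needed state be set before
 * vkCmdDraw*(). */
static int readyToDraw(const StateTracker* t) {
    uint64_t required = (1ull << STATE_COUNT) - 1;
    return (t->setMask & required) == required;
}
----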

Any `VkShaderEXT` can be destroyed using `vkDestroyShaderEXT()`:

[source,c]
----
void vkDestroyShaderEXT(
    VkDevice                                    device,
    VkShaderEXT                                 shader,
    const VkAllocationCallbacks*                pAllocator);
----

Destroying a `VkShaderEXT` object used by action commands in one or more command buffers in the _recording_ or _executable_ states causes those command buffers to enter the _invalid_ state. A `VkShaderEXT` object must not be destroyed as long as any command buffer that issues any action command that uses it is in the _pending_ state.

== Examples

=== Graphics

Consider an application which always treats sets of shader stages as complete programs.

At startup time, the application compiles and links the shaders for each complete program:

[source,c]
----
VkShaderCreateInfoEXT shaderInfo[2] = {
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = VK_SHADER_CREATE_LINK_STAGE_BIT_EXT,
        .stage = VK_SHADER_STAGE_VERTEX_BIT,
        .nextStage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = vertexShaderSpirvSize,
        .pCode = pVertexShaderSpirv,
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    },
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = VK_SHADER_CREATE_LINK_STAGE_BIT_EXT,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize,
        .pCode = pFragmentShaderSpirv,
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    }
};

VkShaderEXT shaders[2];

vkCreateShadersEXT(device, 2, shaderInfo, NULL, shaders);
----

Later at draw time, the application binds the linked vertex and fragment shaders forming a complete program:

[source,c]
----
VkShaderStageFlagBits stages[2] = {
    VK_SHADER_STAGE_VERTEX_BIT,
    VK_SHADER_STAGE_FRAGMENT_BIT
};
vkCmdBindShadersEXT(commandBuffer, 2, stages, shaders);
----

Alternatively, the same result could be achieved by:

[source,c]
----
{
    VkShaderStageFlagBits stage = VK_SHADER_STAGE_VERTEX_BIT;
    vkCmdBindShadersEXT(commandBuffer, 1, &stage, &shaders[0]);
}

{
    VkShaderStageFlagBits stage = VK_SHADER_STAGE_FRAGMENT_BIT;
    vkCmdBindShadersEXT(commandBuffer, 1, &stage, &shaders[1]);
}
----

If the `tessellationShader` or `geometryShader` features are enabled on the device, the application binds `VK_NULL_HANDLE` to the corresponding unused stages:

[source,c]
----
VkShaderStageFlagBits unusedStages[3] = {
    VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT,
    VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT,
    VK_SHADER_STAGE_GEOMETRY_BIT
};
VkShaderEXT unusedShaders[3] = { VK_NULL_HANDLE, VK_NULL_HANDLE, VK_NULL_HANDLE };
vkCmdBindShadersEXT(commandBuffer, 3, unusedStages, unusedShaders);
----

Alternatively, the same result could be achieved by:

[source,c]
----
VkShaderStageFlagBits unusedStages[3] = {
    VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT,
    VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT,
    VK_SHADER_STAGE_GEOMETRY_BIT
};
// Setting pShaders to NULL is equivalent to specifying an array of stageCount VK_NULL_HANDLE values
vkCmdBindShadersEXT(commandBuffer, 3, unusedStages, NULL);
----

Finally, the application issues a draw call:

[source,c]
----
vkCmdDrawIndexed(commandBuffer, ...);
----

Now consider a different application which needs to mix and match vertex and fragment shaders in arbitrary combinations that are not predictable at shader compile time.

At startup time, the application compiles unlinked vertex and fragment shaders:

[source,c]
----
VkShaderCreateInfoEXT shaderInfo[3] = {
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_VERTEX_BIT,
        .nextStage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = vertexShaderSpirvSize,
        .pCode = pVertexShaderSpirv,
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    },
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize[0],
        .pCode = pFragmentShaderSpirv[0],
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    },
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize[1],
        .pCode = pFragmentShaderSpirv[1],
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    }
};

VkShaderEXT shaders[3];

vkCreateShadersEXT(device, 3, shaderInfo, NULL, shaders);
----
487
Alternatively, the same result could be achieved with one `vkCreateShadersEXT` call per shader:

[source,c]
----
VkShaderEXT shaders[3];

{
    VkShaderCreateInfoEXT shaderInfo = {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_VERTEX_BIT,
        .nextStage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = vertexShaderSpirvSize,
        .pCode = pVertexShaderSpirv,
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    };

    vkCreateShadersEXT(device, 1, &shaderInfo, NULL, &shaders[0]);
}

{
    VkShaderCreateInfoEXT shaderInfo = {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize[0],
        .pCode = pFragmentShaderSpirv[0],
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    };

    vkCreateShadersEXT(device, 1, &shaderInfo, NULL, &shaders[1]);
}

{
    VkShaderCreateInfoEXT shaderInfo = {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize[1],
        .pCode = pFragmentShaderSpirv[1],
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    };

    vkCreateShadersEXT(device, 1, &shaderInfo, NULL, &shaders[2]);
}
----

Later at draw time, the application binds independent vertex and fragment shaders forming a complete program:

[source,c]
----
VkShaderStageFlagBits stages[2] = {
    VK_SHADER_STAGE_VERTEX_BIT,
    VK_SHADER_STAGE_FRAGMENT_BIT
};
vkCmdBindShadersEXT(commandBuffer, 2, stages, shaders);
----

If the `tessellationShader` or `geometryShader` features are enabled on the device, the application binds VK_NULL_HANDLE to the corresponding unused stages:

[source,c]
----
VkShaderStageFlagBits unusedStages[3] = {
    VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT,
    VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT,
    VK_SHADER_STAGE_GEOMETRY_BIT
};
// Setting pShaders to NULL is equivalent to specifying an array of stageCount VK_NULL_HANDLE values
vkCmdBindShadersEXT(commandBuffer, 3, unusedStages, NULL);
----

Then, the application issues a draw call:

[source,c]
----
vkCmdDrawIndexed(commandBuffer, ...);
----
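
Shader objects move essentially all pipeline state to dynamic state, so before draws like this one can execute, the relevant dynamic state must have been set on the command buffer. A partial, non-exhaustive sketch using existing Vulkan 1.3 and `VK_EXT_extended_dynamic_state3` commands (`viewport` and `scissor` are assumed to be defined elsewhere):

[source,c]
----
// A representative subset of the dynamic state that must be set before
// drawing; the full set depends on the bound stages and enabled features
vkCmdSetViewportWithCount(commandBuffer, 1, &viewport);
vkCmdSetScissorWithCount(commandBuffer, 1, &scissor);
vkCmdSetRasterizerDiscardEnable(commandBuffer, VK_FALSE);
vkCmdSetPrimitiveTopology(commandBuffer, VK_PRIMITIVE_TOPOLOGY_TRIANGLE_LIST);
vkCmdSetPolygonModeEXT(commandBuffer, VK_POLYGON_MODE_FILL);
vkCmdSetCullMode(commandBuffer, VK_CULL_MODE_BACK_BIT);
vkCmdSetFrontFace(commandBuffer, VK_FRONT_FACE_COUNTER_CLOCKWISE);
vkCmdSetDepthTestEnable(commandBuffer, VK_TRUE);
vkCmdSetDepthWriteEnable(commandBuffer, VK_TRUE);
vkCmdSetDepthCompareOp(commandBuffer, VK_COMPARE_OP_LESS);
----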

Later, the application binds a different fragment shader without disturbing any other stages:

[source,c]
----
VkShaderStageFlagBits stage = VK_SHADER_STAGE_FRAGMENT_BIT;
vkCmdBindShadersEXT(commandBuffer, 1, &stage, &shaders[2]);
----

Finally, the application issues another draw call:

[source,c]
----
vkCmdDrawIndexed(commandBuffer, ...);
----

=== Compute

At startup time, the application compiles a compute shader:

[source,c]
----
VkShaderCreateInfoEXT shaderInfo = {
    .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
    .pNext = NULL,
    .flags = 0,
    .stage = VK_SHADER_STAGE_COMPUTE_BIT,
    .nextStage = 0,
    .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
    .codeSize = computeShaderSpirvSize,
    .pCode = pComputeShaderSpirv,
    .pName = "main",
    .setLayoutCount = 1,
    .pSetLayouts = &descriptorSetLayout,
    .pushConstantRangeCount = 0,
    .pPushConstantRanges = NULL,
    .pSpecializationInfo = NULL
};

VkShaderEXT shader;

vkCreateShadersEXT(device, 1, &shaderInfo, NULL, &shader);
----

Later, the application binds the compute shader:

[source,c]
----
VkShaderStageFlagBits stage = VK_SHADER_STAGE_COMPUTE_BIT;
vkCmdBindShadersEXT(commandBuffer, 1, &stage, &shader);
----

Finally, the application dispatches compute work:

[source,c]
----
vkCmdDispatch(commandBuffer, ...);
----
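
Once the device is no longer using them, shader objects are freed with `vkDestroyShaderEXT`. A sketch, assuming the `shader` and `shaders` handles created in the examples above:

[source,c]
----
// A shader object must not be destroyed while any submitted command
// buffer that references it is still executing
for (uint32_t i = 0; i < 3; ++i) {
    vkDestroyShaderEXT(device, shaders[i], NULL);
}
vkDestroyShaderEXT(device, shader, NULL);
----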

== Issues

=== RESOLVED: How should implementations which absolutely must link shader stages implement this extension?

The purpose of this extension is to expose the flexibility of those implementations which allow arbitrary combinations of unlinked but compatible shader stages and state to be bound independently. Attempting to modify this extension to support implementations that lack this flexibility would defeat its entire purpose. For this reason, implementations which do not have the required flexibility should not implement this extension.

IHVs whose implementations have such limitations today are encouraged to consider incorporating changes which could remove these limitations into their future hardware roadmaps.

=== RESOLVED: Should this extension try to reuse pipeline objects and concepts?

No - the pipeline abstraction was never designed to accommodate such a radically different model.

Avoiding the introduction of a new object type and a handful of new entry points is not a compelling reason to continue piling less and less pipeline-like functionality into pipelines. Doing so would needlessly constrain or even undermine the design and future extensibility of both models.

=== RESOLVED: Should binary shader support be exposed in some way similar to existing pipeline caches or pipeline binaries?

No - fixed platforms like game consoles and embedded systems have constraints which make shipping both SPIR-V and binary copies of the same shader code undesirable.

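Although no pipeline-cache-like object is introduced, compiled shader code can still be captured and reused across runs: `vkGetShaderBinaryDataEXT` retrieves an implementation binary which can later be passed back through `VkShaderCreateInfoEXT` with `codeType` set to `VK_SHADER_CODE_TYPE_BINARY_EXT`. A sketch of the usual query-then-retrieve pattern, assuming `shader` was created earlier:

[source,c]
----
// The first call queries the binary size, the second retrieves the data
size_t dataSize = 0;
vkGetShaderBinaryDataEXT(device, shader, &dataSize, NULL);

void* pData = malloc(dataSize);
vkGetShaderBinaryDataEXT(device, shader, &dataSize, pData);
----
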
=== RESOLVED: Should there be some kind of shader program object to represent a set of linked shaders?

No - the compiled code for each shader stage is represented by a single `VkShaderEXT` object whether or not it is linked to other stages.

Introducing a shader program object would overly complicate the API and impose a new and unnecessary object lifetime management burden on applications. Vulkan is a low-level API, and it should be the application's responsibility to ensure that it keeps any promises it chooses to make about binding the correct stages together.

[NOTE]
====
Whenever shaders are created linked together, the rules for binding them give implementations the freedom to (for example) internally store the compiled code for multiple linked stages in a single stage's `VkShaderEXT` object and to leave the other stages' `VkShaderEXT` objects internally unused, though this is *strongly* discouraged.
====
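
For contrast with the unlinked examples above, stages that will only ever be used together can be created linked by setting `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` in each stage's create info within a single `vkCreateShadersEXT` call. A sketch reusing the vertex and fragment SPIR-V variables from the earlier examples (less relevant fields elided):

[source,c]
----
VkShaderCreateInfoEXT linkedInfo[2] = {
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .flags = VK_SHADER_CREATE_LINK_STAGE_BIT_EXT,
        .stage = VK_SHADER_STAGE_VERTEX_BIT,
        .nextStage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = vertexShaderSpirvSize,
        .pCode = pVertexShaderSpirv,
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout
    },
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .flags = VK_SHADER_CREATE_LINK_STAGE_BIT_EXT,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize[0],
        .pCode = pFragmentShaderSpirv[0],
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout
    }
};

VkShaderEXT linkedShaders[2];

// Each resulting VkShaderEXT is still a distinct object, but the linked
// pair is expected to be bound together
vkCreateShadersEXT(device, 2, linkedInfo, NULL, linkedShaders);
----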

=== RESOLVED: Should there be some mechanism for applications to provide static state that is known at compile time?

Not as part of this extension - it is possible to imagine some kind of "shader optimization hint" functionality to let applications provide implementations with "static state" similar to the existing static state in pipelines, but on an opt-in rather than opt-out basis. By providing a given piece of state in an optimization hint at shader creation time, an application could promise that the equivalent piece of dynamic state would always be set to some specific value whenever that shader is used, thereby allowing implementations to perform compile time optimizations similar to those they can make with pipelines today.

For already pipeline-friendly applications with lots of static state this could serve as a "gentler" version of pipelines that might provide the best of both worlds, but it is unclear that the benefits of such a scheme for the (pipeline-unfriendly) majority of applications which actually need this extension would outweigh the costs of the added complexity to the API.

If such functionality turns out to be important, it can be layered noninvasively on top of this extension in the form of another extension. Until then, applications wanting something that behaves like pipelines should just use pipelines.

=== RESOLVED: Should this extension expose some abstraction for setting groups of related state?

No - an earlier version of this proposal exposed a mechanism for applications to pre-create "interface shaders" which could then be bound on a command buffer to reduce draw time overhead. This added complexity to the API, and it was unclear that this solution would be able to deliver meaningful performance improvements over setting individual pieces of state on the command buffer.

Such an abstraction may prove beneficial for certain implementations, but it should not be designed until those implementations have at least attempted to implement support for this extension in its existing form.

=== RESOLVED: There is currently no dynamic state setting functionality for sample shading. How should this be handled?

Sample shading is already implicitly enabled (with minSampleShading = 1.0) whenever a shader reads from the SampleId or SamplePosition builtins. The main functionality missing in the absence of dynamic sample shading is the ability to specify minSampleShading values other than 1.0.

This could be addressed by introducing a new MinSampleShading shader builtin which could be either hard-coded or specialized at SPIR-V compile time using the existing specialization constant mechanism. However, since this functionality is orthogonal to the objective of this extension, it is left to a separate extension.

Until such an extension is available, applications that need to specify a minSampleShading other than 1.0 should use pipelines.
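
The specialization mechanism referenced above is already exposed through `VkShaderCreateInfoEXT::pSpecializationInfo`. Assuming a hypothetical specialization constant with `constantID` 0 standing in for the proposed MinSampleShading builtin, supplying a value at shader creation time might look like:

[source,c]
----
// Hypothetical: constantID 0 stands in for the proposed MinSampleShading
// builtin; the specialization plumbing itself is existing Vulkan API
const float minSampleShading = 0.5f;

VkSpecializationMapEntry entry = {
    .constantID = 0,
    .offset = 0,
    .size = sizeof(float)
};

VkSpecializationInfo specializationInfo = {
    .mapEntryCount = 1,
    .pMapEntries = &entry,
    .dataSize = sizeof(float),
    .pData = &minSampleShading
};

// specializationInfo would then be passed via
// VkShaderCreateInfoEXT::pSpecializationInfo at shader creation time
----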

== Further Functionality

 * Shader optimization hints
 * State grouping
 * Ray tracing shader objects