// Copyright 2023 The Khronos Group, Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

= VK_EXT_shader_object
:toc: left
:refpage: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/
:sectnums:

This document describes the proposed design for a new extension which aims to comprehensively address problems the pipeline abstraction has created for both applications and implementations.

== Problem Statement

When Vulkan 1.0 and its precursor Mantle were originally developed, the then-existing shader and state binding models of earlier APIs were beginning to show worrying limitations, both in terms of draw call scaling and the driver complexity needed to support them. Application developers were being artificially constrained from accessing the full capabilities of GPUs, and many IHVs were forced to maintain rat's nests of driver code full of heavy-handed draw time state validation and hacky shader patching, all in the service of simplicity at the API level. IHVs were understandably highly motivated to move away from such API designs.

Enter the new low-level APIs like Mantle and ultimately Vulkan. These APIs set out to reduce driver overhead by exposing lower-level abstractions that would hopefully avoid the need for the draw time state validation and shader patching that was so problematic for IHVs, and so detrimental to performance for applications.

One of the most significant changes to this end was the new concept of pipelines, which promised to shift the burden of shader state combinatorics out of drivers and into applications, ideally avoiding the need for driver-side draw time state validation and shader patching entirely. The thinking went that application developers would design or redesign their renderers with pipelines in mind, and in so doing they would naturally learn to accomplish their goals with fewer combinations of state.
Implicit in such a design was an assumption that applications would be able to know and provide nearly all of this state upfront. A very limited set of dynamic states was specified for the few pieces of state that had effectively unbounded ranges of values, but otherwise even state that could have been fully dynamic on all implementations was required to be baked into the static pipeline objects. This, the thinking went, would benefit even those implementations where the state was internally dynamic by enabling new possibilities for optimization during shader compilation.

Also implicit in the design of pipelines was an assumption that the driver overhead of the pipeline abstraction would either be negligible, or that it would at least always be outweighed by the performance savings at draw time when compared to earlier APIs. The possibility that some implementations might exhibit problematically high overhead at pipeline bind time -- whether from setting dozens of individual pieces of state each time a pipeline is bound, or from tracking which of those dozens of pieces of state had changed since the previous pipeline bind -- does not seem to have been a central consideration.

Many of these assumptions have since proven to be unrealistic.

On the application side, many developers considering or implementing Vulkan and similar APIs found them unable to efficiently support important use cases which were easily supportable in earlier APIs. This has not been simply a matter of developers being stuck in an old way of thinking or unwilling to "rein in" an unnecessarily large number of state combinations, but a reflection of the reality that the natural design patterns of the most demanding class of applications which use graphics APIs -- video games -- are inherently and deeply dependent on the very "dynamism" that pipelines set out to constrain.
As a result, renderers with a choice of API have largely chosen to avoid Vulkan and its "pipelined" contemporaries, while those without a choice have largely just devised workarounds to make these new APIs behave like the old ones -- usually in the form of the now nearly ubiquitous hash-n-cache pattern. These applications set various pieces of "pipeline" state independently, then hash it all at draw time and use the hash as a key into an application-managed pipeline cache, reusing an existing pipeline if one exists or creating and caching a new one if it does not. In effect, the messy and inefficient parts of GL drivers that pipelines sought to eliminate have simply moved into applications, except without the benefit of the implementation-specific knowledge which might have reduced their complexity or improved their performance.

This is not just a problem of "legacy" application code where it might be viable for the API to wait it out until application codebases are rewritten or replaced. Applications need the features they need, and are unlikely to remove them just to satisfy what they know to be artificial limitations imposed by a graphics API's made-up abstraction. This is especially true for developers working on platforms where the pipeline API does not offer substantial performance benefits over other APIs that do not share the same limitations.

On the driver side, pipelines have provided some of their desired benefits for some implementations, but for others they have largely just shifted draw time overhead to pipeline bind time (while in some cases still not entirely eliminating the draw time overhead in the first place). Implementations where nearly all "pipeline" state is internally dynamic are forced to either redundantly re-bind all of this state each time a pipeline is bound, or to track what state has changed from pipeline to pipeline -- either of which creates considerable overhead on CPU-constrained platforms.
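For illustration, the hash-n-cache pattern described above can be sketched roughly as follows. Everything here is hypothetical and simplified -- the `DrawState` fields, the FNV-1a hash, and the fixed-size probe table are illustrative stand-ins rather than any real renderer's cache -- and a production implementation would additionally need to compare the full state on hash collisions and handle cache eviction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical aggregate of the "pipeline" state an application tracks
 * independently; the fields here are illustrative only. */
typedef struct {
    uint64_t vertexShader;    /* handle of the bound vertex shader   */
    uint64_t fragmentShader;  /* handle of the bound fragment shader */
    uint32_t topology;
    uint32_t cullMode;
    uint32_t blendEnable;
    uint32_t depthTestEnable; /* also keeps the struct padding-free  */
} DrawState;

/* FNV-1a over the raw bytes of the state block. */
static uint64_t hashState(const DrawState* state) {
    const unsigned char* p = (const unsigned char*)state;
    uint64_t h = 14695981039346656037ull;
    for (size_t i = 0; i < sizeof *state; ++i) {
        h ^= p[i];
        h *= 1099511628211ull;
    }
    return h ? h : 1; /* reserve 0 to mean "empty slot" below */
}

#define CACHE_SLOTS 256 /* power of two so the hash can be masked */

typedef struct {
    uint64_t key;      /* state hash, 0 = empty slot       */
    uint64_t pipeline; /* stand-in for a pipeline handle   */
} CacheSlot;

static CacheSlot cache[CACHE_SLOTS];
static uint64_t  nextPipeline = 1; /* stand-in for actually compiling one */

/* Draw-time lookup: reuse the cached "pipeline" for this exact state
 * combination, or "compile" and cache a new one on a miss.
 * Open addressing with linear probing, no eviction. */
static uint64_t getPipelineForState(const DrawState* state) {
    uint64_t key  = hashState(state);
    size_t   slot = (size_t)(key & (CACHE_SLOTS - 1));
    while (cache[slot].key != 0) {
        if (cache[slot].key == key)
            return cache[slot].pipeline; /* cache hit: reuse */
        slot = (slot + 1) & (CACHE_SLOTS - 1);
    }
    cache[slot].key      = key;          /* miss: create and remember */
    cache[slot].pipeline = nextPipeline++;
    return cache[slot].pipeline;
}
```

Note that hashing the raw bytes is only safe here because this particular struct has no padding; real caches typically hash field by field.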
For certain implementations, the pipeline abstraction has also locked away a significant amount of the flexibility supported by their hardware, thereby paradoxically leaving many of their capabilities inaccessible in the newer and ostensibly "low level" API, though still accessible through older, higher level ones. In effect, this is a return to the old problem of the graphics API artificially constraining applications from accessing the full capabilities of the GPU, only on a different axis.

Finally, on fixed hardware platforms like game consoles and embedded systems, pipelines have created some additional and unique challenges. These platforms tend to have limited CPU performance, memory, and storage capacity all at the same time. Because of this, it is generally not desirable for applications on these platforms to waste storage space shipping both uncompiled SPIR-V and precompiled pipeline caches; however, it is also not desirable to compile the same shaders from scratch on each system (even if they could be cached for subsequent runs). Also, the hardware and even driver versions on these systems are typically known in advance, and drivers might only ever change in tandem with applications. Vulkan applications on these systems are nevertheless forced to waste precious storage space not only on shipping both SPIR-V and pipeline cache versions of their shaders, but on pipeline caches containing potentially large numbers of slightly differently optimized permutations of the same shader code with only minor differences in pipeline state (arguably this last point is a compression problem, but opaque pipeline caches mostly leave applications at the mercy of the driver to solve it for them).

Fortunately, some of these problems have been acknowledged and various efforts have already begun to address several of them.
These existing efforts have mainly chosen to tackle problems through the lens of existing hash-n-cache type application architectures, and have focused on those problems which are most acute at pipeline compile time. Their goals have included things like reducing pipeline counts, improving the usability and efficiency of pipeline caches, and introducing more granularity to the pipeline compilation and caching process. The extensions they have produced have preferred a targeted, piecemeal, and minimally invasive "band-aid" approach over a more holistic "rip off the band-aid" redesign.

Such efforts have undoubtedly produced valuable improvements, but they have left the class of problems which manifest at bind time largely unaddressed. It might be possible to continue the existing piecemeal approach with a refocus on bind time, but the solution space afforded by this kind of approach would necessarily remain constrained by the design decisions of the past.

== Solution Space

Several approaches are immediately apparent:

 . Extend the existing graphics pipeline library concept somehow, perhaps by adding optional new, more granular library types and/or making pipeline binaries directly bindable without needing to be explicitly linked into a pipeline object
 . Continue to expose more (maybe optional) dynamic state to minimize the number of pipeline objects needed
 . Abandon pipelines entirely and introduce new functionality to compile and bind shaders directly

Option 1 is a natural extension of recent efforts and requires relatively few API changes, but it adds even more complexity to the already very complex pipeline concept, while also failing to adequately address significant parts of the problem. While directly bindable pipeline libraries do reduce the dimensionality of pipeline combinatorics, they do not provide any meaningful absolute CPU performance improvement at pipeline bind time.
The total overhead of binding N different pipeline libraries is still roughly on par with the overhead of binding a single (monolithic or linked) pipeline.

Option 2 also requires relatively few API changes and would do more to address bind time CPU performance than option 1, but this option is limited in both the class of issues it can address and its portability across implementations. Much of the universally supportable "low hanging fruit" dynamic state has already been exposed by the existing extended dynamic state extensions, and the remaining state is mostly not universally dynamic. Exposing states A and B as dynamic on one implementation and states B and C on another is still valuable, but it limits this approach's benefits for simplifying application architectures. Even though this option is not a complete solution, it can and should be pursued in parallel with other efforts -- both for its own sake and as a potential foundation for a more comprehensive solution.

Option 3 is more radical, but brings the API design more in line with developer expectations. The pipeline abstraction has been a consistent problem for many developers trying to use Vulkan since its inception, and this option can produce a cleaner, more user-friendly abstraction that bypasses the complexity of pipelines. With the benefit of years of hindsight and broader Working Group knowledge about the constraints of each other's implementations, it can aim to achieve a design which better balances API simplicity with adherence to the explicit design ethos of Vulkan.

This proposal focuses on option 3, for the reasons outlined above.

== Proposal

=== Shaders

This extension introduces a new object type `VkShaderEXT` which represents a single compiled shader stage. `VkShaderEXT` objects may be created either independently or linked with other `VkShaderEXT` objects created at the same time.
To create `VkShaderEXT` objects, applications call `vkCreateShadersEXT()`:

[source,c]
----
VkResult vkCreateShadersEXT(
    VkDevice                        device,
    uint32_t                        createInfoCount,
    const VkShaderCreateInfoEXT*    pCreateInfos,
    const VkAllocationCallbacks*    pAllocator,
    VkShaderEXT*                    pShaders);
----

This function compiles the source code for one or more shader stages into `VkShaderEXT` objects. Whenever `createInfoCount` is greater than one, the shaders being created may optionally be linked together. Linking allows the implementation to perform cross-stage optimizations, based on a promise by the application that the linked shaders will always be used together.

Though a set of linked shaders may perform anywhere from the same as to substantially better than equivalent unlinked shaders, this tradeoff is left to the application and linking is never mandatory.

[source,c]
----
typedef enum VkShaderCreateFlagBitsEXT {
    VK_SHADER_CREATE_LINK_STAGE_BIT_EXT = 0x00000001,
    VK_SHADER_CREATE_ALLOW_VARYING_SUBGROUP_SIZE_BIT_EXT = 0x00000002,
    VK_SHADER_CREATE_REQUIRE_FULL_SUBGROUPS_BIT_EXT = 0x00000004,
    VK_SHADER_CREATE_NO_TASK_SHADER_BIT_EXT = 0x00000008,
    VK_SHADER_CREATE_DISPATCH_BASE_BIT_EXT = 0x00000010,
    VK_SHADER_CREATE_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_EXT = 0x00000020,
    VK_SHADER_CREATE_FRAGMENT_DENSITY_MAP_ATTACHMENT_BIT_EXT = 0x00000040
} VkShaderCreateFlagBitsEXT;
typedef VkFlags VkShaderCreateFlagsEXT;

typedef enum VkShaderCodeTypeEXT {
    VK_SHADER_CODE_TYPE_BINARY_EXT = 0,
    VK_SHADER_CODE_TYPE_SPIRV_EXT = 1
} VkShaderCodeTypeEXT;

typedef struct VkShaderCreateInfoEXT {
    VkStructureType                 sType;
    const void*                     pNext;
    VkShaderCreateFlagsEXT          flags;
    VkShaderStageFlagBits           stage;
    VkShaderStageFlags              nextStage;
    VkShaderCodeTypeEXT             codeType;
    size_t                          codeSize;
    const void*                     pCode;
    const char*                     pName;
    uint32_t                        setLayoutCount;
    const VkDescriptorSetLayout*    pSetLayouts;
    uint32_t                        pushConstantRangeCount;
    const VkPushConstantRange*      pPushConstantRanges;
    const VkSpecializationInfo*     pSpecializationInfo;
} VkShaderCreateInfoEXT;
----

To specify that shaders should be linked, include the `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` flag in each of the `VkShaderCreateInfoEXT` structures passed to `vkCreateShadersEXT()`. The presence or absence of `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` must match across all `VkShaderCreateInfoEXT` structures passed to a single `vkCreateShadersEXT()` call: i.e., if any member of `pCreateInfos` includes `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` then all other members must include it too. `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` is ignored if `createInfoCount` is one, and a shader created this way is considered unlinked.

The stage of the shader being compiled is specified by `stage`. Applications must also specify which stage types will be allowed to immediately follow the shader being created. For example, a vertex shader might specify a `nextStage` value of `VK_SHADER_STAGE_FRAGMENT_BIT` to indicate that the vertex shader being created will always be followed by a fragment shader (and never a geometry or tessellation shader). Applications that do not know this information at shader creation time, or that need the same shader to be compatible with multiple subsequent stages, can specify a mask that includes as many valid next stages as they wish. For example, a vertex shader can specify a `nextStage` mask of `VK_SHADER_STAGE_GEOMETRY_BIT | VK_SHADER_STAGE_FRAGMENT_BIT` to indicate that the next stage could be either a geometry shader or a fragment shader (but not a tessellation shader).

[NOTE]
====
Certain implementations may incur a compile time and/or memory usage penalty whenever more than one stage bit is set in `nextStage`, so applications should strive to set the minimum number of bits they are able to.
However, applications should *not* interpret this advice to mean that they should create multiple `VkShaderEXT` objects that differ only by the value of `nextStage`, as this will incur unnecessary overhead on implementations where `nextStage` is ignored.
====

The shader code is pointed to by `pCode` and may be provided either as SPIR-V, or in an opaque implementation-defined binary form specific to the physical device. The format of the shader code is specified by `codeType`.

The `codeType` of all `VkShaderCreateInfoEXT` structures passed to a `vkCreateShadersEXT()` call must match. This also means that only shaders created with the same `codeType` may be linked together.

Descriptor set layouts and push constant ranges used by each shader are specified directly (not via a `VkPipelineLayout`), though multiple stages can of course point to the same structures.

Any time after a `VkShaderEXT` object has been created, its binary shader code can be queried using `vkGetShaderBinaryDataEXT()`:

[source,c]
----
VkResult vkGetShaderBinaryDataEXT(
    VkDevice       device,
    VkShaderEXT    shader,
    size_t*        pDataSize,
    void*          pData);
----

When `pData` is `NULL`, the value pointed to by `pDataSize` is filled with the number of bytes needed to store the shader's binary code and `VK_SUCCESS` is returned.

When `pData` is non-`NULL`, the value pointed to by `pDataSize` is the application-provided size of `pData`. If the provided size is large enough then the location pointed to by `pData` is filled with the shader's binary code and `VK_SUCCESS` is returned, otherwise nothing is written to `pData` and `VK_INCOMPLETE` is returned.

The binary shader code returned in `pData` can be saved by the application and used in a future `vkCreateShadersEXT()` call (including on a different `VkInstance` and/or `VkDevice`) with a compatible physical device by setting `codeType` to `VK_SHADER_CODE_TYPE_BINARY_EXT`.
This means that on fixed platforms like game consoles and embedded systems, applications need not ship SPIR-V shader code at all. If the binary shader code in any `VkShaderCreateInfoEXT` passed to `vkCreateShadersEXT()` is not compatible with the physical device, then the `vkCreateShadersEXT()` call returns `VK_ERROR_INCOMPATIBLE_SHADER_BINARY_EXT`.

Applications must pass the same values of `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT` to a `vkCreateShadersEXT()` call with a `codeType` of `VK_SHADER_CODE_TYPE_BINARY_EXT` as were passed when those shaders were originally compiled from SPIR-V.

`VkShaderEXT` objects can be bound on a command buffer using `vkCmdBindShadersEXT()`:

[source,c]
----
void vkCmdBindShadersEXT(
    VkCommandBuffer                 commandBuffer,
    uint32_t                        stageCount,
    const VkShaderStageFlagBits*    pStages,
    const VkShaderEXT*              pShaders);
----

It is possible to unbind shaders for a particular stage by calling `vkCmdBindShadersEXT()` with the corresponding elements of `pShaders` set to `VK_NULL_HANDLE`. For example, an application may want to arbitrarily bind and unbind a known compatible passthrough geometry shader without knowing or caring what specific vertex and fragment shaders are bound at that time.

Regardless of whether the shaders were created with `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT`, the interfaces of all stages bound at `vkCmdDraw*()` time must be compatible. This means that the union of descriptor set layouts and push constant ranges across all bound shaders must not conflict, and that the inputs of each stage must be compatible with the outputs of the previous stage. It is the application's responsibility to ensure that this is the case, and the implementation will not do any draw time state validation to guard against this kind of invalid usage.

If any of the shaders bound at `vkCmdDraw*()` time were created with `VK_SHADER_CREATE_LINK_STAGE_BIT_EXT`, then all of the shaders that were linked to that shader must also be bound.
It is the application's responsibility to ensure that this is the case, and the implementation will not do any draw time state validation to guard against this kind of invalid usage.

When drawing with shaders bound with `vkCmdBindShadersEXT()` most state must be set dynamically. Specifically, the following existing commands must be used to set the corresponding state:

 * `vkCmdSetViewportWithCount()`
 * `vkCmdSetScissorWithCount()`
 * `vkCmdSetLineWidth()`
 * `vkCmdSetDepthBias()`
 * `vkCmdSetBlendConstants()`
 * `vkCmdSetDepthBounds()`
 * `vkCmdSetStencilCompareMask()`
 * `vkCmdSetStencilWriteMask()`
 * `vkCmdSetStencilReference()`
 * `vkCmdBindVertexBuffers2()`
 * `vkCmdSetCullMode()`
 * `vkCmdSetDepthBoundsTestEnable()`
 * `vkCmdSetDepthCompareOp()`
 * `vkCmdSetDepthTestEnable()`
 * `vkCmdSetDepthWriteEnable()`
 * `vkCmdSetFrontFace()`
 * `vkCmdSetPrimitiveTopology()`
 * `vkCmdSetStencilOp()`
 * `vkCmdSetStencilTestEnable()`
 * `vkCmdSetDepthBiasEnable()`
 * `vkCmdSetPrimitiveRestartEnable()`
 * `vkCmdSetRasterizerDiscardEnable()`
 * `vkCmdSetVertexInputEXT()`
 * `vkCmdSetLogicOpEXT()`
 * `vkCmdSetPatchControlPointsEXT()`
 * `vkCmdSetTessellationDomainOriginEXT()`
 * `vkCmdSetDepthClampEnableEXT()`
 * `vkCmdSetPolygonModeEXT()`
 * `vkCmdSetRasterizationSamplesEXT()`
 * `vkCmdSetSampleMaskEXT()`
 * `vkCmdSetAlphaToCoverageEnableEXT()`
 * `vkCmdSetAlphaToOneEnableEXT()`
 * `vkCmdSetLogicOpEnableEXT()`
 * `vkCmdSetColorBlendEnableEXT()`
 * `vkCmdSetColorBlendEquationEXT()`
 * `vkCmdSetColorWriteMaskEXT()`

If link:{refpage}VK_KHR_fragment_shading_rate.html[VK_KHR_fragment_shading_rate] is supported and enabled:

 * `vkCmdSetFragmentShadingRateKHR()`

If link:{refpage}VK_EXT_transform_feedback.html[VK_EXT_transform_feedback] is supported and enabled:

 * `vkCmdSetRasterizationStreamEXT()`

If link:{refpage}VK_EXT_discard_rectangle.html[VK_EXT_discard_rectangle] is supported and enabled:

 * `vkCmdSetDiscardRectangleEnableEXT()`
 * `vkCmdSetDiscardRectangleModeEXT()`
 * `vkCmdSetDiscardRectangleEXT()`

If link:{refpage}VK_EXT_conservative_rasterization.html[VK_EXT_conservative_rasterization] is supported and enabled:

 * `vkCmdSetConservativeRasterizationModeEXT()`
 * `vkCmdSetExtraPrimitiveOverestimationSizeEXT()`

If link:{refpage}VK_EXT_depth_clip_enable.html[VK_EXT_depth_clip_enable] is supported and enabled:

 * `vkCmdSetDepthClipEnableEXT()`

If link:{refpage}VK_EXT_sample_locations.html[VK_EXT_sample_locations] is supported and enabled:

 * `vkCmdSetSampleLocationsEnableEXT()`
 * `vkCmdSetSampleLocationsEXT()`

If link:{refpage}VK_EXT_blend_operation_advanced.html[VK_EXT_blend_operation_advanced] is supported and enabled:

 * `vkCmdSetColorBlendAdvancedEXT()`

If link:{refpage}VK_EXT_provoking_vertex.html[VK_EXT_provoking_vertex] is supported and enabled:

 * `vkCmdSetProvokingVertexModeEXT()`

If link:{refpage}VK_EXT_line_rasterization.html[VK_EXT_line_rasterization] is supported and enabled:

 * `vkCmdSetLineRasterizationModeEXT()`
 * `vkCmdSetLineStippleEnableEXT()`
 * `vkCmdSetLineStippleEXT()`

If link:{refpage}VK_EXT_depth_clip_control.html[VK_EXT_depth_clip_control] is supported and enabled:

 * `vkCmdSetDepthClipNegativeOneToOneEXT()`

If link:{refpage}VK_EXT_color_write_enable.html[VK_EXT_color_write_enable] is supported and enabled:

 * `vkCmdSetColorWriteEnableEXT()`

If link:{refpage}VK_NV_clip_space_w_scaling.html[VK_NV_clip_space_w_scaling] is supported and enabled:

 * `vkCmdSetViewportWScalingEnableNV()`
 * `vkCmdSetViewportWScalingNV()`

If link:{refpage}VK_NV_viewport_swizzle.html[VK_NV_viewport_swizzle] is supported and enabled:

 * `vkCmdSetViewportSwizzleNV()`

If link:{refpage}VK_NV_fragment_coverage_to_color.html[VK_NV_fragment_coverage_to_color] is supported and enabled:

 * `vkCmdSetCoverageToColorEnableNV()`
 * `vkCmdSetCoverageToColorLocationNV()`

If link:{refpage}VK_NV_framebuffer_mixed_samples.html[VK_NV_framebuffer_mixed_samples] is supported and enabled:

 * `vkCmdSetCoverageModulationModeNV()`
 * `vkCmdSetCoverageModulationTableEnableNV()`
 * `vkCmdSetCoverageModulationTableNV()`

If link:{refpage}VK_NV_coverage_reduction_mode.html[VK_NV_coverage_reduction_mode] is supported and enabled:

 * `vkCmdSetCoverageReductionModeNV()`

If link:{refpage}VK_NV_representative_fragment_test.html[VK_NV_representative_fragment_test] is supported and enabled:

 * `vkCmdSetRepresentativeFragmentTestEnableNV()`

If link:{refpage}VK_NV_shading_rate_image.html[VK_NV_shading_rate_image] is supported and enabled:

 * `vkCmdSetCoarseSampleOrderNV()`
 * `vkCmdSetShadingRateImageEnableNV()`
 * `vkCmdSetViewportShadingRatePaletteNV()`

If link:{refpage}VK_NV_scissor_exclusive.html[VK_NV_scissor_exclusive] is supported and enabled:

 * `vkCmdSetExclusiveScissorEnableNV()`
 * `vkCmdSetExclusiveScissorNV()`

If link:{refpage}VK_NV_fragment_shading_rate_enums.html[VK_NV_fragment_shading_rate_enums] is supported and enabled:

 * `vkCmdSetFragmentShadingRateEnumNV()`

Certain dynamic state setting commands have modified behavior from their original versions:

 * `vkCmdSetPrimitiveTopology()` does not have any constraints on the topology class (i.e., it behaves as if the `dynamicPrimitiveTopologyUnrestricted` property is `VK_TRUE` even when the actual property is `VK_FALSE`).
 * `vkCmdSetLogicOpEXT()` may be used on any implementation regardless of its support for the `extendedDynamicState2LogicOp` feature.
 * `vkCmdSetPatchControlPointsEXT()` may be used on any implementation regardless of its support for the `extendedDynamicState2PatchControlPoints` feature.

Any `VkShaderEXT` can be destroyed using `vkDestroyShaderEXT()`:

[source,c]
----
void vkDestroyShaderEXT(
    VkDevice                        device,
    VkShaderEXT                     shader,
    const VkAllocationCallbacks*    pAllocator);
----

Destroying a `VkShaderEXT` object used by action commands in one or more command buffers in the _recording_ or _executable_ states causes those command buffers to enter the _invalid_ state. A `VkShaderEXT` object must not be destroyed as long as any command buffer that issues any action command that uses it is in the _pending_ state.

== Examples

=== Graphics

Consider an application which always treats sets of shader stages as complete programs.

At startup time, the application compiles and links the shaders for each complete program:

[source,c]
----
VkShaderCreateInfoEXT shaderInfo[2] = {
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = VK_SHADER_CREATE_LINK_STAGE_BIT_EXT,
        .stage = VK_SHADER_STAGE_VERTEX_BIT,
        .nextStage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = vertexShaderSpirvSize,
        .pCode = pVertexShaderSpirv,
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    },
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = VK_SHADER_CREATE_LINK_STAGE_BIT_EXT,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize,
        .pCode = pFragmentShaderSpirv,
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    }
};

VkShaderEXT shaders[2];

vkCreateShadersEXT(device, 2, shaderInfo, NULL, shaders);
----

Later at draw time, the application binds the linked vertex and fragment shaders forming a complete program:

[source,c]
----
VkShaderStageFlagBits stages[2] = {
    VK_SHADER_STAGE_VERTEX_BIT,
    VK_SHADER_STAGE_FRAGMENT_BIT
};
vkCmdBindShadersEXT(commandBuffer, 2, stages, shaders);
----

Alternatively, the same result could be achieved by:

[source,c]
----
{
    VkShaderStageFlagBits stage = VK_SHADER_STAGE_VERTEX_BIT;
    vkCmdBindShadersEXT(commandBuffer, 1, &stage, &shaders[0]);
}

{
    VkShaderStageFlagBits stage = VK_SHADER_STAGE_FRAGMENT_BIT;
    vkCmdBindShadersEXT(commandBuffer, 1, &stage, &shaders[1]);
}
----

If the `tessellationShader` or `geometryShader` features are enabled on the device, the application sets the shaders for the corresponding unused stages to `VK_NULL_HANDLE`:

[source,c]
----
VkShaderStageFlagBits unusedStages[3] = {
    VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT,
    VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT,
    VK_SHADER_STAGE_GEOMETRY_BIT
};
VkShaderEXT unusedShaders[3] = { VK_NULL_HANDLE, VK_NULL_HANDLE, VK_NULL_HANDLE };
vkCmdBindShadersEXT(commandBuffer, 3, unusedStages, unusedShaders);
----

Alternatively, the same result could be achieved by:

[source,c]
----
VkShaderStageFlagBits unusedStages[3] = {
    VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT,
    VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT,
    VK_SHADER_STAGE_GEOMETRY_BIT
};
// Setting pShaders to NULL is equivalent to specifying an array of
// stageCount VK_NULL_HANDLE values
vkCmdBindShadersEXT(commandBuffer, 3, unusedStages, NULL);
----

Finally, the application issues a draw call:

[source,c]
----
vkCmdDrawIndexed(commandBuffer, ...);
----

Now consider a different application which needs to mix and match vertex and fragment shaders in arbitrary combinations that are not predictable at shader compile time.

At startup time, the application compiles unlinked vertex and fragment shaders:

[source,c]
----
VkShaderCreateInfoEXT shaderInfo[3] = {
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_VERTEX_BIT,
        .nextStage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = vertexShaderSpirvSize,
        .pCode = pVertexShaderSpirv,
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    },
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize[0],
        .pCode = pFragmentShaderSpirv[0],
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    },
    {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize[1],
        .pCode = pFragmentShaderSpirv[1],
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    }
};

VkShaderEXT shaders[3];

vkCreateShadersEXT(device, 3, shaderInfo, NULL, shaders);
----

Alternatively, the same result could be achieved by:

[source,c]
----
VkShaderEXT shaders[3];

{
    VkShaderCreateInfoEXT shaderInfo = {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_VERTEX_BIT,
        .nextStage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = vertexShaderSpirvSize,
        .pCode = pVertexShaderSpirv,
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    };

    vkCreateShadersEXT(device, 1, &shaderInfo, NULL, &shaders[0]);
}

{
    VkShaderCreateInfoEXT shaderInfo = {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize[0],
        .pCode = pFragmentShaderSpirv[0],
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    };

    vkCreateShadersEXT(device, 1, &shaderInfo, NULL, &shaders[1]);
}

{
    VkShaderCreateInfoEXT shaderInfo = {
        .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
        .pNext = NULL,
        .flags = 0,
        .stage = VK_SHADER_STAGE_FRAGMENT_BIT,
        .nextStage = 0,
        .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
        .codeSize = fragmentShaderSpirvSize[1],
        .pCode = pFragmentShaderSpirv[1],
        .pName = "main",
        .setLayoutCount = 1,
        .pSetLayouts = &descriptorSetLayout,
        .pushConstantRangeCount = 0,
        .pPushConstantRanges = NULL,
        .pSpecializationInfo = NULL
    };

    vkCreateShadersEXT(device, 1, &shaderInfo, NULL, &shaders[2]);
}
----

Later at draw time, the application binds independent vertex and fragment shaders forming a complete program:

[source,c]
----
VkShaderStageFlagBits stages[2] = {
    VK_SHADER_STAGE_VERTEX_BIT,
    VK_SHADER_STAGE_FRAGMENT_BIT
};
vkCmdBindShadersEXT(commandBuffer, 2, stages, shaders);
----

If the `tessellationShader` or `geometryShader` features are enabled on the device, the application sets the shaders for the corresponding unused stages to `VK_NULL_HANDLE`:

[source,c]
----
VkShaderStageFlagBits unusedStages[3] = {
    VK_SHADER_STAGE_TESSELLATION_CONTROL_BIT,
    VK_SHADER_STAGE_TESSELLATION_EVALUATION_BIT,
    VK_SHADER_STAGE_GEOMETRY_BIT
};
// Setting pShaders to NULL is equivalent to specifying an array of
// stageCount VK_NULL_HANDLE values
vkCmdBindShadersEXT(commandBuffer, 3, unusedStages, NULL);
----

Then, the application issues a draw call:

[source,c]
----
vkCmdDrawIndexed(commandBuffer, ...);
----

Later, the application binds a different fragment shader without disturbing any other stages:

[source,c]
----
VkShaderStageFlagBits stage = VK_SHADER_STAGE_FRAGMENT_BIT;
vkCmdBindShadersEXT(commandBuffer, 1, &stage, &shaders[2]);
----

Finally, the application issues another draw call:

[source,c]
----
vkCmdDrawIndexed(commandBuffer, ...);
----

=== Compute

At startup time, the application compiles a compute shader:

[source,c]
----
VkShaderCreateInfoEXT shaderInfo = {
    .sType = VK_STRUCTURE_TYPE_SHADER_CREATE_INFO_EXT,
    .pNext = NULL,
    .flags = 0,
    .stage = VK_SHADER_STAGE_COMPUTE_BIT,
    .nextStage = 0,
    .codeType = VK_SHADER_CODE_TYPE_SPIRV_EXT,
    .codeSize = computeShaderSpirvSize,
    .pCode = pComputeShaderSpirv,
    .pName = "main",
    .setLayoutCount = 1,
    .pSetLayouts = &descriptorSetLayout,
    .pushConstantRangeCount = 0,
    .pPushConstantRanges = NULL,
    .pSpecializationInfo = NULL
};

VkShaderEXT shader;

vkCreateShadersEXT(device, 1, &shaderInfo, NULL, &shader);
----

Later, the application binds the compute shader:

[source,c]
----
VkShaderStageFlagBits stage = VK_SHADER_STAGE_COMPUTE_BIT;
vkCmdBindShadersEXT(commandBuffer, 1, &stage, &shader);
----

Finally, the application dispatches the compute work:

[source,c]
----
vkCmdDispatch(commandBuffer, ...);
----

== Issues

=== RESOLVED: How should implementations which absolutely must link shader stages implement this extension?

The purpose of this extension is to expose the flexibility of those implementations which allow arbitrary combinations of unlinked but compatible shader stages and state to be bound independently. Attempting to modify this extension to support implementations which do not have this flexibility would defeat the entire purpose of the extension. For this reason, implementations which do not have the required flexibility should not implement this extension.

IHVs whose implementations have such limitations today are encouraged to consider incorporating changes which could remove these limitations into their future hardware roadmaps.

=== RESOLVED: Should this extension try to reuse pipeline objects and concepts?

No -- the pipeline abstraction was never designed with such a radically different design in mind.
Avoiding the introduction of a new object type and a handful of new entry points is not a compelling reason to continue to pile less and less pipeline-like functionality into pipelines. Doing so would needlessly constrict or even undermine the design and future extensibility of both models.

=== RESOLVED: Should binary shader support be exposed in some way similar to existing pipeline caches or pipeline binaries?

No - fixed platforms like game consoles and embedded systems have constraints which make shipping both SPIR-V and binary copies of the same shader code undesirable.

=== RESOLVED: Should there be some kind of shader program object to represent a set of linked shaders?

No - the compiled code for each shader stage is represented by a single `VkShaderEXT` object whether it is linked to other stages or not.

Introducing a shader program object would overly complicate the API and impose a new and unnecessary object lifetime management burden on applications. Vulkan is a low-level API, and it should be the application's responsibility to ensure that it keeps any promises it chooses to make about binding the correct stages together.

[NOTE]
====
Whenever shaders are created linked together, the rules for binding them give implementations the freedom to (for example) internally store the compiled code for multiple linked stages in a single stage's `VkShaderEXT` object and to leave the other stages' `VkShaderEXT` objects internally unused, though this is *strongly* discouraged.
====

=== RESOLVED: Should there be some mechanism for applications to provide static state that is known at compile time?

Not as part of this extension - it is possible to imagine some kind of "shader optimization hint" functionality to let applications provide implementations with "static state" similar to the existing static state in pipelines, but on an opt-in rather than opt-out basis.
By providing a given piece of state in an optimization hint at shader creation time, an application could promise that the equivalent piece of dynamic state would always be set to some specific value whenever that shader is used, thereby allowing implementations to perform compile time optimizations similar to those they can make with pipelines today.

For already pipeline-friendly applications with lots of static state this could serve as a "gentler" version of pipelines that might provide the best of both worlds, but it is unclear that the benefits of such a scheme for the (pipeline-unfriendly) majority of applications which actually need this extension would outweigh the costs of the added complexity to the API.

If such functionality turns out to be important, it can be noninvasively layered on top of this extension in the form of another extension. Until then, applications wanting something that behaves like pipelines should just use pipelines.

=== RESOLVED: Should this extension expose some abstraction for setting groups of related state?

No - an earlier version of this proposal exposed a mechanism for applications to pre-create "interface shaders" which could then be bound on a command buffer to reduce draw time overhead. This added complexity to the API, and it was unclear that this solution would be able to deliver meaningful performance improvements over setting individual pieces of state on the command buffer.

Such an abstraction may prove beneficial for certain implementations, but it should not be designed until those implementations have at least attempted to implement support for this extension in its existing form.

=== RESOLVED: There is currently no dynamic state setting functionality for sample shading. How should this be handled?

Sample shading is already implicitly enabled (with `minSampleShading` = 1.0) whenever a shader reads from the `SampleId` or `SamplePosition` builtins.
The main functionality missing in the absence of dynamic sample shading is the ability to specify `minSampleShading` values other than 1.0.

This could be addressed by introducing a new `MinSampleShading` shader builtin which can be either hard-coded or specialized at SPIR-V compile time using the existing specialization constant mechanism. However, since introducing this functionality is orthogonal to the objective of this extension, it is left to a different extension.

Until such an extension is available, applications that need to specify a `minSampleShading` other than 1.0 should use pipelines.

== Further Functionality

 * Shader optimization hints
 * State grouping
 * Ray tracing shader objects
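For reference, the pipeline-based fallback mentioned in the sample shading issue above amounts to baking the multisample state into a graphics pipeline at creation time. A minimal sketch of the relevant state follows; the sample count and `minSampleShading` value are illustrative only, not part of this proposal:

[source,c]
----
// Illustrative only: with pipelines, a minSampleShading other than 1.0
// is fixed in the multisample state at pipeline creation time rather
// than being set dynamically.
VkPipelineMultisampleStateCreateInfo multisampleState = {
    .sType = VK_STRUCTURE_TYPE_PIPELINE_MULTISAMPLE_STATE_CREATE_INFO,
    .pNext = NULL,
    .flags = 0,
    .rasterizationSamples = VK_SAMPLE_COUNT_4_BIT,
    .sampleShadingEnable = VK_TRUE,
    .minSampleShading = 0.5f,  // example value other than 1.0
    .pSampleMask = NULL,
    .alphaToCoverageEnable = VK_FALSE,
    .alphaToOneEnable = VK_FALSE
};
// ...then referenced from VkGraphicsPipelineCreateInfo::pMultisampleState
// when calling vkCreateGraphicsPipelines().
----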