// Copyright 2021-2023 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

# VK_AMD_shader_early_and_late_fragment_tests
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:

This document describes a proposal for a new SPIR-V execution mode that allows fragment shader invocations to be discarded by early fragment operations, even if the shader contains writes to storage resources or other side effects.


## Problem Statement

Most graphics devices are able to take advantage of early fragment operations when a fragment shader avoids certain operations, e.g. writing to depth or stencil, or to storage resources, providing significant performance advantages in most cases.
This is implicitly enabled wherever possible, and can be explicitly enabled by specifying the `EarlyFragmentTests` execution mode in SPIR-V.
However, the `EarlyFragmentTests` execution mode makes it invalid to write fragment depth from a shader.

Some implementations can perform depth testing both before _and_ after fragment shading, allowing a conservative early test to discard most fragments and a late test to discard with more precision.
https://registry.khronos.org/OpenGL/extensions/ARB/ARB_conservative_depth.txt[GL_ARB_conservative_depth] added a way to enable this optimisation even when depth was written by the fragment shader, allowing further optimisations to be achieved in certain conditions.
However, if the shader also writes to storage resources, no such optimisation is possible due to the predictability requirements of the specification.
In cases where an application does not care whether a fragment shader performs its storage writes for fragments that end up discarded, it is possible to use this capability for a significant performance improvement on some console platforms, but so far Vulkan has no mechanism to do this.
For some applications, this can mean an unnecessary performance hit that should be relatively straightforward to solve.


## Solution Space

There is only really one solution to this problem, which is to expose this capability to applications in some way.
The main question is how to expose this, at what granularity, and whether we should provide any guarantees.
Ultimately this should be relatively easy to turn on/off, and ideally should be set per-fragment shader in some form.

The main options for signifying the switch are:

  . Pipeline creation flag
  . SPIR-V execution mode
  . Non-semantic instruction

If it becomes a pipeline creation flag, it is easy to turn on/off on a per-pipeline basis.
However, the knowledge of whether the writes can be discarded is usually a property of whatever algorithm has been written in the fragment shader code itself, meaning this property has to be specified in two places.
From that perspective, it makes sense to have something in the shader code itself.

A SPIR-V execution mode is a straightforward way to express this, and it is consistent with the way that conservative depth is expressed in SPIR-V (e.g. `DepthGreater` https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Mode[Execution Mode]).
The downside of using an execution mode which is essentially an optimisation hint is that drivers have to implement the extension for the SPIR-V to be valid; when in fact it can be safely ignored in all cases.

Non-Semantic instructions seem like they could offer a way to specify the behavior without need for drivers to implement anything new.
Unfortunately, as the induced behavior violates the current Vulkan specification, it is not suitable for this use case, as it cannot be safely added to all shaders.

Without a wider "this can be ignored but can change behavior" extension along the lines of the non-semantic extension, a SPIR-V execution mode is likely the most suitable option.
Applications, a user-facing library, or a Vulkan software layer could be used to automatically remove the execution mode when not supported.
Implementations should also eventually be able to support the execution mode as a no-op if they do not have the required capabilities.


## Proposal

### New Vulkan feature

```c
typedef struct VkPhysicalDeviceShaderEarlyAndLateFragmentTestsFeaturesAMD {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           shaderEarlyAndLateFragmentTests;
} VkPhysicalDeviceShaderEarlyAndLateFragmentTestsFeaturesAMD;
```

When enabled, this feature allows SPIR-V shaders consumed by the implementation to use the new execution modes.


### New SPIR-V Execution Modes

A new execution mode is introduced which allows depth and stencil tests to be performed both early and late when the shader writes depth or stencil reference values, in combination with the existing conservative depth execution modes.
In order to allow for stencil reference writes with this new execution mode, analogous stencil reference execution modes are provided.

[cols="1,6,2,1",options="header"]
|====
2+^| Execution mode ^| Extra Operands ^| Enabling Capabilities
| 5017 | *EarlyAndLateFragmentTestsAMD* +
Fragment tests can be performed both before and after fragment shader execution, with latter tests taking values written to _FragDepth_ and _FragStencilRefEXT_ into account. Early tests are not guaranteed, late tests are. +
 +
If neither *ExecutionModeDepthReplacing* nor *ExecutionModeStencilRefReplacingEXT* is specified, this mode functions identically to *EarlyFragmentTests*. +
If this and *ExecutionModeStencilRefReplacingEXT* are both specified, one of *StencilRefUnchangedFrontAMD*, *StencilRefGreaterFrontAMD*, or *StencilRefLessFrontAMD* and one of *StencilRefUnchangedBackAMD*, *StencilRefGreaterBackAMD*, or *StencilRefLessBackAMD* must also be specified.
 +
If this and *ExecutionModeDepthReplacing* are both specified, one of *DepthGreater*, *DepthLess*, or *DepthUnchanged* must also be specified. +
 +
Only valid with the Fragment https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Model[Execution Model]. +
See the client API specification for details on fragment operations.
|
| *Shader*
| 5079 | *StencilRefUnchangedFrontAMD* +
Indicates that early per-fragment tests may assume that any _FragStencilRefEXT_ built in-decorated value written by the shader is equal to the stencil reference value set for the front face in the client API after masking.
Late per-fragment tests will use the written value as normal. +
 +
Only valid with the Fragment https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Model[Execution Model]. +
At most one of *StencilRefUnchangedFrontAMD*, *StencilRefGreaterFrontAMD*, and *StencilRefLessFrontAMD* can be specified.
|
| *StencilExportEXT*
| 5080 | *StencilRefGreaterFrontAMD* +
Indicates that early per-fragment tests may assume that any _FragStencilRefEXT_ built in-decorated value written by the shader is greater than or equal to the stencil reference value set for the front face in the client API after masking.
Late per-fragment tests will use the written value as normal. +
 +
Only valid with the Fragment https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Model[Execution Model]. +
At most one of *StencilRefUnchangedFrontAMD*, *StencilRefGreaterFrontAMD*, and *StencilRefLessFrontAMD* can be specified.
|
| *StencilExportEXT*
| 5081 | *StencilRefLessFrontAMD* +
Indicates that early per-fragment tests may assume that any _FragStencilRefEXT_ built in-decorated value written by the shader is less than or equal to the stencil reference value set for the front face in the client API after masking.
Late per-fragment tests will use the written value as normal.
 +
 +
Only valid with the Fragment https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Model[Execution Model]. +
At most one of *StencilRefUnchangedFrontAMD*, *StencilRefGreaterFrontAMD*, and *StencilRefLessFrontAMD* can be specified.
|
| *StencilExportEXT*
| 5082 | *StencilRefUnchangedBackAMD* +
Indicates that early per-fragment tests may assume that any _FragStencilRefEXT_ built in-decorated value written by the shader is equal to the stencil reference value set for the back face in the client API after masking.
Late per-fragment tests will use the written value as normal. +
 +
Only valid with the Fragment https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Model[Execution Model]. +
At most one of *StencilRefUnchangedBackAMD*, *StencilRefGreaterBackAMD*, and *StencilRefLessBackAMD* can be specified.
|
| *StencilExportEXT*
| 5083 | *StencilRefGreaterBackAMD* +
Indicates that early per-fragment tests may assume that any _FragStencilRefEXT_ built in-decorated value written by the shader is greater than or equal to the stencil reference value set for the back face in the client API after masking.
Late per-fragment tests will use the written value as normal. +
 +
Only valid with the Fragment https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Model[Execution Model]. +
At most one of *StencilRefUnchangedBackAMD*, *StencilRefGreaterBackAMD*, and *StencilRefLessBackAMD* can be specified.
|
| *StencilExportEXT*
| 5084 | *StencilRefLessBackAMD* +
Indicates that early per-fragment tests may assume that any _FragStencilRefEXT_ built in-decorated value written by the shader is less than or equal to the stencil reference value set for the back face in the client API after masking.
Late per-fragment tests will use the written value as normal. +
 +
Only valid with the Fragment https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#Execution_Model[Execution Model].
 +
At most one of *StencilRefUnchangedBackAMD*, *StencilRefGreaterBackAMD*, and *StencilRefLessBackAMD* can be specified.
|
| *StencilExportEXT*
|====

This allows implementations to perform both early and late tests explicitly.


### New GLSL Layout Qualifiers

The following new layout qualifiers are added to GLSL:

Fragment shaders allow the following stand-alone declaration:

```
__early_and_late_fragment_testsAMD
```

to request that certain fragment tests be performed before and after fragment shader execution, as described in the "`Fragment Operations`" chapter of the Vulkan 1.2 Specification.
This declaration must appear on a line of its own.

The following additional stand-alone declarations may be specified:

```
layout-qualifier-id:
    __stencil_ref_unchanged_frontAMD
    __stencil_ref_less_frontAMD
    __stencil_ref_greater_frontAMD
    __stencil_ref_unchanged_backAMD
    __stencil_ref_less_backAMD
    __stencil_ref_greater_backAMD
```

These declarations must each appear on a line of their own.
At most one `__stencil_ref_*_frontAMD` declaration and at most one `__stencil_ref_*_backAMD` declaration may be specified.
Each declaration constrains the final value of `gl_FragStencilRefARB` written by any shader invocation.
Implementations are allowed to perform optimizations assuming that the stencil test fails (or passes) for a given fragment if all values of `gl_FragStencilRefARB` consistent with the declaration would fail (or pass).
This potentially includes skipping shader execution if the fragment is discarded because it is occluded and the shader has no side effects.
If the final value of `gl_FragStencilRefARB` is inconsistent with the declaration for the facing of the shaded polygon, the result of the stencil test for the corresponding fragment is undefined.
If the stencil test passes and stencil writes are enabled, the value written to the stencil buffer is always the value of `gl_FragStencilRefARB`, whether or not it is consistent with the layout qualifier.

Each of the above qualifiers maps directly to the equivalently named SPIR-V execution mode.


### New HLSL Attributes

The following new https://github.com/microsoft/DirectXShaderCompiler/blob/master/docs/SPIR-V.rst#vulkan-specific-attributes[Vulkan Specific Attributes] are added:

  * `early_and_late_tests`: Marks an entry point as enabling early and late depth tests.
    If depth is written via `SV_Depth`, `depth_unchanged` must also be specified (`SV_DepthLessEqual` and `SV_DepthGreaterEqual` can be written freely).
    If a stencil reference value is written via `SV_StencilRef`, one of `stencil_ref_unchanged_front`, `stencil_ref_greater_equal_front`, or `stencil_ref_less_equal_front` and one of `stencil_ref_unchanged_back`, `stencil_ref_greater_equal_back`, or `stencil_ref_less_equal_back` must be specified.
  * `depth_unchanged`: Specifies that any depth written to `SV_Depth` will not invalidate the result of early depth tests.
    Sets the `DepthUnchanged` execution mode in SPIR-V.
  * `stencil_ref_unchanged_front`: Specifies that any stencil ref written to `SV_StencilRef` will not invalidate the result of early stencil tests when the fragment is front facing.
    Sets the `StencilRefUnchangedFrontAMD` execution mode in SPIR-V.
  * `stencil_ref_greater_equal_front`: Specifies that any stencil ref written to `SV_StencilRef` will be greater than or equal to the stencil reference value set by the API when the fragment is front facing.
    Sets the `StencilRefGreaterFrontAMD` execution mode in SPIR-V.
  * `stencil_ref_less_equal_front`: Specifies that any stencil ref written to `SV_StencilRef` will be less than or equal to the stencil reference value set by the API when the fragment is front facing.
    Sets the `StencilRefLessFrontAMD` execution mode in SPIR-V.
  * `stencil_ref_unchanged_back`: Specifies that any stencil ref written to `SV_StencilRef` will not invalidate the result of early stencil tests when the fragment is back facing.
    Sets the `StencilRefUnchangedBackAMD` execution mode in SPIR-V.
  * `stencil_ref_greater_equal_back`: Specifies that any stencil ref written to `SV_StencilRef` will be greater than or equal to the stencil reference value set by the API when the fragment is back facing.
    Sets the `StencilRefGreaterBackAMD` execution mode in SPIR-V.
  * `stencil_ref_less_equal_back`: Specifies that any stencil ref written to `SV_StencilRef` will be less than or equal to the stencil reference value set by the API when the fragment is back facing.
    Sets the `StencilRefLessBackAMD` execution mode in SPIR-V.

Shaders must not specify more than one of `stencil_ref_unchanged_front`, `stencil_ref_greater_equal_front`, and `stencil_ref_less_equal_front`.
Shaders must not specify more than one of `stencil_ref_unchanged_back`, `stencil_ref_greater_equal_back`, and `stencil_ref_less_equal_back`.


## Issues

### UNRESOLVED: Should we expose a feature/property indicating whether the implementation is actually going to perform early and late tests?

It would be useful if ultimately all implementations could ship this feature, treating it as a no-op where relevant.
However, if some implementations cannot gain any advantage from this, it might be reasonable to expose a property indicating this.
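Until implementations can accept the execution mode as a no-op, the fallback mentioned under Solution Space (an application, library, or layer removing the execution mode when it is not supported) might look roughly like the sketch below.
It is only an illustration, not part of the proposal: the function name `strip_amd_fragment_test_modes` and the macro names are hypothetical, the module is assumed to already be in host byte order, and only the binary layout and the `OpExecutionMode` opcode are taken from the SPIR-V specification, with the mode values taken from the table above.

```c
#include <stddef.h>
#include <stdint.h>

/* Values from the SPIR-V specification. */
#define SPIRV_MAGIC             0x07230203u
#define SPIRV_HEADER_WORDS      5u
#define SPIRV_OP_EXECUTION_MODE 16u

/* Execution mode values from the table in this proposal. */
#define MODE_EARLY_AND_LATE_FRAGMENT_TESTS_AMD 5017u
#define MODE_STENCIL_REF_FIRST_AMD             5079u /* StencilRefUnchangedFrontAMD */
#define MODE_STENCIL_REF_LAST_AMD              5084u /* StencilRefLessBackAMD */

static int is_amd_fragment_test_mode(uint32_t mode)
{
    return mode == MODE_EARLY_AND_LATE_FRAGMENT_TESTS_AMD ||
           (mode >= MODE_STENCIL_REF_FIRST_AMD &&
            mode <= MODE_STENCIL_REF_LAST_AMD);
}

/*
 * Hypothetical helper: compacts `words` in place, dropping every
 * OpExecutionMode instruction whose mode is introduced by this proposal.
 * Returns the new word count, or 0 if the module looks malformed.
 */
size_t strip_amd_fragment_test_modes(uint32_t *words, size_t count)
{
    if (words == NULL || count < SPIRV_HEADER_WORDS || words[0] != SPIRV_MAGIC)
        return 0;

    size_t in = SPIRV_HEADER_WORDS;
    size_t out = SPIRV_HEADER_WORDS;
    while (in < count) {
        /* First word of an instruction: word count (high 16 bits), opcode (low 16 bits). */
        uint32_t opcode = words[in] & 0xFFFFu;
        size_t length = words[in] >> 16;
        if (length == 0 || length > count - in)
            return 0;

        /* OpExecutionMode layout: [header, entry point id, mode, literals...] */
        int drop = opcode == SPIRV_OP_EXECUTION_MODE && length >= 3 &&
                   is_amd_fragment_test_mode(words[in + 2]);
        if (!drop) {
            for (size_t i = 0; i < length; ++i)
                words[out + i] = words[in + i];
            out += length;
        }
        in += length;
    }
    return out;
}
```

A caller would pass a mutable copy of the shader words and hand the first `n` returned words to shader module creation.
The sketch deliberately leaves several things out: it does not remove any `OpExtension` declaration that names the corresponding SPIR-V extension, and it does not substitute `EarlyFragmentTests` for shaders that relied on the new mode behaving identically to it, so a real tool would need to handle both.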