// Copyright 2021-2023 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

= VK_KHR_fragment_shading_rate
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:

This extension adds the ability to change the rate at which fragments are shaded. Rather than the usual single fragment invocation for each pixel covered by a primitive, multiple pixels can be shaded by a single fragment shader invocation.

== Problem Statement

Rendering resolutions keep increasing, and the requirements on device performance grow with the number of pixels rendered.
A move from e.g. a 4K resolution to an 8K resolution, effectively doubling visual fidelity, quadruples the number of pixels and with it the requirements on device performance.
In many cases, this extra pixel fidelity is not perceptible, and uniformly increasing the rate at which pixels are generated results in unnecessary work being performed. It would be useful to reclaim some of that performance to improve the overall experience for an end user. Examples include low-detail objects, triangles with a low delta between pixels, or something like VR where a user will not perceive detail in their peripheral vision.

There are three potential bottlenecks as resolution requirements increase: the rasterizer's ability to generate pixels, fragment shading, and bandwidth. This proposal focuses on reducing the shading rate, as this is the primary bottleneck on many implementations; though implementations may be able to take advantage of this to reduce workloads in other areas.


== Solution Space

Current solutions require a uniform rate to be applied across the screen. Things like MSAA, sample rate shading, and custom sample locations can be used to modify the rate at which shading occurs, but the rate is always applied uniformly across the screen, though different rates can be applied to different sets of geometry by modifying state between draw calls.
However, this requires careful state management and awkward sorting of geometry/draws in order to achieve anything other than a natural per-draw rate.

Different applications may want to change the rate per-draw, per-triangle, or per-screen-region.
While it would be possible to extend sample shading so that it can be set at each of these rates, multisample state is relatively complex, and this could result in tricky edge cases for some applications.
The alternative is to provide a new shading rate state that is independent of multisampling and can be set at each of these rates.
In either case, the per-draw rate can be set by pipeline or dynamic state, but for per-triangle and per-screen-region use cases, new mechanisms will be needed. For per-triangle state, the usual way of setting this in the API is by providing data along with the provoking vertex. For screen regions, two main options are viable: either an associated image whose sub-regions identify the state, or some sort of equation to be applied across the screen.

Due to the complexity and potential fragility of multisample state, this proposal introduces new shading rate state to the API. As not all known use cases for screen-region state can be expressed as a straightforward equation, per-image state allowing arbitrary expression of regions is preferred.


== Proposal

This extension introduces the concept of a fragment size to the API, where a given fragment can cover more than one pixel in the output attachments.
Each fragment is still rasterized at a per-pixel rate, as it effectively has one sample per pixel, but when shaded, only one shader invocation is executed for the entire fragment.
This is very similar to multisampling, where multiple samples are generated for each pixel, but on a wider scale.
This state also interacts with multisampling, such that one fragment can cover multiple pixels, with multiple samples per pixel.
Note though that enabling sample shading will effectively disable the fragment shading rate.

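As a rough illustration of the potential saving, the following sketch counts fragment shader invocations for a render area that is fully covered by a single layer of fragments. The helper name `fragment_invocations` is hypothetical, and the model ignores multisampling, overdraw, and helper invocations. Under these assumptions, a 3840x2160 target needs 8,294,400 invocations at a 1x1 fragment size and 2,073,600 at 2x2.

[source,c]
----
#include <stdint.h>

/* Hypothetical helper: number of fragment shader invocations needed to
 * cover a width x height render area with fragments of frag_w x frag_h
 * pixels, assuming every pixel is covered exactly once. */
uint64_t fragment_invocations(uint32_t width, uint32_t height,
                              uint32_t frag_w, uint32_t frag_h)
{
    /* Round up so that partially filled fragments at the right and
     * bottom edges are still counted. */
    uint64_t cols = ((uint64_t)width + frag_w - 1) / frag_w;
    uint64_t rows = ((uint64_t)height + frag_h - 1) / frag_h;
    return cols * rows;
}
----
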
=== Per-Draw state

This state can be set per-draw as part of the pipeline by chaining the following structure to link:{refpage}VkGraphicsPipelineCreateInfo.html[VkGraphicsPipelineCreateInfo]:

[source,c]
----
typedef struct VkPipelineFragmentShadingRateStateCreateInfoKHR {
    VkStructureType                       sType;
    const void*                           pNext;
    VkExtent2D                            fragmentSize;
    VkFragmentShadingRateCombinerOpKHR    combinerOps[2];
} VkPipelineFragmentShadingRateStateCreateInfoKHR;
----

It can also be set dynamically by setting `VK_DYNAMIC_STATE_FRAGMENT_SHADING_RATE_KHR` on the pipeline and using:

[source,c]
----
void vkCmdSetFragmentShadingRateKHR(
    VkCommandBuffer                             commandBuffer,
    const VkExtent2D*                           pFragmentSize,
    const VkFragmentShadingRateCombinerOpKHR    combinerOps[2]);
----

In each case, the link:{refpage}VkExtent2D.html[VkExtent2D] sets the base fragment size in the x and y dimensions for all fragments generated by draw calls using this state.

As the state can be set at three rates, applications are not limited to setting one at a time: all three rates can be set, with combiner operations defining how the final result is calculated.
This allows applications to e.g. have a per-screen-region rate while also marking some triangles or objects as lower detail than the base rate.

The per-draw and per-triangle rates are first combined according to the first combiner operation, and then the result of that is combined with the per-region rate according to the second combiner operation.
Available combiner operations are as follows:


[source,c]
----
typedef enum VkFragmentShadingRateCombinerOpKHR {
    VK_FRAGMENT_SHADING_RATE_COMBINER_OP_KEEP_KHR = 0,
    VK_FRAGMENT_SHADING_RATE_COMBINER_OP_REPLACE_KHR = 1,
    VK_FRAGMENT_SHADING_RATE_COMBINER_OP_MIN_KHR = 2,
    VK_FRAGMENT_SHADING_RATE_COMBINER_OP_MAX_KHR = 3,
    VK_FRAGMENT_SHADING_RATE_COMBINER_OP_MUL_KHR = 4,
} VkFragmentShadingRateCombinerOpKHR;
----

`KEEP` will select the first rate in the combination, while `REPLACE` will select the second rate.
`MIN` and `MAX` will select the minimum/maximum rate respectively, and do so separately for each dimension.
So e.g. taking the max of `(1,2)` and `(2,1)` would result in `(2,2)`.

The `MUL` operation multiplies each dimension of the first input rate by the corresponding dimension of the second input rate. So e.g. `(2,2)` and `(1,4)` would result in `(2,8)`.

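The combiner operations described above can be modeled as a small C function. This is a minimal sketch rather than the normative definition: `Extent2D`, `CombinerOp`, and `combine_rates` are local stand-ins for the API types, and the result is not yet clamped to the rates supported by an implementation.

[source,c]
----
#include <stdint.h>

/* Local stand-ins mirroring the API types; a real application would use
 * the definitions from the Vulkan headers instead. */
typedef struct { uint32_t width, height; } Extent2D;

typedef enum {
    COMBINER_OP_KEEP = 0,
    COMBINER_OP_REPLACE = 1,
    COMBINER_OP_MIN = 2,
    COMBINER_OP_MAX = 3,
    COMBINER_OP_MUL = 4
} CombinerOp;

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Combine two rates with a single combiner op. MIN, MAX, and MUL are
 * applied separately to each dimension. The result is unclamped. */
Extent2D combine_rates(Extent2D a, Extent2D b, CombinerOp op)
{
    Extent2D r = a;                       /* KEEP: first rate wins */
    switch (op) {
    case COMBINER_OP_KEEP:
        break;
    case COMBINER_OP_REPLACE:             /* second rate wins */
        r = b;
        break;
    case COMBINER_OP_MIN:
        r.width  = min_u32(a.width,  b.width);
        r.height = min_u32(a.height, b.height);
        break;
    case COMBINER_OP_MAX:
        r.width  = max_u32(a.width,  b.width);
        r.height = max_u32(a.height, b.height);
        break;
    case COMBINER_OP_MUL:
        r.width  = a.width  * b.width;
        r.height = a.height * b.height;
        break;
    }
    return r;
}
----
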
NOTE: The Vulkan specification chose to define this as a MUL operation using linear values to make this clear; whereas the DirectX Variable Rate Shading specification defines it as an addition in log2 space using bit flags. This unfortunately resulted in a misunderstanding between vendors, giving rise to the `fragmentShadingRateStrictMultiplyCombiner` limit, which when `VK_FALSE` indicates this operation acts as an addition. Fortunately, this only practically changes the result of a single combination - where the sum of 1 and 1 is 2 instead of a product of 1. All other combinations are clamped to 2 or 4, giving the same result as a true multiplication would provide.

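The claim in the note can be checked by brute force. The sketch below assumes a maximum fragment dimension of 4 and that out-of-range or non-power-of-two results are rounded down to the nearest supported power of two; both are assumptions made for illustration, and all function names are hypothetical. With those assumptions, strict multiplication and addition disagree only for the input pair `(1,1)` within a dimension.

[source,c]
----
#include <stdint.h>

/* Assumed maximum fragment dimension for this sketch only; a real
 * implementation reports its limits through maxFragmentSize. */
#define MAX_DIM 4u

/* Round down to a power of two and clamp to MAX_DIM (assumed rule). */
uint32_t clamp_dim(uint32_t v)
{
    uint32_t p = 1;
    while (p * 2 <= v && p * 2 <= MAX_DIM)
        p *= 2;
    return p;
}

/* MUL as a true multiplication of linear values. */
uint32_t mul_strict(uint32_t a, uint32_t b) { return clamp_dim(a * b); }

/* MUL as it behaves when the strict-multiply limit is VK_FALSE. */
uint32_t mul_additive(uint32_t a, uint32_t b) { return clamp_dim(a + b); }

/* Count the input pairs drawn from {1, 2, 4} where the two differ. */
int count_mismatches(void)
{
    static const uint32_t dims[] = {1, 2, 4};
    int mismatches = 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            if (mul_strict(dims[i], dims[j]) != mul_additive(dims[i], dims[j]))
                mismatches++;
    return mismatches;
}
----
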
The result of the combiner operations will always be clamped to the maximum supported rate of the implementation given the current draw state.

When none of the above state is set, the fragment size is treated as 1 by 1, and the combiner ops are set to `KEEP`.


=== Per-Triangle state

The per-triangle shading rate can be set by a new output in pre-rasterization shaders that is set on the provoking vertex:

[options="header"]
|====
2+| BuiltIn| Enabling Capabilities | Enabled by Extension
| 4432 | *PrimitiveShadingRateKHR* +
Output primitive fragment shading rate.
Only valid in the *Vertex*, *Geometry*, and *MeshNV* Execution Models.
See the API specification for more detail.
| *FragmentShadingRateKHR* | *SPV_KHR_fragment_shading_rate*
|====

This value is set to a single integer value according to four flag values:

[cols="1,15,5",options="header",width = "80%"]
|====
2+^.^| Fragment Shading Rate Flags | Enabling Capabilities
| 1 | *Vertical2Pixels*  +
Fragment invocation covers 2 pixels vertically.
| *FragmentShadingRateKHR*
| 2 | *Vertical4Pixels*  +
Fragment invocation covers 4 pixels vertically.
| *FragmentShadingRateKHR*
| 4 | *Horizontal2Pixels*  +
Fragment invocation covers 2 pixels horizontally.
| *FragmentShadingRateKHR*
| 8 | *Horizontal4Pixels*  +
Fragment invocation covers 4 pixels horizontally.
| *FragmentShadingRateKHR*
|====

Valid rate combinations must not include more than 1 horizontal and 1
vertical rate.
If no horizontal rate flags are set, it indicates a fragment shader covers one
pixel horizontally.
If no vertical rate flags are set, it indicates a fragment shader covers one
pixel vertically.

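The flag encoding can be illustrated with a pair of conversion helpers. This is a sketch under stated assumptions: the enumerant and function names are hypothetical C-side names, only the low four bits are inspected, and only the validity rule stated above is checked. Because the vertical flags occupy bits 0-1 and the horizontal flags bits 2-3, each two-bit field holds the log2 of the size in that direction.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* Flag values from the table above (names are local to this sketch). */
enum {
    RATE_VERTICAL_2_PIXELS   = 1,
    RATE_VERTICAL_4_PIXELS   = 2,
    RATE_HORIZONTAL_2_PIXELS = 4,
    RATE_HORIZONTAL_4_PIXELS = 8
};

/* Valid when at most one flag is set in each direction. */
bool rate_flags_valid(uint32_t flags)
{
    uint32_t vertical   = flags & 3u;
    uint32_t horizontal = (flags >> 2) & 3u;
    return vertical != 3u && horizontal != 3u;
}

/* Decode flags into a fragment size in pixels; false if invalid. */
bool decode_rate_flags(uint32_t flags, uint32_t *width, uint32_t *height)
{
    if (!rate_flags_valid(flags))
        return false;
    *width  = 1u << ((flags >> 2) & 3u);  /* no flag: 1, then 2 or 4 */
    *height = 1u << (flags & 3u);
    return true;
}

/* Encode a fragment size whose dimensions are each 1, 2, or 4. */
uint32_t encode_rate_flags(uint32_t width, uint32_t height)
{
    uint32_t flags = 0;
    if (width == 2)  flags |= RATE_HORIZONTAL_2_PIXELS;
    if (width == 4)  flags |= RATE_HORIZONTAL_4_PIXELS;
    if (height == 2) flags |= RATE_VERTICAL_2_PIXELS;
    if (height == 4) flags |= RATE_VERTICAL_4_PIXELS;
    return flags;
}
----
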
This functionality is gated behind a new capability:

[options="header"]
|====
2+| Capability | Implicitly Declares
| 4422 | *FragmentShadingRateKHR* +
Uses the *PrimitiveShadingRateKHR* or *ShadingRateKHR* Builtins. | *Shader*
|====


=== Per-Region state

The per-region state can be set through an image where a pixel in that image corresponds to a given region in the render.
Using the same flag values as the per-triangle rate, the value of that pixel determines the per-region rate for the corresponding region.
This image can be set per-subpass by chaining the following structure to link:{refpage}VkSubpassDescription2.html[VkSubpassDescription2]:

[source,c]
----
typedef struct VkFragmentShadingRateAttachmentInfoKHR {
    VkStructureType                  sType;
    const void*                      pNext;
    const VkAttachmentReference2*    pFragmentShadingRateAttachment;
    VkExtent2D                       shadingRateAttachmentTexelSize;
} VkFragmentShadingRateAttachmentInfoKHR;
----

`pFragmentShadingRateAttachment` selects the attachment description corresponding to the image, which must have dimensions at least equal to the framebuffer size divided by the texel size selected by `shadingRateAttachmentTexelSize`.
`shadingRateAttachmentTexelSize` must be set to power-of-two values within the range supported by the implementation, which is advertised via `minFragmentShadingRateAttachmentTexelSize`, `maxFragmentShadingRateAttachmentTexelSize`, and `maxFragmentShadingRateAttachmentTexelSizeAspectRatio`.


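The sizing and texel-size rules above can be sketched as two helpers. `Extent2D` is a local stand-in for `VkExtent2D`, the function names are hypothetical, and rounding the division up (so that partially covered regions at the framebuffer edge still have a texel) is an assumption of this sketch rather than something stated in this proposal.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for VkExtent2D. */
typedef struct { uint32_t width, height; } Extent2D;

/* Smallest attachment extent covering the framebuffer: each dimension
 * is the framebuffer size divided by the texel size, rounded up. */
Extent2D min_rate_attachment_extent(Extent2D framebuffer, Extent2D texel)
{
    Extent2D e;
    e.width  = (framebuffer.width  + texel.width  - 1) / texel.width;
    e.height = (framebuffer.height + texel.height - 1) / texel.height;
    return e;
}

static bool is_pow2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

/* Check a candidate texel size against the min/max/aspect-ratio limits. */
bool texel_size_supported(Extent2D texel, Extent2D min_size,
                          Extent2D max_size, uint32_t max_aspect_ratio)
{
    if (!is_pow2(texel.width) || !is_pow2(texel.height))
        return false;
    if (texel.width < min_size.width || texel.height < min_size.height)
        return false;
    if (texel.width > max_size.width || texel.height > max_size.height)
        return false;
    /* Aspect ratio is the larger dimension over the smaller one. */
    uint32_t big   = texel.width > texel.height ? texel.width : texel.height;
    uint32_t small = texel.width > texel.height ? texel.height : texel.width;
    return big <= small * max_aspect_ratio;
}
----
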
=== Reading the final rate

In a fragment shader, the final calculated rate can be read through a new built-in:

[options="header"]
|====
2+| BuiltIn| Enabling Capabilities | Enabled by Extension
| 4444 | *ShadingRateKHR* +
Input fragment shading rate for the current shader
invocation.
Only valid in the *Fragment* Execution Model.
See the API specification for more detail.
| *FragmentShadingRateKHR* | *SPV_KHR_fragment_shading_rate*
|====

=== Properties

Properties of the implementation can be queried via a new properties structure:

[source,c]
----
typedef struct VkPhysicalDeviceFragmentShadingRatePropertiesKHR {
    VkStructureType          sType;
    void*                    pNext;
    VkExtent2D               minFragmentShadingRateAttachmentTexelSize;
    VkExtent2D               maxFragmentShadingRateAttachmentTexelSize;
    uint32_t                 maxFragmentShadingRateAttachmentTexelSizeAspectRatio;
    VkBool32                 primitiveFragmentShadingRateWithMultipleViewports;
    VkBool32                 layeredShadingRateAttachments;
    VkBool32                 fragmentShadingRateNonTrivialCombinerOps;
    VkExtent2D               maxFragmentSize;
    uint32_t                 maxFragmentSizeAspectRatio;
    uint32_t                 maxFragmentShadingRateCoverageSamples;
    VkSampleCountFlagBits    maxFragmentShadingRateRasterizationSamples;
    VkBool32                 fragmentShadingRateWithShaderDepthStencilWrites;
    VkBool32                 fragmentShadingRateWithSampleMask;
    VkBool32                 fragmentShadingRateWithShaderSampleMask;
    VkBool32                 fragmentShadingRateWithConservativeRasterization;
    VkBool32                 fragmentShadingRateWithFragmentShaderInterlock;
    VkBool32                 fragmentShadingRateWithCustomSampleLocations;
    VkBool32                 fragmentShadingRateStrictMultiplyCombiner;
} VkPhysicalDeviceFragmentShadingRatePropertiesKHR;
----

The limits are somewhat complex, as this functionality interacts heavily with other state. However, many of these limits are informative only; the implementation will automatically reduce the fragment shading rate to `(1,1)` when they are violated.
`minFragmentShadingRateAttachmentTexelSize`, `maxFragmentShadingRateAttachmentTexelSize`, `maxFragmentShadingRateAttachmentTexelSizeAspectRatio`, `primitiveFragmentShadingRateWithMultipleViewports`, `fragmentShadingRateNonTrivialCombinerOps`, and `layeredShadingRateAttachments` are the only hard limits.
`fragmentShadingRateStrictMultiplyCombiner` affects the operation of certain combiner operations, and cannot be violated.

The following hard limits must be adhered to by an application for correct behavior:

* `minFragmentShadingRateAttachmentTexelSize` advertises the minimum size of the texel region for the per-region rate supported by the implementation.
* `maxFragmentShadingRateAttachmentTexelSize` advertises the maximum size of the texel region for the per-region rate supported by the implementation.
* `maxFragmentShadingRateAttachmentTexelSizeAspectRatio` advertises the maximum aspect ratio of the texel region for the per-region rate supported by the implementation.
* `primitiveFragmentShadingRateWithMultipleViewports` advertises whether applications can write the primitive fragment shading rate when multiple viewports are used. Does not affect multiview.
* `layeredShadingRateAttachments` advertises whether applications can use separate shading rate attachments for independent layers when performing layered rendering. Does not affect multiview.
* `fragmentShadingRateNonTrivialCombinerOps` advertises whether applications can set the combiner ops to anything other than `KEEP` or `REPLACE`.

Violating the following limits is not invalid; instead the implementation will automatically reduce the fragment shading rate to `(1,1)` if any of them are violated.
This allows applications to ship one algorithm while still ensuring valid behavior.

* `maxFragmentSize` determines the maximum supported fragment size.
* `maxFragmentSizeAspectRatio` determines the maximum supported aspect ratio between dimensions for the fragment size.
* `maxFragmentShadingRateCoverageSamples` determines the maximum total coverage samples for a fragment as a product of the fragment shading rate in each dimension and the multisample rate.
* `maxFragmentShadingRateRasterizationSamples` determines the maximum multisample rate (`rasterizationSamples`) when using a fragment shading rate.
* `fragmentShadingRateWithShaderDepthStencilWrites` determines if depth/stencil export from a shader can be used with fragment shading rate.
* `fragmentShadingRateWithSampleMask` determines if the `pSampleMask` member of link:{refpage}VkPipelineMultisampleStateCreateInfo.html[VkPipelineMultisampleStateCreateInfo] can have any valid bits equal to 0 when used with fragment shading rate.
* `fragmentShadingRateWithShaderSampleMask` determines if the sample mask (input or output) can be used in a shader with fragment shading rate.
* `fragmentShadingRateWithConservativeRasterization` determines if conservative rasterization can be used with fragment shading rate.
* `fragmentShadingRateWithFragmentShaderInterlock` determines if fragment shader interlock can be used with fragment shading rate.
* `fragmentShadingRateWithCustomSampleLocations` determines if custom sample locations can be used with fragment shading rate.

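An application that wants to predict whether a requested rate stays within the numeric limits in this list could run a check along the following lines. This is a sketch only: `RateLimits` is a hypothetical struct holding a subset of the properties, `rate_within_limits` is a hypothetical helper, and the boolean feature interactions are not modeled. Per the text above, a rate that fails such a check is not an error; the implementation reduces the rate instead.

[source,c]
----
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for VkExtent2D. */
typedef struct { uint32_t width, height; } Extent2D;

/* Hypothetical subset of the properties structure. */
typedef struct {
    Extent2D max_fragment_size;
    uint32_t max_fragment_size_aspect_ratio;
    uint32_t max_coverage_samples;
} RateLimits;

/* True if a fragment size and sample count respect the three limits. */
bool rate_within_limits(Extent2D size, uint32_t samples,
                        const RateLimits *limits)
{
    if (size.width > limits->max_fragment_size.width ||
        size.height > limits->max_fragment_size.height)
        return false;

    /* Aspect ratio is the larger dimension over the smaller one. */
    uint32_t big   = size.width > size.height ? size.width : size.height;
    uint32_t small = size.width > size.height ? size.height : size.width;
    if (big > small * limits->max_fragment_size_aspect_ratio)
        return false;

    /* Coverage samples: fragment width x height x samples per pixel. */
    uint64_t coverage = (uint64_t)size.width * size.height * samples;
    return coverage <= limits->max_coverage_samples;
}
----
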
This final limit cannot be violated:

* `fragmentShadingRateStrictMultiplyCombiner` determines whether the operation of the MUL combiner operation is correct - if it is `VK_FALSE`, MUL acts as a sum operation.

NOTE: See the definition of `VK_FRAGMENT_SHADING_RATE_COMBINER_OP_MUL_KHR` for more information.


=== Available shading rates

To advertise precisely which shading rates are supported by an implementation, the following function is added to the specification:

[source,c]
----
VkResult vkGetPhysicalDeviceFragmentShadingRatesKHR(
    VkPhysicalDevice                            physicalDevice,
    uint32_t*                                   pFragmentShadingRateCount,
    VkPhysicalDeviceFragmentShadingRateKHR*     pFragmentShadingRates);

typedef struct VkPhysicalDeviceFragmentShadingRateKHR {
    VkStructureType       sType;
    void*                 pNext;
    VkSampleCountFlags    sampleCounts;
    VkExtent2D            fragmentSize;
} VkPhysicalDeviceFragmentShadingRateKHR;
----

This function returns the full list of supported fragment shading rates ordered from largest fragment size to smallest, with all valid sample rates.
Implementations must support the following rates:

[options="autowidth"]
|===
| `sampleCounts`                                   | `fragmentSize`

| `VK_SAMPLE_COUNT_1_BIT \| VK_SAMPLE_COUNT_4_BIT` | {2,2}
| `VK_SAMPLE_COUNT_1_BIT \| VK_SAMPLE_COUNT_4_BIT` | {2,1}
| ~0                                               | {1,1}
|===

(1,1) is included for completeness only.
Even if a shading rate advertises a given sample rate, valid sample rates are still subject to usual constraints on multisampling.


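Because the list is ordered from largest fragment size to smallest, an application can pick a rate with a single forward scan. The sketch below is illustrative only: `SupportedRate` is a local stand-in for `VkPhysicalDeviceFragmentShadingRateKHR` without `sType` and `pNext`, `pick_rate` is a hypothetical helper, and the table holds just the three required rates rather than the result of a real query. It relies on the fact that each `VK_SAMPLE_COUNT_n_BIT` value is numerically equal to `n`.

[source,c]
----
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the API types. */
typedef struct { uint32_t width, height; } Extent2D;

typedef struct {
    uint32_t sample_counts;   /* bitmask; bit value n means n samples */
    Extent2D fragment_size;
} SupportedRate;

/* The three required rates from the table above, largest first. */
const SupportedRate required_rates[] = {
    { 1u | 4u, {2, 2} },
    { 1u | 4u, {2, 1} },
    { ~0u,     {1, 1} },
};
const size_t required_rate_count =
    sizeof(required_rates) / sizeof(required_rates[0]);

/* Return the first (largest) listed rate that fits inside `wanted` and
 * supports `samples`; fall back to 1x1 if nothing matches. */
Extent2D pick_rate(const SupportedRate *rates, size_t count,
                   Extent2D wanted, uint32_t samples)
{
    Extent2D fallback = {1, 1};
    for (size_t i = 0; i < count; i++) {
        const SupportedRate *r = &rates[i];
        if ((r->sample_counts & samples) != 0 &&
            r->fragment_size.width  <= wanted.width &&
            r->fragment_size.height <= wanted.height)
            return r->fragment_size;
    }
    return fallback;
}
----
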
=== Features

Each of the three rates is enabled by an independent feature:

[source,c]
----
typedef struct VkPhysicalDeviceFragmentShadingRateFeaturesKHR {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           pipelineFragmentShadingRate;
    VkBool32           primitiveFragmentShadingRate;
    VkBool32           attachmentFragmentShadingRate;
} VkPhysicalDeviceFragmentShadingRateFeaturesKHR;
----

  * `pipelineFragmentShadingRate` indicates support for the per-draw fragment shading rate, both dynamic and pipeline state. This feature must be supported to support the extension.
  * `primitiveFragmentShadingRate` indicates support for the per-triangle fragment shading rate.
  * `attachmentFragmentShadingRate` indicates support for the per-screen-region fragment shading rate.


== Examples

Two concrete samples are available in the https://github.com/KhronosGroup/Vulkan-Samples[KhronosGroup/Vulkan-Samples] repository:

  * https://github.com/KhronosGroup/Vulkan-Samples/tree/master/samples/extensions/fragment_shading_rate
  * https://github.com/KhronosGroup/Vulkan-Samples/tree/master/samples/extensions/fragment_shading_rate_dynamic

== Issues

This section describes issues with the proposal, including open issues that have not been addressed and closed issues that are not self-evident from the proposal description.

=== RESOLVED: Should the result of combiners be required to be a valid rate?

Requiring this would make a number of combinations nigh impossible to use, so instead combined values are clamped, with strict rules on how they are clamped.

=== RESOLVED: Should the various limits on state setting be validated?

Convention suggests they should be, but this makes the extension much harder to use - by asking implementations to clamp the rate to (1,1) instead, applications can ship the same functionality everywhere without having to modify their algorithm or assets.

=== RESOLVED: Should we describe the final combiner operation as a multiplication or addition? Related, should the per-draw fragment shading rate be set as flags or raw values?

The primitive and image rates have to be bit flags to maintain compatibility with other APIs. There was significant confusion about the meaning of the final combiner operation as an addition of log2 values, so the choice was made to describe this as a multiplication of raw values, and the API values were set as real values to make this clearer.

=== RESOLVED: When no fragment shading rate is provided, should the default rate {1, 1} take part in combination operations?

Yes.
When no fragment shading rate is given in a certain stage, the default rate {1, 1} is used and participates in combination operations.
For example, if per-draw/per-triangle/per-region shading rates are all enabled and `combinerOps` are `REPLACE`/`KEEP`, with a per-draw rate of {4, 2}, a per-region rate of {2, 2}, and no write to `PrimitiveShadingRateKHR` in the pre-rasterization shaders (so the per-triangle rate takes a default of {1, 1}), the final fragment size is {1, 1}: `REPLACE` first selects the default per-triangle rate over the per-draw rate, and `KEEP` then retains that result over the per-region rate.

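The example above can be traced with a small model of the two-stage combination. This is a sketch, not the normative algorithm: `Extent2D`, `TrivialOp`, and `final_rate` are hypothetical names, only the two trivial combiner ops are modeled, and no clamping is applied.

[source,c]
----
#include <stdint.h>

/* Local stand-in for VkExtent2D. */
typedef struct { uint32_t width, height; } Extent2D;

/* Only the two trivial combiner ops are needed for this example. */
typedef enum { OP_KEEP, OP_REPLACE } TrivialOp;

/* KEEP selects the first input, REPLACE selects the second. */
static Extent2D apply_op(Extent2D first, Extent2D second, TrivialOp op)
{
    return op == OP_KEEP ? first : second;
}

/* ops[0] combines the per-draw and per-triangle rates; ops[1] combines
 * that intermediate result with the per-region rate. A stage that
 * provides no rate passes the default {1, 1}. */
Extent2D final_rate(Extent2D per_draw, Extent2D per_triangle,
                    Extent2D per_region, const TrivialOp ops[2])
{
    Extent2D intermediate = apply_op(per_draw, per_triangle, ops[0]);
    return apply_op(intermediate, per_region, ops[1]);
}
----
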