// Copyright 2021-2023 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

= VK_EXT_multisampled_render_to_single_sampled
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:

This document identifies difficulties with efficient multisampled rendering on
tiling GPUs and proposes an extension to improve it.

== Problem Statement

With careful usage of resolve attachments, multisampled image memory allocated
with `VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT`, `loadOp` not equal to
`VK_ATTACHMENT_LOAD_OP_LOAD`, and `storeOp` not equal to
`VK_ATTACHMENT_STORE_OP_STORE`, a Vulkan application is able to efficiently
perform multisampled rendering without incurring any additional memory penalty
on tiling GPUs in most cases.

On some tiling GPUs, subpass resolve operations for some formats cannot be done
on the tile, and so additional performance and memory cost is silently paid
similarly to performing the resolve through
link:{refpage}vkCmdResolveImage.html[`vkCmdResolveImage`] after the subpass,
with no feedback to the application.

Additionally, under certain circumstances, the application may not be able to
complete its multisampled rendering within a single render pass; for example, if
it does partial rasterization from frame to frame, blending on an image from a
previous frame, or in emulation of `GL_EXT_multisampled_render_to_texture`.
In such cases, the application can use an initial subpass to effectively load
single-sampled data from the next subpass's resolve attachment and fill in the
multisampled attachment, which otherwise uses `loadOp` equal to
`VK_ATTACHMENT_LOAD_OP_DONT_CARE`.
However, this is not always possible (for example, for stencil in the absence of
`VK_EXT_shader_stencil_export`) and has multiple drawbacks.

Some implementations are able to perform said operation efficiently in
hardware, effectively loading a multisampled attachment from the contents of a
single-sampled one.
Together with the ability to perform a resolve operation at the end of a
subpass, these implementations are able to perform multisampled rendering on
single-sampled attachments with no extra memory or bandwidth overhead.

This document proposes an extension that exposes this capability by allowing a
framebuffer and render pass to include single-sampled attachments while
rendering is done with a specified number of samples.

== Proposal

The extension first allows a framebuffer to contain a mixture of single-sampled
and multisampled attachments.
In the absence of `VkMultisampledRenderToSingleSampledInfoEXT`, a render pass
subpass which performs multisampled rendering with `N` samples would still
require all the attachments used in the subpass to have `N` samples.
Similarly with `VK_KHR_dynamic_rendering`, the attachments can be a mixture of
single-sampled and multisampled if `VkMultisampledRenderToSingleSampledInfoEXT`
is present.

In the following, a _pass_ refers to either a render pass subpass, or a
`VK_KHR_dynamic_rendering` render pass.

When `VkMultisampledRenderToSingleSampledInfoEXT` is provided, specifying that
rendering is done with `N` samples, then any attachment used in the pass may
have either one or `N` samples.
In that case, attachments with one sample will automatically load as
multisampled for the duration of the pass (where every pixel's value is
replicated in all samples of that pixel on tile memory) and will automatically
resolve at the end of the pass.
This document refers to such single-sampled attachments as
multisampled-render-to-single-sampled attachments.
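
The automatic load and resolve described above can be pictured with a small
CPU-side model.
The following is only an illustrative sketch in plain C; it is not Vulkan API
and does not describe how any implementation works internally.
The function names `msrtss_model_load` and `msrtss_model_resolve_average` are
hypothetical, pixels are modelled as single `float` values, and an averaging
resolve is assumed even though the actual resolve behavior depends on the
format and the requested resolve mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not Vulkan API: replicate each single-sampled pixel
 * into `samples` consecutive entries of `multi`, as the text describes for
 * the start of a multisampled-render-to-single-sampled pass. */
void msrtss_model_load(const float *single, size_t pixel_count,
                       unsigned samples, float *multi)
{
    assert(samples > 0);
    for (size_t p = 0; p < pixel_count; ++p) {
        for (unsigned s = 0; s < samples; ++s) {
            multi[p * samples + s] = single[p];
        }
    }
}

/* Hypothetical model of the automatic resolve at the end of the pass.
 * Assumption: the resolve averages all samples of a pixel. */
void msrtss_model_resolve_average(const float *multi, size_t pixel_count,
                                  unsigned samples, float *single)
{
    assert(samples > 0);
    for (size_t p = 0; p < pixel_count; ++p) {
        float sum = 0.0f;
        for (unsigned s = 0; s < samples; ++s) {
            sum += multi[p * samples + s];
        }
        single[p] = sum / (float)samples;
    }
}
```

In this model, loading a pixel of value 0.25 at 4 samples yields four samples
of 0.25; if rendering then overwrites two of them with 1.0, the averaging
resolve writes 0.625 back to the single-sampled pixel.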

Additionally, this extension provides a means for the application to determine
whether usage of a format for attachments will be detrimental to performance
during a pass resolve operation, which can particularly adversely affect
multisampled-render-to-single-sampled passes.

Introduced by this API are:

Feature, advertising whether the implementation supports
multisampled-render-to-single-sampled:

[source,c]
----
typedef struct VkPhysicalDeviceMultisampledRenderToSingleSampledFeaturesEXT {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           multisampledRenderToSingleSampled;
} VkPhysicalDeviceMultisampledRenderToSingleSampledFeaturesEXT;
----

Performance query specifying whether resolving an attachment of a given format
at the end of a pass is optimal on the hardware:

[source,c]
----
typedef struct VkSubpassResolvePerformanceQueryEXT {
    VkStructureType               sType;
    void*                         pNext;
    VkBool32                      optimal;
} VkSubpassResolvePerformanceQueryEXT;
----

Specifying that a pass should perform multisampled-render-to-single-sampled
with `N` samples (extending `VkSubpassDescription2` and `VkRenderingInfo`):

[source,c]
----
typedef struct VkMultisampledRenderToSingleSampledInfoEXT {
    VkStructureType               sType;
    void*                         pNext;
    VkBool32                      multisampledRenderToSingleSampledEnable;
    VkSampleCountFlagBits         rasterizationSamples;
} VkMultisampledRenderToSingleSampledInfoEXT;
----

An image creation flag to indicate the intention of using a single-sampled
image in a multisampled-render-to-single-sampled pass:

[source,c]
----
VK_IMAGE_CREATE_MULTISAMPLED_RENDER_TO_SINGLE_SAMPLED_BIT_EXT
----

In a multisampled-render-to-single-sampled pass with `N` samples, all rendering
is done with `N` samples as if any single-sampled attachments truly had `N`
samples.
This means that
link:{refpage}VkPipelineMultisampleStateCreateInfo.html[`VkPipelineMultisampleStateCreateInfo::rasterizationSamples`]
must be `N`, and rasterization is done identically to Vulkan's
multisampling rules for passes not using this extension.
As such, the functionality in this extension purely affects the load and store
of single-sampled attachments and their automatic representation as
multisampled for the duration of the pass.
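
The sample-count rules stated so far can be summarized as a simple check.
The following is a hypothetical helper written for illustration only; it is not
part of the Vulkan API or of any validation layer.
Sample counts are passed as plain integers rather than `VkSampleCountFlagBits`,
and the name `msrtss_sample_counts_ok` is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Vulkan API.
 * Returns true when every attachment has either 1 or `pass_samples` samples
 * and the pipeline's rasterization sample count equals `pass_samples`. */
bool msrtss_sample_counts_ok(const unsigned *attachment_samples, size_t count,
                             unsigned pass_samples, unsigned pipeline_samples)
{
    assert(attachment_samples != NULL || count == 0);

    /* The pipeline must rasterize with the pass's N samples. */
    if (pipeline_samples != pass_samples) {
        return false;
    }

    /* Each attachment is either single-sampled or already has N samples. */
    for (size_t i = 0; i < count; ++i) {
        if (attachment_samples[i] != 1 &&
            attachment_samples[i] != pass_samples) {
            return false;
        }
    }
    return true;
}
```

With attachments of 1, 4, and 1 samples, a pass and pipeline both using 4
samples pass this check, while a 2-sample attachment in a 4-sample pass does
not.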

Regardless of which load and store ops are used, the single-sampled attachments
in a multisampled-render-to-single-sampled pass are represented as
multisampled.
The different load and store ops behave identically to the case where
multisampled attachments are used.
The following clarifies the ops in combination with
multisampled-render-to-single-sampled attachments:

- `VK_ATTACHMENT_LOAD_OP_LOAD`: For each pixel, its value is replicated in all
  the `N` corresponding samples at the start of the pass.
- `VK_ATTACHMENT_LOAD_OP_CLEAR`: The multisampled representation of the
  attachment is cleared, not the single-sampled attachment.
- `VK_ATTACHMENT_LOAD_OP_DONT_CARE`: Specifies that the previous contents of
  the single-sampled attachment need not be preserved, and the contents of the
  multisampled representation of the attachment will be undefined.
- `VK_ATTACHMENT_LOAD_OP_NONE_EXT`: Specifies that the previous contents of the
  single-sampled attachment will be preserved, but the contents of the
  multisampled representation of the attachment will be undefined.

- `VK_ATTACHMENT_STORE_OP_STORE`: The result of rendering is automatically
  resolved into the single-sampled attachment at the end of the pass and
  multisampled data is discarded.
  With render passes, if a subpass follows that reads from the attachment as a
  multisampled-render-to-single-sampled input attachment, it is undefined
  whether the previous subpass's multisampled data or the resolved values are
  returned.
- `VK_ATTACHMENT_STORE_OP_DONT_CARE`: Specifies that the multisampled contents
  are not needed after rendering, and may be discarded.
  The contents of the single-sampled attachment will be undefined.
- `VK_ATTACHMENT_STORE_OP_NONE_KHR`: Specifies that the contents of the
  single-sampled attachment are not accessed by the store operation, but will be
  undefined if the attachment was written to during the pass.
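
The list above can be condensed into a small lookup, shown here as a sketch.
The enums and function names are stand-ins invented for this illustration
(real code would use `VkAttachmentLoadOp` and `VkAttachmentStoreOp`), and each
function considers only the effect of the one op it is given, exactly as worded
in the list, without modelling interactions between load and store ops.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in enums for this sketch only, not Vulkan types. */
typedef enum {
    MODEL_LOAD, MODEL_CLEAR, MODEL_LOAD_DONT_CARE, MODEL_LOAD_NONE
} ModelLoadOp;
typedef enum {
    MODEL_STORE, MODEL_STORE_DONT_CARE, MODEL_STORE_NONE
} ModelStoreOp;

/* Are the contents of the multisampled representation defined at the start
 * of the pass? Only LOAD (replication) and CLEAR define them. */
bool model_multisampled_defined_at_start(ModelLoadOp op)
{
    return op == MODEL_LOAD || op == MODEL_CLEAR;
}

/* May the load op discard the previous single-sampled contents?
 * Only DONT_CARE says they need not be preserved. */
bool model_load_may_discard_single_sampled(ModelLoadOp op)
{
    return op == MODEL_LOAD_DONT_CARE;
}

/* Does the store op leave the single-sampled attachment undefined?
 * `written` tells whether the attachment was written during the pass. */
bool model_single_sampled_undefined_after_store(ModelStoreOp op, bool written)
{
    switch (op) {
    case MODEL_STORE:           return false;   /* resolved result is stored */
    case MODEL_STORE_DONT_CARE: return true;    /* nothing is written back */
    case MODEL_STORE_NONE:      return written; /* untouched unless written */
    }
    assert(!"unknown store op");
    return true;
}
```

For instance, this model reports that `MODEL_STORE_NONE` leaves the
single-sampled attachment intact only when nothing was written to it during
the pass.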

While this extension adds a query for the resolve performance of attachments
with a format, the results are not limited to
multisampled-render-to-single-sampled passes, and are also applicable to passes
with separate multisampled and single-sampled attachments with a resolve
operation.

== Examples

To determine whether a format is suitable for use as a
multisampled-render-to-single-sampled attachment for optimal performance:

[source,c]
----
// Chain the performance query into the format properties query.
VkSubpassResolvePerformanceQueryEXT perfQuery = {
    .sType = VK_STRUCTURE_TYPE_SUBPASS_RESOLVE_PERFORMANCE_QUERY_EXT,
};

VkFormatProperties2 formatProperties = {
    .sType = VK_STRUCTURE_TYPE_FORMAT_PROPERTIES_2,
    .pNext = &perfQuery,
};

// physicalDevice is the VkPhysicalDevice being queried.
vkGetPhysicalDeviceFormatProperties2(physicalDevice, format, &formatProperties);

// perfQuery.optimal now reports whether a subpass resolve of this format is optimal.
----

To create a render pass with a multisampled-render-to-single-sampled subpass
with 4 samples:

[source,c]
----
// Render pass attachments with mixed sample count
VkAttachmentDescription2 attachmentDescs[3] = {
    // Single-sampled color attachment
    [0] = {
        .sType = VK_STRUCTURE_TYPE_ATTACHMENT_DESCRIPTION_2_KHR,
        .format = ...,
        .samples = VK_SAMPLE_COUNT_1_BIT,
        .loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
        .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
        .stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .finalLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
    },
    // Multisampled color attachment with 4 samples
    [1] = {
        .sType = VK_STRUCTURE_TYPE_ATTACHMENT_DESCRIPTION_2_KHR,
        .format = ...,
        .samples = VK_SAMPLE_COUNT_4_BIT,
        .loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
        .storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
    },
    // Single-sampled depth/stencil attachment
    [2] = {
        .sType = VK_STRUCTURE_TYPE_ATTACHMENT_DESCRIPTION_2_KHR,
        .format = ...,
        .samples = VK_SAMPLE_COUNT_1_BIT,
        .loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
        .stencilLoadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
        .finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
    },
};

// Subpass attachment references
VkAttachmentReference2 colorAttachments[2] = {
    [0] = {
        .sType = VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2,
        .attachment = 0,
        .layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
    },
    [1] = {
        .sType = VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2,
        .attachment = 1,
        .layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
    },
};

// References attachment 2, the depth/stencil attachment described above.
VkAttachmentReference2 depthStencilAttachment = {
    .sType = VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2,
    .attachment = 2,
    .layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
    .aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT | VK_IMAGE_ASPECT_STENCIL_BIT,
};

// Multisampled-render-to-single-sampled info.  Rendering at 4xMSAA.
VkMultisampledRenderToSingleSampledInfoEXT msrtss = {
    .sType = VK_STRUCTURE_TYPE_MULTISAMPLED_RENDER_TO_SINGLE_SAMPLED_INFO_EXT,
    .multisampledRenderToSingleSampledEnable = VK_TRUE,
    .rasterizationSamples = VK_SAMPLE_COUNT_4_BIT,
};

// Resolve modes for depth/stencil
VkSubpassDescriptionDepthStencilResolve depthStencilResolve = {
    .sType = VK_STRUCTURE_TYPE_SUBPASS_DESCRIPTION_DEPTH_STENCIL_RESOLVE,
    .pNext = &msrtss,
    .depthResolveMode = VK_RESOLVE_MODE_SAMPLE_ZERO_BIT,
    .stencilResolveMode = VK_RESOLVE_MODE_NONE,
};

// The subpass description where multisampled-render-to-single-sampled rendering is enabled.
VkSubpassDescription2 subpassDescription = {
    .sType = VK_STRUCTURE_TYPE_SUBPASS_DESCRIPTION_2_KHR,
    .pNext = &depthStencilResolve,
    .pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
    .colorAttachmentCount = 2,
    .pColorAttachments = colorAttachments,
    .pDepthStencilAttachment = &depthStencilAttachment,
};

// The render pass creation.
VkRenderPassCreateInfo2KHR renderPassInfo = {
    .sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO_2_KHR,
    .attachmentCount = 3,
    .pAttachments = attachmentDescs,
    .subpassCount = 1,
    .pSubpasses = &subpassDescription,
};

VkRenderPass renderPass;
vkCreateRenderPass2(device, &renderPassInfo, NULL, &renderPass);
----

A similar pass with `VK_KHR_dynamic_rendering`:

[source,c]
----
VkRenderingAttachmentInfo colorAttachments[2] = {
    // Assuming a single-sampled color attachment 0
    {
        .sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO,
        .imageView = ...,
        .imageLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL,
        .resolveMode = VK_RESOLVE_MODE_AVERAGE_BIT,
        .loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
        .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
    },
    // Assuming a multisampled color attachment 1 with 4x samples
    {
        .sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO,
        .imageView = ...,
        .imageLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL,
        .resolveMode = VK_RESOLVE_MODE_AVERAGE_BIT,
        .resolveImageView = ...,
        .resolveImageLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL,
        .loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
    },
};

// Assuming a single-sampled depth/stencil attachment
VkRenderingAttachmentInfo depthAttachment = {
    .sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO,
    .imageView = ...,
    .imageLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL,
    .resolveMode = VK_RESOLVE_MODE_SAMPLE_ZERO_BIT,
    .loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,
    .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
    .clearValue = { ... },
};
VkRenderingAttachmentInfo stencilAttachment = {
    .sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO,
    .imageView = ...,
    .imageLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL,
    .resolveMode = VK_RESOLVE_MODE_NONE,
    .loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
    .storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
};

// Multisampled-render-to-single-sampled info.  Rendering at 4xMSAA.
VkMultisampledRenderToSingleSampledInfoEXT msrtss = {
    .sType = VK_STRUCTURE_TYPE_MULTISAMPLED_RENDER_TO_SINGLE_SAMPLED_INFO_EXT,
    .multisampledRenderToSingleSampledEnable = VK_TRUE,
    .rasterizationSamples = VK_SAMPLE_COUNT_4_BIT,
};

// Chain the info into the rendering info to enable the feature for this pass.
VkRenderingInfo renderingInfo = {
    .sType = VK_STRUCTURE_TYPE_RENDERING_INFO,
    .pNext = &msrtss,
    .renderArea = { ... },
    .layerCount = 1,
    .colorAttachmentCount = 2,
    .pColorAttachments = colorAttachments,
    .pDepthAttachment = &depthAttachment,
    .pStencilAttachment = &stencilAttachment,
};

vkCmdBeginRendering(commandBuffer, &renderingInfo);
----

== Issues

=== RESOLVED: What about `VK_KHR_dynamic_rendering`?

Render passes remain the optimal solution for tiling GPUs.
The current limitations of the `VK_KHR_dynamic_rendering` extension on tiling
GPUs may improve over time, so this extension may be used with dynamic
rendering.

=== RESOLVED: Lack of on-tile-resolve support for some formats will particularly have a negative impact on this extension.  Can there be a format feature flag added?

A specific struct is added to query performance of subpass resolve for each
format.
A format feature flag is avoided for two reasons: one is their scarcity, and
the other is that format feature flags normally imply that the corresponding
functionality is not allowed if the flag is missing.
In this case, however, the implementation necessarily supports subpass resolves,
albeit inefficiently, so the lack of such a hypothetical format feature flag
would not block their usage.
