// Copyright 2021-2023 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

# VK_EXT_multisampled_render_to_single_sampled
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:

This document identifies difficulties with efficient multisampled rendering on
tiling GPUs and proposes an extension to improve it.

## Problem Statement

With careful usage of resolve attachments, multisampled image memory allocated
with `VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT`, `loadOp` not equal to
`VK_ATTACHMENT_LOAD_OP_LOAD`, and `storeOp` not equal to
`VK_ATTACHMENT_STORE_OP_STORE`, a Vulkan application is able to efficiently
perform multisampled rendering without incurring any additional memory penalty
on tiling GPUs in most cases.

On some tiling GPUs, subpass resolve operations for some formats cannot be done
on the tile, and so additional performance and memory cost is silently paid
similarly to performing the resolve through
link:{refpage}vkCmdResolveImage.html[`vkCmdResolveImage`] after the subpass,
with no feedback to the application.

Additionally, under certain circumstances, the application may not be able to
complete its multisampled rendering within a single render pass; for example if
it does partial rasterization from frame to frame, blending on an image from a
previous frame, or in emulation of `GL_EXT_multisampled_render_to_texture`.
In such cases, the application can use an initial subpass to effectively load
single sampled data from the next subpass's resolve attachment and fill in the
multisampled attachment which otherwise uses `loadOp` equal to
`VK_ATTACHMENT_LOAD_OP_DONT_CARE`.
However, this is not always possible (for example for stencil in the absence of
`VK_EXT_shader_stencil_export`) and has multiple drawbacks.

Some implementations are able to perform said operation efficiently in
hardware, effectively loading a multisampled attachment from the contents of a
single-sampled one.
Together with the ability to perform a resolve operation at the end of a
subpass, these implementations are able to perform multisampled rendering on
single-sampled attachments with no extra memory or bandwidth overhead.

This document proposes an extension that exposes this capability by allowing a
framebuffer and render pass to include single-sampled attachments while
rendering is done with a specified number of samples.

## Proposal

The extension first allows a framebuffer to contain a mixture of single-sampled
and multisampled attachments.
In the absence of `VkMultisampledRenderToSingleSampledInfoEXT`, a render pass
subpass which performs multisampled rendering with `N` samples would still
require all the attachments used in the subpass to have `N` samples.
Similarly with `VK_KHR_dynamic_rendering`, the attachments can be a mixture of
single-sampled and multisampled if `VkMultisampledRenderToSingleSampledInfoEXT`
is present.

In the following, a _pass_ refers to either a render pass subpass, or a
`VK_KHR_dynamic_rendering` render pass.

When `VkMultisampledRenderToSingleSampledInfoEXT` is provided, specifying that
rendering is done with `N` samples, then any attachment used in the pass may
have either one or `N` samples.
In that case, attachments with one sample will automatically load as
multisampled for the duration of the pass (where every pixel's value is
replicated in all samples of that pixel on tile memory) and will automatically
resolve at the end of the pass.
This document refers to such single-sampled attachments as
multisampled-render-to-single-sampled attachments.
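
The automatic load and resolve of a multisampled-render-to-single-sampled
attachment can be modeled per pixel in plain C.
The following is only an illustrative sketch: `msrtss_load` and
`msrtss_resolve_average` are hypothetical helpers that are not part of the
Vulkan API, and the averaging resolve is an assumption (the resolve actually
performed depends on the format and on the resolve mode in use).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on samples per pixel, for this sketch only. */
#define MSRTSS_MAX_SAMPLES 16

/* Models the automatic load: the single-sampled pixel value is replicated
 * into all n samples of that pixel in (simulated) tile memory. */
void msrtss_load(float pixel, float *samples, size_t n)
{
    assert(n > 0 && n <= MSRTSS_MAX_SAMPLES);
    for (size_t i = 0; i < n; ++i)
        samples[i] = pixel;
}

/* Models the automatic resolve at the end of the pass, assuming an
 * averaging resolve: the n samples collapse into one pixel value. */
float msrtss_resolve_average(const float *samples, size_t n)
{
    assert(n > 0 && n <= MSRTSS_MAX_SAMPLES);
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += samples[i];
    return sum / (float)n;
}
```

In this model, loading a pixel and resolving it with no rendering in between
returns the original value, while a pixel half covered by new geometry resolves
to a blend of the old and new values.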

Additionally, this extension provides a means for the application to determine
whether usage of a format for attachments will be detrimental to performance
during a pass resolve operation, which can particularly adversely affect
multisampled-render-to-single-sampled passes.

Introduced by this API are:

Feature, advertising whether the implementation supports
multisampled-rendering-to-single-sampled:

[source,c]
----
typedef struct VkPhysicalDeviceMultisampledRenderToSingleSampledFeaturesEXT {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           multisampledRenderToSingleSampled;
} VkPhysicalDeviceMultisampledRenderToSingleSampledFeaturesEXT;
----

Performance query specifying whether usage of an attachment that is resolved at
the end of a pass with a format will be optimal on hardware:

[source,c]
----
typedef struct VkSubpassResolvePerformanceQueryEXT {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           optimal;
} VkSubpassResolvePerformanceQueryEXT;
----

Specifying that a pass should perform multisampled-rendering-to-single-sampled
with `N` samples (extending `VkSubpassDescription2` and `VkRenderingInfo`):

[source,c]
----
typedef struct VkMultisampledRenderToSingleSampledInfoEXT {
    VkStructureType          sType;
    void*                    pNext;
    VkBool32                 multisampledRenderToSingleSampledEnable;
    VkSampleCountFlagBits    rasterizationSamples;
} VkMultisampledRenderToSingleSampledInfoEXT;
----

An image creation flag to indicate the intention of using a single-sampled
image in a multisampled-render-to-single-sampled pass:

[source,c]
----
VK_IMAGE_CREATE_MULTISAMPLED_RENDER_TO_SINGLE_SAMPLED_BIT_EXT
----

In a multisampled-render-to-single-sampled pass with `N` samples, all rendering
is done with `N` samples as if any single-sampled attachments truly had `N`
samples.
This means that
link:{refpage}VkPipelineMultisampleStateCreateInfo.html[`VkPipelineMultisampleStateCreateInfo::rasterizationSamples`]
would have to be `N`, and rasterization is done identically to Vulkan's
multisampling rules for passes not using this extension.
As such, the functionality in this extension purely affects the load and store
of single-sampled attachments and their automatic representation as
multisampled for the duration of the pass.

Regardless of which load and store ops are used, the single-sampled attachments
in a multisampled-render-to-single-sampled pass are represented as
multisampled.
The different load and store ops behave identically to the case where
multisampled attachments are used.
The following clarifies the ops in combination with
multisampled-render-to-single-sampled attachments:

- `VK_ATTACHMENT_LOAD_OP_LOAD`: For each pixel, its value is replicated in all
  the `N` corresponding samples at the start of the pass.
- `VK_ATTACHMENT_LOAD_OP_CLEAR`: The multisampled representation of the
  attachment is cleared, not the single-sampled attachment.
- `VK_ATTACHMENT_LOAD_OP_DONT_CARE`: Specifies that the previous contents of
  the single-sampled attachment need not be preserved, and the contents of the
  multisampled representation of the attachment will be undefined.
- `VK_ATTACHMENT_LOAD_OP_NONE_EXT`: Specifies that the previous contents of the
  single-sampled attachment will be preserved, but the contents of the
  multisampled representation of the attachment will be undefined.

- `VK_ATTACHMENT_STORE_OP_STORE`: The result of rendering is automatically
  resolved into the single-sampled attachment at the end of the pass and
  multisampled data is discarded.
  With render passes, if a subpass follows that reads from the attachment as a
  multisampled-render-to-single-sampled input attachment, it is undefined
  whether the previous subpass's multisampled data are returned or the resolved
  values.
- `VK_ATTACHMENT_STORE_OP_DONT_CARE`: Specifies that the multisampled contents
  are not needed after rendering, and may be discarded.
  The contents of the single-sampled attachment will be undefined.
- `VK_ATTACHMENT_STORE_OP_NONE_KHR`: Specifies that the contents of the
  single-sampled attachment are not accessed by the store operation, but will be
  undefined if the attachment was written to during the pass.

While this extension adds a query for the resolve performance of attachments
with a format, the results are not limited to
multisampled-render-to-single-sampled passes, and are also applicable to passes
with separate multisampled and single-sampled attachments with a resolve
operation.

== Examples

To determine whether a format is suitable for use as a
multisampled-render-to-single-sampled attachment for optimal performance:

[source,c]
----
VkSubpassResolvePerformanceQueryEXT perfQuery = {
    .sType = VK_STRUCTURE_TYPE_SUBPASS_RESOLVE_PERFORMANCE_QUERY_EXT,
};

VkFormatProperties2 formatProperties = {
    .sType = VK_STRUCTURE_TYPE_FORMAT_PROPERTIES_2,
    .pNext = &perfQuery,
};

vkGetPhysicalDeviceFormatProperties2(physicalDevice, format, &formatProperties);

// perfQuery.optimal now reports whether resolving this format at the end of
// a pass is optimal on the hardware.
----

To create a render pass with a multisampled-render-to-single-sampled subpass
with 4 samples:

[source,c]
----
// Render pass attachments with mixed sample count
VkAttachmentDescription2 attachmentDescs[3] = {
    // Single-sampled color attachment
    [0] = {
        .sType = VK_STRUCTURE_TYPE_ATTACHMENT_DESCRIPTION_2_KHR,
        .format = ...,
        .samples = VK_SAMPLE_COUNT_1_BIT,
        .loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
        .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
        .stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .finalLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
    },
    // Multisampled color attachment
    [1] = {
        .sType = VK_STRUCTURE_TYPE_ATTACHMENT_DESCRIPTION_2_KHR,
        .format = ...,
        .samples = VK_SAMPLE_COUNT_4_BIT,
        .loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
        .storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
    },
    // Single-sampled depth/stencil attachment
    [2] = {
        .sType = VK_STRUCTURE_TYPE_ATTACHMENT_DESCRIPTION_2_KHR,
        .format = ...,
        .samples = VK_SAMPLE_COUNT_1_BIT,
        .loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
        .stencilLoadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
        .stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
        .initialLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
        .finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
    },
};

// Subpass attachment references
VkAttachmentReference2 colorAttachments[2] = {
    [0] = {
        .sType = VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2,
        .attachment = 0,
        .layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
    },
    [1] = {
        .sType = VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2,
        .attachment = 1,
        .layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT,
    },
};

// References attachmentDescs[2], the depth/stencil attachment.
VkAttachmentReference2 depthStencilAttachment = {
    .sType = VK_STRUCTURE_TYPE_ATTACHMENT_REFERENCE_2,
    .attachment = 2,
    .layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
    .aspectMask = VK_IMAGE_ASPECT_DEPTH_BIT | VK_IMAGE_ASPECT_STENCIL_BIT,
};

// Multisampled-render-to-single-sampled info. Rendering at 4xMSAA.
VkMultisampledRenderToSingleSampledInfoEXT msrtss = {
    .sType = VK_STRUCTURE_TYPE_MULTISAMPLED_RENDER_TO_SINGLE_SAMPLED_INFO_EXT,
    .multisampledRenderToSingleSampledEnable = VK_TRUE,
    .rasterizationSamples = VK_SAMPLE_COUNT_4_BIT,
};

// Resolve modes for depth/stencil
VkSubpassDescriptionDepthStencilResolve depthStencilResolve = {
    .sType = VK_STRUCTURE_TYPE_SUBPASS_DESCRIPTION_DEPTH_STENCIL_RESOLVE,
    .pNext = &msrtss,
    .depthResolveMode = VK_RESOLVE_MODE_SAMPLE_ZERO_BIT,
    .stencilResolveMode = VK_RESOLVE_MODE_NONE,
};

// The subpass description where multisampled-render-to-single-sampled rendering is enabled.
VkSubpassDescription2 subpassDescription = {
    .sType = VK_STRUCTURE_TYPE_SUBPASS_DESCRIPTION_2_KHR,
    .pNext = &depthStencilResolve,
    .pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS,
    .colorAttachmentCount = 2,
    .pColorAttachments = colorAttachments,
    .pDepthStencilAttachment = &depthStencilAttachment,
};

// The render pass creation.
VkRenderPassCreateInfo2KHR renderPassInfo = {
    .sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO_2_KHR,
    .attachmentCount = 3,
    .pAttachments = attachmentDescs,
    .subpassCount = 1,
    .pSubpasses = &subpassDescription,
};

VkRenderPass renderPass;
vkCreateRenderPass2(device, &renderPassInfo, NULL, &renderPass);
----

A similar pass with `VK_KHR_dynamic_rendering`:

[source,c]
----
VkRenderingAttachmentInfo colorAttachments[2] = {
    // Assuming a single-sampled color attachment 0
    {
        .sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO,
        .imageView = ...,
        .imageLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL,
        .resolveMode = VK_RESOLVE_MODE_AVERAGE_BIT,
        .loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
        .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
    },
    // Assuming a multisampled color attachment 1 with 4x samples
    {
        .sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO,
        .imageView = ...,
        .imageLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL,
        .resolveMode = VK_RESOLVE_MODE_AVERAGE_BIT,
        .resolveImageView = ...,
        .resolveImageLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL,
        .loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE,
        .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
    },
};

// Assuming a single-sampled depth/stencil attachment
VkRenderingAttachmentInfo depthAttachment = {
    .sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO,
    .imageView = ...,
    .imageLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL,
    .resolveMode = VK_RESOLVE_MODE_SAMPLE_ZERO_BIT,
    .loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,
    .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
    .clearValue = { ... },
};
VkRenderingAttachmentInfo stencilAttachment = {
    .sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO,
    .imageView = ...,
    .imageLayout = VK_IMAGE_LAYOUT_ATTACHMENT_OPTIMAL,
    .resolveMode = VK_RESOLVE_MODE_NONE,
    .loadOp = VK_ATTACHMENT_LOAD_OP_LOAD,
    .storeOp = VK_ATTACHMENT_STORE_OP_DONT_CARE,
};

// Multisampled-render-to-single-sampled info. Rendering at 4xMSAA.
VkMultisampledRenderToSingleSampledInfoEXT msrtss = {
    .sType = VK_STRUCTURE_TYPE_MULTISAMPLED_RENDER_TO_SINGLE_SAMPLED_INFO_EXT,
    .multisampledRenderToSingleSampledEnable = VK_TRUE,
    .rasterizationSamples = VK_SAMPLE_COUNT_4_BIT,
};

VkRenderingInfo renderingInfo = {
    .sType = VK_STRUCTURE_TYPE_RENDERING_INFO,
    .pNext = &msrtss,
    .renderArea = { ... },
    .layerCount = 1,
    .colorAttachmentCount = 2,
    .pColorAttachments = colorAttachments,
    .pDepthAttachment = &depthAttachment,
    .pStencilAttachment = &stencilAttachment,
};

vkCmdBeginRendering(commandBuffer, &renderingInfo);
----

== Issues

=== RESOLVED: What about `VK_KHR_dynamic_rendering`?

Render passes remain the optimal solution for tiling GPUs.
The current limitations of the `VK_KHR_dynamic_rendering` extension on tiling
GPUs may improve over time, so this extension may be used with dynamic
rendering.

=== RESOLVED: Lack of on-tile-resolve support for some formats will particularly have a negative impact on this extension. Can there be a format feature flag added?

A specific struct is added to query performance of subpass resolve for each
format.
A format feature flag is avoided for two reasons; one is their scarcity, and
the other is that normally format feature flags imply that the corresponding
functionalities are not allowed if the flag is missing.
In this case however, the implementation necessarily supports subpass resolves
albeit inefficiently, so the lack of such a hypothetical format feature flag
would not block their usage.