// Copyright 2021-2023 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

# VK_EXT_graphics_pipeline_library
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:

This document outlines a proposal to allow partial compilation of portions
of pipelines, improving the performance of pipeline compilation for
applications that have large numbers of materials, large amounts of
dynamic state, or continuously stream in new material definitions.


## Problem Statement

The original promise of monolithic pipelines in Vulkan was to enable
developers to construct all their state up front, avoiding the driver doing
dynamic compilation and patching shaders implicitly when recording draw
calls, resulting in unexpected hitches.

The reality, however, is that for many game engines, requiring most of this state up front
either fails to eliminate hitching,
or requires precompiling so many state combinations that the size of the
pipeline cache is nearly unmanageable.

Game engines are typically still managing enormous sets of state and
shader combinations, and this is not a purely technical problem.
It is still expected and encouraged that developers will limit the number
of these, but it doesn’t change the fact that at least in the
short-to-mid-term, developers are having real problems that can’t be solved by
telling them to reduce the number of pipelines.

This proposal does not aim to fully solve these issues, but instead provides
a key piece of infrastructure required to solve them.
The main aim of this proposal is to reduce the cost of loading novel state
and shader combinations within the rendering loop, thus avoiding hitching.

An additional constraint to be aware of is that any solution should not
regress the intended wins from moving to pipeline objects – there should be
no need for late-compilation or patching that is performed *implicitly* by
the implementation.
An expectation of any solution here is that GPU performance may suffer
due to sub-optimal linking, and the solution should provide a way to mitigate this.
Explicit late compilation or patching may be acceptable, but it should be
simple to perform, and applications should have control over when and how
it is done.


## Solution Space

The following options have been considered:

 . Handle this inside the implementation
 . Additional dynamic state
 . Separately compiled pipeline/state blobs

Handling this inside the implementation would potentially solve the problem
for the class of apps that have this issue.
However, it takes the choice of fast-linking vs. whole program optimization
away from the application.
It also means fighting with drivers and performance guidelines to hit the
right usage to trigger it on each implementation.

As for dynamic state, it is likely that the list of state that is fully dynamic
across implementations has been all but exhausted at this point.
While vendors can choose to expose additional dynamic state as they see
fit, solving this problem portably needs a different solution.
Vendors trying to implement state that isn’t dynamic as if it were dynamic
will end up doing implicit work at command recording time, leading
inevitably to implicit compilation or patching of shaders – which is
undesirable.

Separately compiling chunks of state (e.g. individual shaders, vertex
inputs, render passes) allows for applications to individually compile
these chunks as they show up.
Enough information should be given in this early step that linking these
chunks together later has significant cost savings and can be done at record time
if necessary.
Implementations could “cheat” at separate chunk compilation, exposing
this extension while keeping the create information until the final link
step and compiling everything at once then.
In general it is desirable for implementations to avoid late compilation, but this
does allow the extension to be implemented more widely (including via a software layer),
providing better consistency for developers.
Explicitly advertising this detail could allow developers to make better
choices about how and when these pipelines are compiled.

This proposal focuses on option 3 – providing applications with the ability
to separately compile state chunks and later link them together.


## Proposal


### Prior Art: VK_KHR_pipeline_library

For link:{refpage}VK_KHR_ray_tracing_pipeline.html[VK_KHR_ray_tracing_pipeline], pipelines
contain a significant number of shaders - making monolithic compilation
very slow.
link:{refpage}VK_KHR_pipeline_library.html[VK_KHR_pipeline_library] allowed
applications to create partial pipelines (pipeline libraries) containing
only a subset of the final shaders.
These pipeline libraries can be linked together to form a final executable.
Ray tracing pipelines were relatively straightforward as only shaders are linked,
and there’s no “state” for ray tracing shaders beyond the shader groups.

Graphics pipelines by comparison contain a lot of static state that needs
to be separated carefully, retaining any “interface” information.
However, this extension reuses the same underlying mechanism.


### Features

The following feature is exposed by this extension:

[source,c]
----
typedef struct VkPhysicalDeviceGraphicsPipelineLibraryFeaturesEXT {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           graphicsPipelineLibrary;
} VkPhysicalDeviceGraphicsPipelineLibraryFeaturesEXT;
----

`graphicsPipelineLibrary` is the core feature enabling this
functionality.


### Properties

The following properties are exposed by this extension:

[source,c]
----
typedef struct VkPhysicalDeviceGraphicsPipelineLibraryPropertiesEXT {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           graphicsPipelineLibraryFastLinking;
    VkBool32           graphicsPipelineLibraryIndependentInterpolationDecoration;
} VkPhysicalDeviceGraphicsPipelineLibraryPropertiesEXT;
----

`graphicsPipelineLibraryFastLinking` indicates whether the cost of
linking pipelines without `VK_PIPELINE_CREATE_LINK_TIME_OPTIMIZATION_BIT_EXT`
is comparable to recording a command in a command buffer, such that
applications can link pipelines on demand while recording commands.
If this property is not supported, linking should still be cheaper than
a full pipeline compilation.

If `graphicsPipelineLibraryIndependentInterpolationDecoration` is not
supported, applications must provide matching interpolation decorations in
both the last geometry stage and the fragment stage; if it is supported,
any geometry stage decorations are ignored.


### Dividing up the graphics state

Four sets of state that have been identified as often recombined by
applications are:

 * Vertex Input Interface
 * Pre-rasterization
 * Post-rasterization
 * Fragment Output Interface (including blend state)

The intent is to allow each of those to be independently compiled as far as
possible, along with relevant pieces of state that may need to match for
the final linked pipeline.

[source,c]
----
typedef struct VkGraphicsPipelineLibraryCreateInfoEXT {
    VkStructureType                      sType;
    void*                                pNext;
    VkGraphicsPipelineLibraryFlagsEXT    flags;
} VkGraphicsPipelineLibraryCreateInfoEXT;

typedef enum VkGraphicsPipelineLibraryFlagBitsEXT {
    VK_GRAPHICS_PIPELINE_LIBRARY_VERTEX_INPUT_INTERFACE_BIT_EXT = 0x00000001,
    VK_GRAPHICS_PIPELINE_LIBRARY_PRE_RASTERIZATION_SHADERS_BIT_EXT = 0x00000002,
    VK_GRAPHICS_PIPELINE_LIBRARY_FRAGMENT_SHADER_BIT_EXT = 0x00000004,
    VK_GRAPHICS_PIPELINE_LIBRARY_FRAGMENT_OUTPUT_INTERFACE_BIT_EXT = 0x00000008,
} VkGraphicsPipelineLibraryFlagBitsEXT;

typedef VkFlags VkGraphicsPipelineLibraryFlagsEXT;
----

Pipeline libraries are created for the parts specified, and any parameters
required to create a library with those parts must be provided.

For all pipeline libraries, the
link:{refpage}VkPipelineCache.html[VkPipelineCache], `basePipelineHandle`,
`basePipelineIndex`,
link:{refpage}VkPipelineCreationFeedbackCreateInfo.html[VkPipelineCreationFeedbackCreateInfo],
and
link:{refpage}VkPipelineCompilerControlCreateInfoAMD.html[VkPipelineCompilerControlCreateInfoAMD]
parameters are independently consumed and do not need to match between
libraries or for any final pipeline.
link:{refpage}VkPipelineCreateFlags.html[VkPipelineCreateFlags] are also
independent, though `VK_PIPELINE_CREATE_LIBRARY_BIT_KHR` is required for
all pipeline libraries.
Only dynamic states that affect state consumed by a library are used;
other dynamic states are ignored and play no part in linked pipelines.
Where multiple pipeline libraries are built with the same required piece of
state, those states must match exactly when linked together.
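
Because each library declares the parts it provides as a bitmask, an application can track coverage with simple bit arithmetic.
The following is a minimal sketch, not part of the extension: the `PART_*` constants are local stand-ins that mirror the flag values listed above so that the snippet compiles without the Vulkan headers, and `collectParts` and `isCompleteSet` are hypothetical helper names.
It assumes the rules given in the Linking section (no two libraries may provide the same part, and the fragment parts are not required when `rasterizerDiscardEnable` is statically enabled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins mirroring the VkGraphicsPipelineLibraryFlagBitsEXT values. */
enum {
    PART_VERTEX_INPUT_INTERFACE    = 0x00000001,
    PART_PRE_RASTERIZATION_SHADERS = 0x00000002,
    PART_FRAGMENT_SHADER           = 0x00000004,
    PART_FRAGMENT_OUTPUT_INTERFACE = 0x00000008,
    PART_ALL                       = 0x0000000F
};

/* Hypothetical helper: ORs together the parts provided by each library.
 * Returns false if two libraries provide the same part; in that case
 * *combined is left untouched. */
static bool collectParts(const uint32_t *partsPerLibrary, size_t count,
                         uint32_t *combined)
{
    assert(combined != NULL);

    uint32_t seen = 0;
    for (size_t i = 0; i < count; ++i) {
        if ((seen & partsPerLibrary[i]) != 0) {
            return false; /* overlapping parts */
        }
        seen |= partsPerLibrary[i];
    }
    *combined = seen;
    return true;
}

/* Hypothetical helper: true when the combined parts are enough to form an
 * executable pipeline. Assumes rasterizer discard is static state. */
static bool isCompleteSet(uint32_t combined, bool rasterizerDiscard)
{
    uint32_t required = PART_ALL;
    if (rasterizerDiscard) {
        required = PART_VERTEX_INPUT_INTERFACE | PART_PRE_RASTERIZATION_SHADERS;
    }
    return (combined & required) == required;
}
```

An application might call `collectParts` over the libraries it intends to link and only omit `VK_PIPELINE_CREATE_LIBRARY_BIT_KHR` from the final create info once `isCompleteSet` returns true.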

The subset of
link:{refpage}VkGraphicsPipelineCreateInfo.html[VkGraphicsPipelineCreateInfo]
used to compile each kind of pipeline library is listed in the following
sections, along with any pitfalls, quirks, or interactions that need
calling out.
Any state not explicitly listed for a particular library part will be
ignored when compiling that part.

[NOTE]
.Note
====
There is no change to dynamic state, so if state can be made dynamic, it
doesn’t need to be present when compiling a pipeline library part if it is
specified as dynamic.
====

[NOTE]
.Note
====
The following section is a complete list only at time of writing - see the
specification for a more up-to-date list.
====

#### Vertex Input Interface

A vertex input interface library is defined by the following state:

 * link:{refpage}VkPipelineVertexInputStateCreateInfo.html[VkPipelineVertexInputStateCreateInfo]
 * link:{refpage}VkPipelineInputAssemblyStateCreateInfo.html[VkPipelineInputAssemblyStateCreateInfo]


#### Pre-Rasterization Shaders

A pre-rasterization shader library is defined by the following state:

 * A valid link:{refpage}VkPipelineShaderStageCreateInfo.html[VkPipelineShaderStageCreateInfo]
   for each pre-rasterization shader stage used
 * Within the link:{refpage}VkPipelineLayout.html[VkPipelineLayout], all
   descriptor sets with pre-rasterization shader bindings if
   `VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT` was specified.
 ** If `VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT` was not specified,
    the full pipeline layout must be specified.
 * link:{refpage}VkPipelineViewportStateCreateInfo.html[VkPipelineViewportStateCreateInfo]
 ** However, all the functionality in that structure is dynamic other than
    the flags, and this extension allows the structure to be omitted such
    that it is as if it were zero-initialized.
 * link:{refpage}VkPipelineRasterizationStateCreateInfo.html[VkPipelineRasterizationStateCreateInfo]
 * link:{refpage}VkPipelineTessellationStateCreateInfo.html[VkPipelineTessellationStateCreateInfo]
   is required if tessellation stages are included.
 * link:{refpage}VkRenderPass.html[VkRenderPass] and `subpass` parameter
 * link:{refpage}VkPipelineRenderingCreateInfo.html[VkPipelineRenderingCreateInfo] for the `viewMask` parameter - formats are ignored.
 * link:{refpage}VkPipelineDiscardRectangleStateCreateInfoEXT.html[VkPipelineDiscardRectangleStateCreateInfoEXT]
 * link:{refpage}VkPipelineFragmentShadingRateStateCreateInfoKHR.html[VkPipelineFragmentShadingRateStateCreateInfoKHR]


#### Fragment Shader

A fragment shader library is defined by the following state:

 * A valid link:{refpage}VkPipelineShaderStageCreateInfo.html[VkPipelineShaderStageCreateInfo]
   for the fragment shader stage.
 * Within the link:{refpage}VkPipelineLayout.html[VkPipelineLayout], all
   descriptor sets with fragment shader bindings if
   `VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT` was specified.
 ** If `VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT` was not specified,
    the full pipeline layout must be specified.
 * link:{refpage}VkPipelineMultisampleStateCreateInfo.html[VkPipelineMultisampleStateCreateInfo]
   if sample shading is enabled or `renderPass` is not `VK_NULL_HANDLE`.
 * link:{refpage}VkPipelineDepthStencilStateCreateInfo.html[VkPipelineDepthStencilStateCreateInfo]
 * link:{refpage}VkRenderPass.html[VkRenderPass] and `subpass` parameter
 * link:{refpage}VkPipelineRenderingCreateInfo.html[VkPipelineRenderingCreateInfo] for the `viewMask` parameter - formats are ignored.
 * link:{refpage}VkPipelineFragmentShadingRateStateCreateInfoKHR.html[VkPipelineFragmentShadingRateStateCreateInfoKHR]
 * link:{refpage}VkPipelineFragmentShadingRateEnumStateCreateInfoNV.html[VkPipelineFragmentShadingRateEnumStateCreateInfoNV]
 * link:{refpage}VkPipelineRepresentativeFragmentTestStateCreateInfoNV.html[VkPipelineRepresentativeFragmentTestStateCreateInfoNV]
 * Inclusion/omission of the
   `VK_PIPELINE_RASTERIZATION_STATE_CREATE_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_KHR`
   flag
 * Inclusion/omission of the
   `VK_PIPELINE_RASTERIZATION_STATE_CREATE_FRAGMENT_DENSITY_MAP_ATTACHMENT_BIT_EXT`
   flag


#### Fragment Output Interface

A fragment output interface library is defined by the following state:

 * link:{refpage}VkPipelineColorBlendStateCreateInfo.html[VkPipelineColorBlendStateCreateInfo]
 * link:{refpage}VkPipelineMultisampleStateCreateInfo.html[VkPipelineMultisampleStateCreateInfo]
 * link:{refpage}VkRenderPass.html[VkRenderPass] and `subpass` parameter
 * link:{refpage}VkPipelineRenderingCreateInfo.html[VkPipelineRenderingCreateInfo]
 * link:{refpage}VkAttachmentSampleCountInfoAMD.html[VkAttachmentSampleCountInfoAMD/NV]


#### Interactions with extensions

The required structures for each pipeline subset include anything in the `pNext`
chains of the listed structures; any extensions to these structures are thus
implicitly accounted for unless otherwise stated.
If any extension allows parts of
link:{refpage}VkGraphicsPipelineCreateInfo.html[VkGraphicsPipelineCreateInfo]
to be ignored, by default that part of the state will also be ignored when
using graphics pipeline libraries.
Any extension that extends the base
link:{refpage}VkGraphicsPipelineCreateInfo.html[VkGraphicsPipelineCreateInfo]
directly, or otherwise differs from the above implicit interactions, will
need an explicit interaction.

### Pipeline Layouts

To allow descriptor sets to be independently specified for each of the two shader library types, a new pipeline layout create flag is added:

[source,c]
----
typedef enum VkPipelineLayoutCreateFlagBits {
    VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT = 0x00000002
} VkPipelineLayoutCreateFlagBits;
----

When specified, fragment and pre-rasterization shader pipeline libraries only need to specify the descriptor sets used by that library.
Descriptor set layouts unused by a library may be set to `VK_NULL_HANDLE`.


### Linking

Linking is performed by including the existing
link:{refpage}VkPipelineLibraryCreateInfoKHR.html[VkPipelineLibraryCreateInfoKHR] structure in the `pNext` chain of
link:{refpage}VkGraphicsPipelineCreateInfo.html[VkGraphicsPipelineCreateInfo].

[source,c]
----
typedef struct VkPipelineLibraryCreateInfoKHR {
    VkStructureType      sType;
    const void*          pNext;
    uint32_t             libraryCount;
    const VkPipeline*    pLibraries;
} VkPipelineLibraryCreateInfoKHR;
----

Libraries can be linked into other libraries recursively while there are
still state blobs that can be linked together.
E.g. an application could create a library for the vertex input interface
and pre-rasterization shaders separately, then link them into a new
library.

A newly created graphics pipeline consists of the parts defined by
linked libraries, plus those defined by
link:{refpage}VkGraphicsPipelineLibraryCreateInfoEXT.html[VkGraphicsPipelineLibraryCreateInfoEXT].
Parts specified in the pipeline must not overlap those defined by
libraries, and similarly multiple libraries must not provide the same
parts.
Any state required by multiple parts must match.

Graphics pipelines that contain a full set of libraries are executable, may
not be used for further linking, and must not have
`VK_PIPELINE_CREATE_LIBRARY_BIT_KHR` set.
Graphics pipelines that contain only a subset of stages are not executable,
may be used for further linking, and must have
`VK_PIPELINE_CREATE_LIBRARY_BIT_KHR` set.

If `rasterizerDiscardEnable` is enabled, the complete set of parts does
not include fragment shader or fragment output interface
libraries.

Two additional bits control how linking is performed:

 * `VK_PIPELINE_CREATE_RETAIN_LINK_TIME_OPTIMIZATION_INFO_BIT_EXT`
 * `VK_PIPELINE_CREATE_LINK_TIME_OPTIMIZATION_BIT_EXT`

`VK_PIPELINE_CREATE_LINK_TIME_OPTIMIZATION_BIT_EXT` allows applications
to specify that linking should perform an optimization pass; when this bit
is specified, additional optimizations will be performed at link time, and
the resulting pipeline should perform equivalently to a pipeline created
monolithically.

To perform link time optimizations,
`VK_PIPELINE_CREATE_RETAIN_LINK_TIME_OPTIMIZATION_INFO_BIT_EXT` must be
specified on all pipeline libraries that are being linked together.
Implementations should retain any additional information needed to perform
optimizations at the final link step when this bit is present.

If the application created the final linked pipeline with pipeline layouts
including the `VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT` flag,
the final linked pipeline layout is the union of the layouts provided for
shader stages.
However, in the specific case that a final link is being performed between
stages and `VK_PIPELINE_CREATE_LINK_TIME_OPTIMIZATION_BIT_EXT` is specified,
the application can override the pipeline layout with one that is compatible
with that union but does not have the
`VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT` flag set, allowing a
more optimal pipeline layout to be used when generating the final pipeline.


### Deprecating shader modules

To make single-shader compilation consistent, shader modules will be
deprecated by allowing link:{refpage}VkShaderModuleCreateInfo.html[VkShaderModuleCreateInfo] to be chained to
link:{refpage}VkPipelineShaderStageCreateInfo.html[VkPipelineShaderStageCreateInfo], and allowing the
link:{refpage}VkShaderModule.html[VkShaderModule] to be link:{refpage}VK_NULL_HANDLE.html[VK_NULL_HANDLE] in this case.
Applications can continue to use shader modules as they are not
being removed, but it is strongly recommended not to use them.
The primary reason for this would be to allow bypassing what is in many
cases a useless copy, along with potential wasted storage if they are
retained.
There have been previous efforts to allow shader modules to be precompiled
in some way, but this functionality is now being made available in a more
reliable and portable way, negating the need to focus efforts in
this area moving forward.


## Examples


### Compilation

Initial compilation can now be organised into separate chunks, allowing
consistent earlier compilation for applications that have this information
available separately, and potentially allowing more multithreading
opportunities for applications that do not.

Below is an example of the information needed to compile a vertex shader:

[source,c]
----
VkPipeline createVertexShader(
    VkDevice device,
    const uint32_t* pShader,
    size_t shaderSize,
    VkPipelineCache vertexShaderCache,
    VkPipelineLayout layout)
{
    // SPIR-V is chained directly to the stage; no VkShaderModule is created.
    VkShaderModuleCreateInfo shaderModuleCreateInfo{};
    shaderModuleCreateInfo.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    shaderModuleCreateInfo.codeSize = shaderSize;
    shaderModuleCreateInfo.pCode = pShader;

    // Declare which part of the pipeline this library provides.
    VkGraphicsPipelineLibraryCreateInfoEXT libraryInfo{};
    libraryInfo.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_LIBRARY_CREATE_INFO_EXT;
    libraryInfo.flags = VK_GRAPHICS_PIPELINE_LIBRARY_PRE_RASTERIZATION_SHADERS_BIT_EXT;

    VkPipelineShaderStageCreateInfo stageCreateInfo{};
    stageCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    stageCreateInfo.pNext = &shaderModuleCreateInfo;
    stageCreateInfo.stage = VK_SHADER_STAGE_VERTEX_BIT;
    stageCreateInfo.pName = "main";

    // Viewport and scissor are dynamic, so no viewport state is supplied.
    VkDynamicState vertexDynamicStates[2] = {
        VK_DYNAMIC_STATE_VIEWPORT_WITH_COUNT_EXT,
        VK_DYNAMIC_STATE_SCISSOR_WITH_COUNT_EXT };

    VkPipelineDynamicStateCreateInfo dynamicInfo{};
    dynamicInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO;
    dynamicInfo.dynamicStateCount = 2;
    dynamicInfo.pDynamicStates = vertexDynamicStates;

    VkGraphicsPipelineCreateInfo vertexShaderCreateInfo{};
    vertexShaderCreateInfo.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
    vertexShaderCreateInfo.pNext = &libraryInfo;
    // Retain link time optimization info so an optimized link is possible later.
    vertexShaderCreateInfo.flags = VK_PIPELINE_CREATE_LIBRARY_BIT_KHR |
                                   VK_PIPELINE_CREATE_RETAIN_LINK_TIME_OPTIMIZATION_INFO_BIT_EXT;
    vertexShaderCreateInfo.stageCount = 1;
    vertexShaderCreateInfo.pStages = &stageCreateInfo;
    vertexShaderCreateInfo.layout = layout;
    vertexShaderCreateInfo.pDynamicState = &dynamicInfo;

    // Stays VK_NULL_HANDLE if creation fails.
    VkPipeline vertexShader = VK_NULL_HANDLE;
    vkCreateGraphicsPipelines(
        device, vertexShaderCache, 1, &vertexShaderCreateInfo, NULL, &vertexShader);

    return vertexShader;
}
----

[NOTE]
.Note
====
This example makes use of
link:{refpage}VK_KHR_dynamic_rendering.html[VK_KHR_dynamic_rendering] to
avoid render pass interactions.
If that extension is not available, a render pass object and the
corresponding subpass will also need to be provided.
====

### Linking

Linking is relatively straightforward - pipeline libraries in, executable
pipeline out, with the option of optimizing the pipeline or not.

[source,c]
----
VkPipeline linkExecutable(
    VkDevice device,
    VkPipeline* pLibraries,
    size_t libraryCount,
    VkPipelineCache executableCache,
    bool optimized)
{
    VkPipelineLibraryCreateInfoKHR linkingInfo{};
    linkingInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LIBRARY_CREATE_INFO_KHR;
    linkingInfo.libraryCount = (uint32_t)libraryCount;
    linkingInfo.pLibraries = pLibraries;

    VkGraphicsPipelineCreateInfo executablePipelineCreateInfo{};
    executablePipelineCreateInfo.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
    executablePipelineCreateInfo.pNext = &linkingInfo;
    // Without this bit the link is a fast link.
    executablePipelineCreateInfo.flags |= optimized ?
        VK_PIPELINE_CREATE_LINK_TIME_OPTIMIZATION_BIT_EXT : 0;

    VkPipeline executable = VK_NULL_HANDLE;

    vkCreateGraphicsPipelines(
        device, executableCache, 1, &executablePipelineCreateInfo, NULL, &executable);

    return executable;
}
----

[NOTE]
.Note
====
The behavior of the pipeline cache in this scenario depends on
implementation properties and on whether fast or
optimized linking is being used.
This is spelled out in the spec, but summarised briefly again here:

If fast linking is being performed, the implementation should only look up
the cache if the lookup is expected to be faster than linking.
If linking is faster, then the cache lookup and any writes to the cache should be skipped.
The aim of this is to ensure that fast linking is always as fast as
possible.
If a cache lookup is performed, optimized pipelines in the cache should be
returned preferentially to any fast-linked pipelines.

If optimized linking is being performed, the implementation should not
generate a hit on a suboptimal fast-linked pipeline, instead creating a new
pipeline and corresponding cache entry.
====


## Issues


### RESOLVED: Should the pre-rasterization stages be separated?

While splitting the geometry stages may be possible, it’s a significant
amount of additional work for many vendors, the advantage for most
developers is unclear, and it would be difficult to make some of the
guarantees in this extension.


### RESOLVED: What is the expected usage model?

When a novel shader/stage combination is seen that requires compilation, it
should be compiled into a separate pipeline library as early as possible;
this should be possible alongside usual material/object loading
(e.g. texture/mesh streaming).
If an application has its own material cache, the library should be cached
there.
Applications should still use pipeline caches to amortize compilation
across similar stage blobs but should avoid mixing different stage types in
the same link:{refpage}VkPipelineCache.html[VkPipelineCache], to avoid unnecessary lookup overhead.

Basic linking should then be done as early as the application is able.
Applications should ideally store/cache this pipeline with relevant objects.
Using a link:{refpage}VkPipelineCache.html[VkPipelineCache] for this suboptimal pipeline is recommended;
implementations where this would provide no benefit should ignore the cache
lookup request for fast linking.

Once a basic link is done, the application should schedule a task for a
separate thread to create an optimized pipeline.
This should use pipeline caches in the same manner as existing monolithic
compilation, sharing this cache with fast-linked pipelines.
Implementations should prefer returning optimized pipelines from these
caches.
Applications should switch to the optimized pipeline as soon as it is
available.


### RESOLVED: Why is there suggested behavior for the implementation of pipeline caches instead of letting the caching be driven by the application?

Work to change the way pipelines are cached is ongoing; to avoid scope creep,
only the minimum set of features required to ensure things worked was added.
A future extension may change how a lot of this works, so it was undesirable to
design something that would be thrown away later.


### RESOLVED: What are the downsides to using unoptimized pipelines?

A fast-linked pipeline may have a significant device performance penalty
compared to the final pipeline on some implementations.
Some vendors may have a negligible performance penalty; others will have
performance penalties differing based on which shader stages are compiled
together.
A rough estimate given by vendors is that it could be as bad as a 50%
penalty in the general case, with outliers performing even worse.

As general advice, applications should be aiming to keep the amount of work
in each frame performed by unoptimized pipelines relatively low (<10%);
profiling may be necessary to identify problematic areas.
Developers are strongly encouraged to create optimized pipelines as soon as
they are able, to replace the fast-linked pipeline.
Relying completely on fast-linked pipelines could result in unacceptable
performance degradation on some implementations.


### RESOLVED: Are there any interactions with specialization constants?

No.
This extension doesn’t change how specialization constants work – they
work as they do for existing pipelines.
If they’re provided, implementations are free to specialize the pipeline or
not, and cache pipelines that are specialized, unspecialized, or both.
Specialization constants must be provided alongside the shader stages using
them and cannot be provided at link time.
This may be something we want to address in a future extension.


### RESOLVED: Are there any interface matching requirements that will need to change, like SSOs in OpenGL/ES?

Some implementations require the interpolation decorations in the last
geometry shader stage if pipeline libraries are used, and this is
advertised by the
`graphicsPipelineLibraryIndependentInterpolationDecoration` property.
It is expected that these implementations are serving markets where OpenGL
ES is dominant, where this requirement was never dropped for separate
shader objects, unlike OpenGL.


### RESOLVED: Should we allow passing SPIR-V directly into pipeline creation?

Yes. This simplifies compilation, avoids an unnecessary copy, and brings developers and implementations onto the same page.


### RESOLVED: Should we advertise a property for “free link” vs. “fast link”?

Yes, as developers may want to adjust the way they manage pipelines.
If linking is more or less free, the expectation is that applications may
link pipelines on demand when recording draw calls.
If linking is going to take more time, they may try to more aggressively
pre-cache pipelines.
This has been added as the `graphicsPipelineLibraryFastLinking`
property.

Implementation and developer guidance is that if this property is
advertised, applications should be able to link on demand, so the cost of
linking should be comparable to recording commands in a command buffer.
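
One way an application might act on this property, combined with the earlier advice to switch to optimized pipelines once they exist, is sketched below.
This is only an illustration under stated assumptions: `PipelineHandle` and `NULL_PIPELINE` are local stand-ins for `VkPipeline` and `VK_NULL_HANDLE` so the snippet compiles without the Vulkan headers, and `LinkPolicy`, `MaterialPipelines`, `chooseLinkPolicy`, and `selectPipeline` are hypothetical names that are not part of the extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t PipelineHandle;          /* stand-in for VkPipeline */
#define NULL_PIPELINE ((PipelineHandle)0) /* stand-in for VK_NULL_HANDLE */

/* Hypothetical application-side policy. */
typedef enum LinkPolicy {
    LINK_ON_DEMAND,     /* fast-link while recording commands */
    LINK_AHEAD_OF_TIME  /* fast-link during loading and keep the result */
} LinkPolicy;

/* Hypothetical per-material record holding both pipeline variants. */
typedef struct MaterialPipelines {
    PipelineHandle fastLinked; /* created without link time optimization */
    PipelineHandle optimized;  /* NULL_PIPELINE until the optimized link finishes */
} MaterialPipelines;

/* fastLinking mirrors the graphicsPipelineLibraryFastLinking property. */
static LinkPolicy chooseLinkPolicy(bool fastLinking)
{
    return fastLinking ? LINK_ON_DEMAND : LINK_AHEAD_OF_TIME;
}

/* Prefer the optimized pipeline as soon as it exists; otherwise fall back
 * to the fast-linked one. */
static PipelineHandle selectPipeline(const MaterialPipelines *pipelines)
{
    assert(pipelines != NULL);
    return pipelines->optimized != NULL_PIPELINE ? pipelines->optimized
                                                 : pipelines->fastLinked;
}
```

In this sketch a background task would fill in `optimized` once the link with `VK_PIPELINE_CREATE_LINK_TIME_OPTIMIZATION_BIT_EXT` completes; any synchronization needed for that hand-off is left to the application and is not shown.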


### RESOLVED: Does anyone need the depth/stencil format to be provided with the depth bias state?

The depth format affects how depth bias is applied, but these are currently provided in separate parts of the pipeline.
Nobody has claimed this to be a problem.


### RESOLVED: With the recommendation to create an optimized pipeline as well as a fast linked pipeline, will this lead to additional memory consumption?

Caches containing pipeline libraries will necessarily increase the total memory consumption of compiled pipelines, as applications will generally try to keep these available while pipelines may be streamed in/out.
Implementations may be able to use data in the library caches for the final pipelines in some circumstances, which could help mitigate it - but this is not guaranteed and will vary by vendor.

Fast-linked pipelines should not contribute to the total memory consumption if applications destroy the fast-linked pipeline once an optimized version exists.

Improvements to pipeline caches allowing selective eviction of individual caches could help with memory management here, but as this intersects with other known pipeline cache problems, this should be dealt with in a separate extension.


### RESOLVED: Should the link time optimization bits apply to other pipeline libraries (e.g. ray tracing)?

Yes, but it will not necessarily be subject to the same quality guarantees. Ray tracing pipeline libraries were not designed with this directly in mind, so while implementations should make use of these bits as best they can, it is not possible to make the same quality guarantees as for graphics pipelines.

Any future extensions using the pipeline library interface should be aware of these interactions and try to follow the intent of these bits as much as possible.


### RESOLVED: Does the shader module deprecation apply to other pipelines?

Yes.


### RESOLVED: Should `VkPipelineDepthStencilStateCreateInfo` be part of the fragment shader state or fragment output interface state?

Some vendors will make use of this information if it is available and would rather not see it move - however notably all of this state can be made dynamic. Applications wanting to avoid setting this state with the fragment shader library should use this dynamic state.


### RESOLVED: Should `VkPipelineMultisampleStateCreateInfo` be required, even when the fragment shader does not make use of multisampling shader features?

The fragment shader library only needs this information when sample shading is enabled.


### RESOLVED: Should `VkPipelineMultisampleStateCreateInfo` be part of the fragment output interface instead?

Moving it to the output interface removes the need to create multiple fragment shader libraries for different MSAA rates, which some applications do as a part of dynamic performance tuning.
This only works when used in conjunction with dynamic rendering; when render pass objects are used, the sample rate will effectively be sourced from any subpass attachments due to validation constraints.
This could be made to work with subpasses with no attachments, but the additional complexity of adding that path had no clear benefit, so it is disallowed.


### RESOLVED: Should we add an explicit result code to pipeline creation to indicate that a pipeline compiled with link time optimization has been returned?

No, as the complexity of handling this does not clearly translate to significant application wins.


### RESOLVED: Can pipeline caches return optimized pipelines without the `VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT` set?

Not unconditionally - if an implementation does not do anything with that flag then yes, but if there is a functional difference then it cannot. I.e. the same as other pipeline state.
