// Copyright 2021-2023 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

# VK_EXT_graphics_pipeline_library
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:

This document outlines a proposal to allow partial compilation of portions
of pipelines, improving the performance of pipeline compilation for
applications that have large numbers of materials, large amounts of
dynamic state, or continuously stream in new material definitions.


## Problem Statement

The original promise of monolithic pipelines in Vulkan was to enable
developers to construct all their state up front, avoiding the unexpected
hitches caused by the driver implicitly compiling and patching shaders
when recording draw calls.

The reality, however, is that for many game engines, requiring most of this state up front
either fails to eliminate hitching,
or requires precompiling so many state combinations that the size of the
pipeline cache is nearly unmanageable.

Game engines are typically still managing enormous sets of state and
shader combinations, and this is not a purely technical problem.
It is still expected and encouraged that developers will limit the number
of these, but it doesn’t change the fact that at least in the
short-to-mid-term, developers are having real problems that can’t be solved by
telling them to reduce the number of pipelines.

This proposal does not aim to fully solve these issues, but instead provides
a key piece of infrastructure required to solve them.
The main aim of this proposal is to reduce the cost of loading novel state
and shader combinations within the rendering loop, thus avoiding hitching.

An additional constraint to be aware of is that any solution should not
regress the intended wins from moving to pipeline objects – there should be
no need for late-compilation or patching that is performed *implicitly* by
the implementation.
An expectation of any solution here is that GPU performance may suffer
due to sub-optimal linking, and the solution should provide a way to mitigate this.
Explicit late compilation or patching may be acceptable, but it should be
simple to perform, and applications should have control over when and how
it is done.


## Solution Space

The following options have been considered:

  . Handle this inside the implementation
  . Additional dynamic state
  . Separately compiled pipeline/state blobs

Handling this inside the implementation would potentially solve the problem
for the class of apps that have this issue.
However, it takes the choice of fast-linking vs. whole program optimization
away from the application.
It also means fighting with drivers and performance guidelines to hit the
right usage to trigger it on each implementation.

As for dynamic state, it is likely that the list of state that is fully dynamic
across implementations has been all but exhausted at this point.
While vendors can choose to expose additional dynamic state as they see
fit, solving this problem portably needs a different solution.
Vendors trying to implement state that isn’t dynamic as if it were dynamic
will end up doing implicit work at command recording time, leading
inevitably to implicit compilation or patching of shaders – which is
undesirable.

Separately compiling chunks of state (e.g. individual shaders, vertex
inputs, render passes) allows for applications to individually compile
these chunks as they show up.
Enough information should be given in this early step that linking these
chunks together later has significant cost savings and can be done at record time
if necessary.
Implementations could “cheat” at separate chunk compilation, exposing
this extension while simply keeping the create information until the final link
step and compiling everything at once then.
In general it is desirable for implementations to avoid late compilation, but this
does allow the extension to be implemented more widely (including via a software layer),
providing better consistency for developers.
Explicitly advertising this detail could allow developers to make better
choices about how and when these pipelines are compiled.

This proposal focuses on option 3 – providing applications with the ability
to separately compile state chunks and later link them together.


## Proposal


### Prior Art: VK_KHR_pipeline_library

For link:{refpage}VK_KHR_ray_tracing_pipeline.html[VK_KHR_ray_tracing_pipeline], pipelines
contain a significant number of shaders - making monolithic compilation
very slow.
link:{refpage}VK_KHR_pipeline_library.html[VK_KHR_pipeline_library] allowed
applications to create partial pipelines (pipeline libraries) containing
only a subset of the final shaders.
These pipeline libraries can be linked together to form a final executable.
Ray pipelines were relatively straightforward as only shaders are linked,
and there’s no “state” for ray shaders beyond the shader groups.

Graphics pipelines by comparison contain a lot of static state that needs
to be separated carefully, retaining any “interface” information.
However, this extension reuses the same underlying mechanism.


### Features

The following feature is exposed by this extension:

[source,c]
----
typedef struct VkPhysicalDeviceGraphicsPipelineLibraryFeaturesEXT {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           graphicsPipelineLibrary;
} VkPhysicalDeviceGraphicsPipelineLibraryFeaturesEXT;
----

`graphicsPipelineLibrary` is the core feature enabling this
functionality.


### Properties

The following properties are exposed by this extension:

[source,c]
----
typedef struct VkPhysicalDeviceGraphicsPipelineLibraryPropertiesEXT {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           graphicsPipelineLibraryFastLinking;
    VkBool32           graphicsPipelineLibraryIndependentInterpolationDecoration;
} VkPhysicalDeviceGraphicsPipelineLibraryPropertiesEXT;
----

`graphicsPipelineLibraryFastLinking` indicates whether the cost of
linking pipelines without `VK_PIPELINE_CREATE_LINK_TIME_OPTIMIZATION_BIT_EXT`
is comparable to recording a command in a command buffer, such that
applications can link pipelines on demand while recording commands.
If this property is not supported, linking should still be cheaper than
a full pipeline compilation.

If `graphicsPipelineLibraryIndependentInterpolationDecoration` is not
supported, applications must provide matching interpolation decorations in
both the last geometry stage and the fragment stage; if it is supported,
any geometry stage decorations are ignored.


### Dividing up the graphics state

Four sets of state that have been identified as often recombined by
applications are:

  * Vertex Input Interface
  * Pre-rasterization
  * Post-rasterization
  * Fragment Output Interface (including blend state)

The intent is to allow each of those to be independently compiled as far as
possible, along with relevant pieces of state that may need to match for
the final linked pipeline.

[source,c]
----
typedef struct VkGraphicsPipelineLibraryCreateInfoEXT {
    VkStructureType                      sType;
    void*                                pNext;
    VkGraphicsPipelineLibraryFlagsEXT    flags;
} VkGraphicsPipelineLibraryCreateInfoEXT;

typedef enum VkGraphicsPipelineLibraryFlagBitsEXT {
    VK_GRAPHICS_PIPELINE_LIBRARY_VERTEX_INPUT_INTERFACE_BIT_EXT = 0x00000001,
    VK_GRAPHICS_PIPELINE_LIBRARY_PRE_RASTERIZATION_SHADERS_BIT_EXT = 0x00000002,
    VK_GRAPHICS_PIPELINE_LIBRARY_FRAGMENT_SHADER_BIT_EXT = 0x00000004,
    VK_GRAPHICS_PIPELINE_LIBRARY_FRAGMENT_OUTPUT_INTERFACE_BIT_EXT = 0x00000008,
} VkGraphicsPipelineLibraryFlagBitsEXT;

typedef VkFlags VkGraphicsPipelineLibraryFlagsEXT;
----

Pipeline libraries are created for the parts specified, and any parameters
required to create a library with those parts must be provided.

For all pipeline libraries, the
link:{refpage}VkPipelineCache.html[VkPipelineCache], `basePipelineHandle`,
`basePipelineIndex`,
link:{refpage}VkPipelineCreationFeedbackCreateInfo.html[VkPipelineCreationFeedbackCreateInfo],
and
link:{refpage}VkPipelineCompilerControlCreateInfoAMD.html[VkPipelineCompilerControlCreateInfoAMD]
parameters are independently consumed and do not need to match between
libraries or for any final pipeline.
link:{refpage}VkPipelineCreateFlags.html[VkPipelineCreateFlags] are also
independent, though `VK_PIPELINE_CREATE_LIBRARY_BIT_KHR` is required for
all pipeline libraries.
Only dynamic states that affect state consumed by a library are used;
other dynamic states are ignored and play no part in linked pipelines.
Where multiple pipeline libraries are built with the same required piece of
state, those states must match exactly when linked together.

The subset of
link:{refpage}VkGraphicsPipelineCreateInfo.html[VkGraphicsPipelineCreateInfo]
used to compile each kind of pipeline library is listed in the following
sections, along with any pitfalls, quirks, or interactions that need
calling out.
Any state not explicitly listed for a particular library part will be
ignored when compiling that part.

[NOTE]
.Note
====
There is no change to dynamic state, so if state can be made dynamic, it
doesn’t need to be present when compiling a pipeline library part if it is
specified as dynamic.
====

[NOTE]
.Note
====
The following section is a complete list only at time of writing - see the
specification for a more up-to-date list.
====

#### Vertex Input Interface

A vertex input interface library is defined by the following state:

  * link:{refpage}VkPipelineVertexInputStateCreateInfo.html[VkPipelineVertexInputStateCreateInfo]
  * link:{refpage}VkPipelineInputAssemblyStateCreateInfo.html[VkPipelineInputAssemblyStateCreateInfo]


#### Pre-Rasterization Shaders

A pre-rasterization shader library is defined by the following state:

  * A valid link:{refpage}VkPipelineShaderStageCreateInfo.html[VkPipelineShaderStageCreateInfo]
    for each pre-rasterization shader stage used
  * Within the link:{refpage}VkPipelineLayout.html[VkPipelineLayout], all
    descriptor sets with pre-rasterization shader bindings if
    `VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT` was specified.
  ** If `VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT` was not specified,
     the full pipeline layout must be specified.
  * link:{refpage}VkPipelineViewportStateCreateInfo.html[VkPipelineViewportStateCreateInfo]
  ** However, all the functionality in that structure is dynamic other than
     the flags, and this extension allows the structure to be omitted such
     that it is as-if it was zero-initialized.
  * link:{refpage}VkPipelineRasterizationStateCreateInfo.html[VkPipelineRasterizationStateCreateInfo]
  * link:{refpage}VkPipelineTessellationStateCreateInfo.html[VkPipelineTessellationStateCreateInfo]
    is required if tessellation stages are included.
  * link:{refpage}VkRenderPass.html[VkRenderPass] and `subpass` parameter
  * link:{refpage}VkPipelineRenderingCreateInfo.html[VkPipelineRenderingCreateInfo] for the `viewMask` parameter - formats are ignored.
  * link:{refpage}VkPipelineDiscardRectangleStateCreateInfoEXT.html[VkPipelineDiscardRectangleStateCreateInfoEXT]
  * link:{refpage}VkPipelineFragmentShadingRateStateCreateInfoKHR.html[VkPipelineFragmentShadingRateStateCreateInfoKHR]


#### Fragment Shader

A fragment shader library is defined by the following state:

  * A valid link:{refpage}VkPipelineShaderStageCreateInfo.html[VkPipelineShaderStageCreateInfo]
    for the fragment shader stage.
  * Within the link:{refpage}VkPipelineLayout.html[VkPipelineLayout], all
    descriptor sets with fragment shader bindings if
    `VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT` was specified.
  ** If `VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT` was not specified,
     the full pipeline layout must be specified.
  * link:{refpage}VkPipelineMultisampleStateCreateInfo.html[VkPipelineMultisampleStateCreateInfo]
    if sample shading is enabled or `renderPass` is not `VK_NULL_HANDLE`.
  * link:{refpage}VkPipelineDepthStencilStateCreateInfo.html[VkPipelineDepthStencilStateCreateInfo]
  * link:{refpage}VkRenderPass.html[VkRenderPass] and `subpass` parameter
  * link:{refpage}VkPipelineRenderingCreateInfo.html[VkPipelineRenderingCreateInfo] for the `viewMask` parameter - formats are ignored.
  * link:{refpage}VkPipelineFragmentShadingRateStateCreateInfoKHR.html[VkPipelineFragmentShadingRateStateCreateInfoKHR]
  * link:{refpage}VkPipelineFragmentShadingRateEnumStateCreateInfoNV.html[VkPipelineFragmentShadingRateEnumStateCreateInfoNV]
  * link:{refpage}VkPipelineRepresentativeFragmentTestStateCreateInfoNV.html[VkPipelineRepresentativeFragmentTestStateCreateInfoNV]
  * Inclusion/omission of the
    `VK_PIPELINE_RASTERIZATION_STATE_CREATE_FRAGMENT_SHADING_RATE_ATTACHMENT_BIT_KHR`
    flag
  * Inclusion/omission of the
    `VK_PIPELINE_RASTERIZATION_STATE_CREATE_FRAGMENT_DENSITY_MAP_ATTACHMENT_BIT_EXT`
    flag


#### Fragment Output Interface

A fragment output interface library is defined by the following state:

  * link:{refpage}VkPipelineColorBlendStateCreateInfo.html[VkPipelineColorBlendStateCreateInfo]
  * link:{refpage}VkPipelineMultisampleStateCreateInfo.html[VkPipelineMultisampleStateCreateInfo]
  * link:{refpage}VkRenderPass.html[VkRenderPass] and `subpass` parameter
  * link:{refpage}VkPipelineRenderingCreateInfo.html[VkPipelineRenderingCreateInfo]
  * link:{refpage}VkAttachmentSampleCountInfoAMD.html[VkAttachmentSampleCountInfoAMD/NV]


#### Interactions with extensions

The required structures for each pipeline subset include anything in the `pNext`
chains of the listed structures; any extensions to these structures are thus
implicitly accounted for unless otherwise stated.
If any extension allows parts of
link:{refpage}VkGraphicsPipelineCreateInfo.html[VkGraphicsPipelineCreateInfo]
to be ignored, by default that part of the state will also be ignored when
using graphics pipeline libraries.
Any extension that extends the base
link:{refpage}VkGraphicsPipelineCreateInfo.html[VkGraphicsPipelineCreateInfo]
directly, or otherwise differs from the above implicit interactions, will
need an explicit interaction.

### Pipeline Layouts

To allow descriptor sets to be independently specified for each of the two shader library types, a new pipeline layout create flag is added:

[source,c]
----
typedef enum VkPipelineLayoutCreateFlagBits {
    VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT = 0x00000002
} VkPipelineLayoutCreateFlagBits;
----

When specified, fragment and pre-rasterization shader pipeline libraries only need to specify the descriptor sets used by that library.
Descriptor set layouts unused by a library may be set to `VK_NULL_HANDLE`.


### Linking

Linking is performed by including the existing
link:{refpage}VkPipelineLibraryCreateInfoKHR.html[VkPipelineLibraryCreateInfoKHR] structure in the `pNext` chain of
link:{refpage}VkGraphicsPipelineCreateInfo.html[VkGraphicsPipelineCreateInfo].

[source,c]
----
typedef struct VkPipelineLibraryCreateInfoKHR {
    VkStructureType      sType;
    const void*          pNext;
    uint32_t             libraryCount;
    const VkPipeline*    pLibraries;
} VkPipelineLibraryCreateInfoKHR;
----

Libraries can be linked into other libraries recursively while there are
still state blobs that can be linked together.
For example, an application could create a library for the vertex input interface
and pre-rasterization shaders separately, then link them into a new
library.

A newly created graphics pipeline consists of the parts defined by
linked libraries, plus those defined by
link:{refpage}VkGraphicsPipelineLibraryCreateInfoEXT.html[VkGraphicsPipelineLibraryCreateInfoEXT].
Parts specified in the pipeline must not overlap those defined by
libraries, and similarly multiple libraries must not provide the same
parts.
Any state required by multiple parts must match.

Graphics pipelines that contain a full set of libraries are executable,
cannot be used for further linking, and must not have
`VK_PIPELINE_CREATE_LIBRARY_BIT_KHR` set.
Graphics pipelines that contain only a subset of stages are not executable,
may be used for further linking, and must have
`VK_PIPELINE_CREATE_LIBRARY_BIT_KHR` set.

If `rasterizerDiscardEnable` is enabled, the complete set of parts does
not include fragment shader or fragment output interface
libraries.
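
The part rules above (no part provided twice, and a pipeline being executable only once every required part is present) can be sketched as a bitmask check.
This is a minimal illustration rather than anything defined by the extension: the `GPL_*` constants are local mirrors of the `VkGraphicsPipelineLibraryFlagBitsEXT` values so that the sketch builds without the Vulkan headers, and `gplCombineParts` and `gplIsComplete` are hypothetical helper names.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the VkGraphicsPipelineLibraryFlagBitsEXT values
 * (assumption: used here only to avoid depending on the Vulkan headers). */
enum {
    GPL_VERTEX_INPUT_INTERFACE    = 0x1,
    GPL_PRE_RASTERIZATION_SHADERS = 0x2,
    GPL_FRAGMENT_SHADER           = 0x4,
    GPL_FRAGMENT_OUTPUT_INTERFACE = 0x8,
    GPL_ALL_PARTS                 = 0xF
};

/* Hypothetical helper: ORs together the parts provided by each library (and
 * by the pipeline itself). Returns false if any part is provided twice. */
static bool gplCombineParts(const uint32_t* pParts, size_t count, uint32_t* pCombined)
{
    uint32_t combined = 0;
    assert(pCombined != NULL);
    for (size_t i = 0; i < count; ++i) {
        if ((combined & pParts[i]) != 0) {
            return false; /* overlapping parts are not allowed */
        }
        combined |= pParts[i];
    }
    *pCombined = combined;
    return true;
}

/* Hypothetical helper: true if the combined parts form a complete set.
 * With rasterizer discard, the fragment parts are not required. */
static bool gplIsComplete(uint32_t combined, bool rasterizerDiscardEnable)
{
    uint32_t required = GPL_ALL_PARTS;
    if (rasterizerDiscardEnable) {
        required = GPL_VERTEX_INPUT_INTERFACE | GPL_PRE_RASTERIZATION_SHADERS;
    }
    return (combined & required) == required;
}
----

An application might run a check like this before linking to decide whether `VK_PIPELINE_CREATE_LIBRARY_BIT_KHR` needs to be set on the pipeline being created.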

Two additional bits control how linking is performed:

  * `VK_PIPELINE_CREATE_RETAIN_LINK_TIME_OPTIMIZATION_INFO_BIT_EXT`
  * `VK_PIPELINE_CREATE_LINK_TIME_OPTIMIZATION_BIT_EXT`

`VK_PIPELINE_CREATE_LINK_TIME_OPTIMIZATION_BIT_EXT` allows applications
to specify that linking should perform an optimization pass; when this bit
is specified, additional optimizations will be performed at link time, and
the resulting pipeline should perform equivalently to a pipeline created
monolithically.

To perform link time optimizations,
`VK_PIPELINE_CREATE_RETAIN_LINK_TIME_OPTIMIZATION_INFO_BIT_EXT` must be
specified on all pipeline libraries that are being linked together.
Implementations should retain any additional information needed to perform
optimizations at the final link step when this bit is present.

If the application created the final linked pipeline with pipeline layouts
including the `VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT` flag,
the final linked pipeline layout is the union of the layouts provided for
shader stages.
However, in the specific case that a final link is being performed between
stages and `VK_PIPELINE_CREATE_LINK_TIME_OPTIMIZATION_BIT_EXT` is specified,
the application can override the pipeline layout with one that is compatible
with that union but does not have the
`VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT` flag set, allowing a
more optimal pipeline layout to be used when generating the final pipeline.
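
One way to picture the union of independent-set layouts is as a per-set-index merge, where each shader library leaves the sets it does not use as null.
The sketch below is an illustration under stated assumptions, not the implementation's behavior: `SetLayout` is a stand-in for `VkDescriptorSetLayout`, `SET_LAYOUT_NULL` stands in for `VK_NULL_HANDLE`, `unionSetLayouts` is a hypothetical helper, and treating two different non-null layouts at the same index as an error is a simplification made for this example.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t SetLayout;   /* stand-in for VkDescriptorSetLayout */
#define SET_LAYOUT_NULL 0u    /* stand-in for VK_NULL_HANDLE */

/* Hypothetical helper: merges the set layouts given to the pre-rasterization
 * and fragment shader libraries into one array, index by index.
 * Returns false if both libraries populate an index with different layouts
 * (an assumption of this sketch). */
static bool unionSetLayouts(
    const SetLayout* pPreRasterization,
    const SetLayout* pFragment,
    size_t setCount,
    SetLayout* pUnion)
{
    assert(pPreRasterization != NULL && pFragment != NULL && pUnion != NULL);
    for (size_t i = 0; i < setCount; ++i) {
        const SetLayout a = pPreRasterization[i];
        const SetLayout b = pFragment[i];
        if (a != SET_LAYOUT_NULL && b != SET_LAYOUT_NULL && a != b) {
            return false;
        }
        /* Take whichever library specified this set; null if neither did. */
        pUnion[i] = (a != SET_LAYOUT_NULL) ? a : b;
    }
    return true;
}
----

A layout compatible with the merged array, created without the independent sets flag, is the kind of override layout the optimized link path described above could be given.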


### Deprecating shader modules

To make single-shader compilation consistent, shader modules will be
deprecated by allowing link:{refpage}VkShaderModuleCreateInfo.html[VkShaderModuleCreateInfo] to be chained to
link:{refpage}VkPipelineShaderStageCreateInfo.html[VkPipelineShaderStageCreateInfo], and allowing the
link:{refpage}VkShaderModule.html[VkShaderModule] to be link:{refpage}VK_NULL_HANDLE.html[VK_NULL_HANDLE] in this case.
Applications can continue to use shader modules as they are not
being removed, but it is strongly recommended not to use them.
The primary reason for this would be to allow bypassing what is in many
cases a useless copy, along with potential wasted storage if they are
retained.
There have been previous efforts to allow shader modules to be precompiled
in some way, but this functionality is now being made available in a more
reliable and portably agreed way, negating the need to focus efforts in
this area moving forward.


## Examples


### Compilation

Initial compilation can now be organised into separate chunks, allowing
consistent earlier compilation for applications that have this information
available separately, and potentially allowing more multithreading
opportunities for applications that do not.

Below is an example of the information needed to compile a vertex shader:

[source,c]
----
VkPipeline createVertexShader(
    VkDevice device,
    const uint32_t* pShader,
    size_t shaderSize,
    VkPipelineCache vertexShaderCache,
    VkPipelineLayout layout)
{
    // SPIR-V is chained directly to the stage; no VkShaderModule is created.
    VkShaderModuleCreateInfo shaderModuleCreateInfo = {0};
    shaderModuleCreateInfo.sType = VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO;
    shaderModuleCreateInfo.codeSize = shaderSize;
    shaderModuleCreateInfo.pCode = pShader;

    // Select which part of the pipeline this library provides.
    VkGraphicsPipelineLibraryCreateInfoEXT libraryInfo = {0};
    libraryInfo.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_LIBRARY_CREATE_INFO_EXT;
    libraryInfo.flags = VK_GRAPHICS_PIPELINE_LIBRARY_PRE_RASTERIZATION_SHADERS_BIT_EXT;

    VkPipelineShaderStageCreateInfo stageCreateInfo = {0};
    stageCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
    stageCreateInfo.pNext = &shaderModuleCreateInfo;
    stageCreateInfo.stage = VK_SHADER_STAGE_VERTEX_BIT;
    stageCreateInfo.pName = "main";

    // Dynamic viewport/scissor counts allow the viewport state to be omitted.
    VkDynamicState vertexDynamicStates[2] = {
        VK_DYNAMIC_STATE_VIEWPORT_WITH_COUNT_EXT,
        VK_DYNAMIC_STATE_SCISSOR_WITH_COUNT_EXT };

    VkPipelineDynamicStateCreateInfo dynamicInfo = {0};
    dynamicInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_DYNAMIC_STATE_CREATE_INFO;
    dynamicInfo.dynamicStateCount = 2;
    dynamicInfo.pDynamicStates = vertexDynamicStates;

    // Other pre-rasterization state listed earlier (e.g. rasterization state)
    // is left out of this example for brevity.
    VkGraphicsPipelineCreateInfo vertexShaderCreateInfo = {0};
    vertexShaderCreateInfo.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
    vertexShaderCreateInfo.pNext = &libraryInfo;
    vertexShaderCreateInfo.flags = VK_PIPELINE_CREATE_LIBRARY_BIT_KHR |
        VK_PIPELINE_CREATE_RETAIN_LINK_TIME_OPTIMIZATION_INFO_BIT_EXT;
    vertexShaderCreateInfo.stageCount = 1;
    vertexShaderCreateInfo.pStages = &stageCreateInfo;
    vertexShaderCreateInfo.layout = layout;
    vertexShaderCreateInfo.pDynamicState = &dynamicInfo;

    // Remains VK_NULL_HANDLE if creation fails.
    VkPipeline vertexShader = VK_NULL_HANDLE;
    vkCreateGraphicsPipelines(
        device, vertexShaderCache, 1, &vertexShaderCreateInfo, NULL, &vertexShader);

    return vertexShader;
}
----

[NOTE]
.Note
====
This example makes use of
link:{refpage}VK_KHR_dynamic_rendering.html[VK_KHR_dynamic_rendering] to
avoid render pass interactions.
If that extension is not available, a render pass object and the
corresponding subpass will also need to be provided.
====

### Linking

Linking is relatively straightforward - pipeline libraries in, executable
pipeline out, with the option of optimizing the pipeline or not.

[source,c]
----
VkPipeline linkExecutable(
    VkDevice device,
    VkPipeline* pLibraries,
    size_t libraryCount,
    VkPipelineCache executableCache,
    bool optimized)
{
    VkPipelineLibraryCreateInfoKHR linkingInfo = {0};
    linkingInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LIBRARY_CREATE_INFO_KHR;
    linkingInfo.libraryCount = (uint32_t)libraryCount;
    linkingInfo.pLibraries = pLibraries;

    // No VK_PIPELINE_CREATE_LIBRARY_BIT_KHR: the result is an executable pipeline.
    VkGraphicsPipelineCreateInfo executablePipelineCreateInfo = {0};
    executablePipelineCreateInfo.sType = VK_STRUCTURE_TYPE_GRAPHICS_PIPELINE_CREATE_INFO;
    executablePipelineCreateInfo.pNext = &linkingInfo;
    executablePipelineCreateInfo.flags = optimized ?
        VK_PIPELINE_CREATE_LINK_TIME_OPTIMIZATION_BIT_EXT : 0;

    // Remains VK_NULL_HANDLE if creation fails.
    VkPipeline executable = VK_NULL_HANDLE;

    vkCreateGraphicsPipelines(
        device, executableCache, 1, &executablePipelineCreateInfo, NULL, &executable);

    return executable;
}
----

[NOTE]
.Note
====
The behavior of the pipeline cache in this scenario depends on
implementation properties and on whether fast or
optimized linking is being used.
This is spelled out in the spec, but summarised briefly again here:

If fast linking is being performed, the implementation should only look up
the pipeline in the cache if that is expected to be faster than linking.
If linking is faster, then the cache lookup and any writes to the cache should be skipped.
The aim of this is to ensure that fast linking is always as fast as
possible.
If a cache lookup is performed, optimized pipelines in the cache should be
returned preferentially to any fast-linked pipelines.

If optimized linking is being performed, the implementation should not
generate a hit on a suboptimal fast-linked pipeline, instead creating a new
pipeline and corresponding cache entry.
====


## Issues


### RESOLVED: Should the pre-rasterization stages be separated?

While splitting the geometry stages may be possible, it’s a significant
amount of additional work for many vendors, the advantage for most
developers is unclear, and it would be difficult to make some of the
guarantees in this extension.


### RESOLVED: What is the expected usage model?

When a novel shader/stage combination is seen that requires compilation, it
should be compiled into a separate pipeline library as early as possible;
this should be possible alongside usual material/object loading
(e.g. texture/mesh streaming).
If an application has its own material cache, the library should be cached
there.
Applications should still use pipeline caches to amortize compilation
across similar stage blobs but should avoid mixing different stage types in
the same link:{refpage}VkPipelineCache.html[VkPipelineCache], to avoid unnecessary lookup overhead.

Basic linking should then be done as early as the application is able.
Applications should ideally store/cache this pipeline with relevant objects.
Using a link:{refpage}VkPipelineCache.html[VkPipelineCache] for this suboptimal pipeline is recommended;
implementations where this would provide no benefit should ignore the cache
lookup request for fast linking.

Once a basic link is done, the application should schedule a task for a
separate thread to create an optimized pipeline.
This should use pipeline caches in the same manner as existing monolithic
compilation, sharing this cache with fast-linked pipelines.
Implementations should prefer returning optimized pipelines from these
caches.
Applications should switch to the optimized pipeline as soon as it is
available.
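
The hand-over from a fast-linked pipeline to its optimized replacement can be sketched as a small piece of per-material bookkeeping.
This is only one possible arrangement: `PipelineHandle` is a stand-in for `VkPipeline` (with `0` standing in for `VK_NULL_HANDLE`), and `MaterialPipelines`, `selectPipeline`, and `needsOptimizedBuild` are hypothetical names introduced for this example.

[source,c]
----
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t PipelineHandle; /* stand-in for VkPipeline; 0 means null */

/* Hypothetical per-material record kept by the application. */
typedef struct MaterialPipelines {
    PipelineHandle fastLinked; /* result of the basic (unoptimized) link */
    PipelineHandle optimized;  /* filled in later by a background task */
} MaterialPipelines;

/* Returns the pipeline to bind: the optimized one as soon as it exists,
 * otherwise the fast-linked one (which may itself still be null). */
static PipelineHandle selectPipeline(const MaterialPipelines* pPipelines)
{
    assert(pPipelines != NULL);
    return (pPipelines->optimized != 0) ? pPipelines->optimized
                                        : pPipelines->fastLinked;
}

/* True while a fast-linked pipeline exists without an optimized counterpart,
 * i.e. while a background optimization task is still worth scheduling. */
static bool needsOptimizedBuild(const MaterialPipelines* pPipelines)
{
    assert(pPipelines != NULL);
    return pPipelines->fastLinked != 0 && pPipelines->optimized == 0;
}
----

In a real application the `optimized` field would be written from another thread, so it would need appropriate synchronization; that is omitted here to keep the sketch short.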


### RESOLVED: Why is there suggested behavior for the implementation of pipeline caches instead of letting the caching be driven by the application?

Work to change the way pipelines are cached is ongoing; to avoid scope creep,
only the minimum set of features required to ensure things worked was added.
A future extension may change how a lot of this works, so it was undesirable to
design something that would be thrown away later.


### RESOLVED: What are the downsides to using unoptimized pipelines?

A fast-linked pipeline may have a significant device performance penalty
compared to the final pipeline on some implementations.
Some vendors may have a negligible performance penalty; others will have
performance penalties differing based on which shader stages are compiled
together.
A rough estimate given by vendors is that it could be as bad as a 50%
penalty in the general case, with outliers performing even worse.

As general advice, applications should be aiming to keep the amount of work
in each frame performed by unoptimized pipelines relatively low (<10%);
profiling may be necessary to identify problematic areas.
Developers are strongly encouraged to create optimized pipelines as soon as
they are able to replace the linked pipeline.
Relying completely on fast-linked pipelines could result in unacceptable
performance degradation on some implementations.
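
The 10% guideline above can be turned into a simple profiling check.
The sketch below assumes the application already measures, per frame, how much GPU time is spent in draws that use fast-linked pipelines; measuring "work" as GPU time is an assumption of this example, and `unoptimizedShare` and `exceedsUnoptimizedBudget` are hypothetical helper names.

[source,c]
----
#include <assert.h>
#include <stdbool.h>

/* Budget taken from the guidance above: keep unoptimized work under 10%. */
#define UNOPTIMIZED_BUDGET 0.10

/* Fraction of the frame's GPU time spent in fast-linked pipelines. */
static double unoptimizedShare(double unoptimizedMs, double frameMs)
{
    assert(frameMs > 0.0);
    return unoptimizedMs / frameMs;
}

/* True if the frame spent too large a share of its time in unoptimized
 * pipelines and optimized replacements should be prioritized. */
static bool exceedsUnoptimizedBudget(double unoptimizedMs, double frameMs)
{
    return unoptimizedShare(unoptimizedMs, frameMs) >= UNOPTIMIZED_BUDGET;
}
----

For instance, 2 ms of fast-linked work in a 16 ms frame is a 12.5% share and would be flagged, while 1 ms (6.25%) would not.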


### RESOLVED: Are there any interactions with specialization constants?

No. This extension doesn’t change how specialization constants work – they
work as they do for existing pipelines.
If they’re provided, implementations are free to specialize the pipeline or
not, and cache pipelines that are specialized, unspecialized, or both.
Specialization constants must be provided alongside the shader stages using
them and cannot be provided at link time.
This may be something we want to address in a future extension.


### RESOLVED: Are there any interface matching requirements that will need to change, like SSOs in OpenGL/ES?

Some implementations require the interpolation decorations in the last
geometry shader stage if pipeline libraries are used, and this is
advertised by the
`graphicsPipelineLibraryIndependentInterpolationDecoration` property.
It is expected that these implementations are serving markets where OpenGL
ES is dominant, where this requirement was never dropped for separate
shader objects, unlike OpenGL.


### RESOLVED: Should we allow passing SPIR-V directly into pipeline creation?

Yes. This simplifies compilation, avoids an unnecessary copy, and brings developers and implementations onto the same page.


### RESOLVED: Should we advertise a property for “free link” vs. “fast link”?

Yes, as developers may want to adjust the way they manage pipelines.
If linking is more or less free, the expectation is that applications may
link pipelines on demand when recording draw calls.
If linking is going to take more time, they may try to more aggressively
pre-cache pipelines.
This has been added as the `graphicsPipelineLibraryFastLinking`
property.

Implementation and developer guidance is that if this property is
advertised, applications should be able to link on demand, so the cost of
linking should be comparable to recording commands in a command buffer.


### RESOLVED: Does anyone need the depth/stencil format to be provided with the depth bias state?

The depth format affects how depth bias is applied, but these are currently provided in separate parts of the pipeline.
Nobody has claimed this to be a problem.


### RESOLVED: With the recommendation to create an optimized pipeline as well as a fast linked pipeline, will this lead to additional memory consumption?

Caches containing pipeline libraries will necessarily increase the total memory consumption of compiled pipelines, as applications will generally try to keep these available while pipelines could be streamed in/out.
Implementations may be able to use data in the library caches for the final pipelines in some circumstances, which could help mitigate it - but this is not guaranteed and will vary by vendor.

Fast-linked pipelines should not contribute to the total memory consumption if applications destroy the fast-linked pipeline once an optimized version exists.

Improvements to pipeline caches allowing selective eviction of individual caches could help with memory management here, but as this intersects with other known pipeline cache problems, this should be dealt with in a separate extension.


### RESOLVED: Should the link time optimization bits apply to other pipeline libraries (e.g. ray tracing)?

Yes, but it will not necessarily be subject to the same quality guarantees. Ray tracing pipeline libraries were not designed with this directly in mind, so while implementations should make use of these bits as best they can, it is not possible to make the same quality guarantees as for graphics pipelines.

Any future extensions using the pipeline library interface should be aware of these interactions and try to follow the intent of these bits as much as possible.


### RESOLVED: Does the shader module deprecation apply to other pipelines?

Yes.


### RESOLVED: Should `VkPipelineDepthStencilStateCreateInfo` be part of the fragment shader state or fragment output interface state?

Some vendors will make use of this information if it is available and would rather not see it move - however notably all of this state can be made dynamic. Applications wanting to avoid setting this state with the fragment shader library should use this dynamic state.


### RESOLVED: Should `VkPipelineMultisampleStateCreateInfo` be required, even when the fragment shader does not make use of multisampling shader features?

The fragment shader library only needs this information when sample shading is enabled.


### RESOLVED: Should `VkPipelineMultisampleStateCreateInfo` be part of the fragment output interface instead?

Moving it to the output interface removes the need to create multiple fragment shader libraries for different MSAA rates, which some applications do as a part of dynamic performance tuning.
This only works when used in conjunction with dynamic rendering; when render pass objects are used, the sample rate will effectively be sourced from any subpass attachments due to validation constraints.
This could be made to work with subpasses with no attachments, but the additional complexity of adding that path had no clear benefit, so it is disallowed.


### RESOLVED: Should we add an explicit result code to pipeline creation to indicate that a pipeline compiled with link time optimization has been returned?

No, as the complexity of handling this does not clearly translate to significant application wins.


### RESOLVED: Can pipeline caches return optimized pipelines without the `VK_PIPELINE_LAYOUT_CREATE_INDEPENDENT_SETS_BIT_EXT` set?

Not unconditionally - if an implementation does not do anything with that flag then yes, but if there is a functional difference then it cannot, i.e. the same as for other pipeline state.