// Copyright 2021-2023 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

= VK_KHR_shader_integer_dot_product
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:

This document proposes adding support for shader integer dot product instructions.

== Problem Statement

Dot product operations between vectors of integer values are used heavily in machine learning algorithms, where they act as a fundamental building block.
When running machine learning algorithms in Vulkan, these operations have to be emulated using other integer operations, even though many implementations have dedicated fast paths for them.
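
As a minimal sketch of what such emulation can look like, the C function below computes a signed dot product of four 8-bit components packed into 32-bit words using only shifts, masks, multiplies, and adds.
It is written in C purely for illustration; a shader would express the same arithmetic in its own language.
The function names and the choice of the packed signed 4x8-bit case are assumptions made for this example, as is the convention that component 0 sits in the least significant byte.

[source,c]
----
#include <stdint.h>

/* Hypothetical helper: byte `index` (0 = least significant) of `packed`,
 * reinterpreted as a signed 8-bit value and widened to 32 bits. */
static int32_t signed_component(uint32_t packed, unsigned index)
{
    int32_t byte = (int32_t)((packed >> (8u * index)) & 0xFFu);
    return (byte >= 128) ? (byte - 256) : byte;
}

/* Emulated dot product of two vectors of four signed 8-bit components.
 * The largest possible magnitude is 4 * 128 * 128 = 65536, so the
 * 32-bit sum cannot overflow. */
static int32_t emulated_sdot_4x8(uint32_t a, uint32_t b)
{
    int32_t sum = 0;
    for (unsigned i = 0; i < 4; ++i) {
        sum += signed_component(a, i) * signed_component(b, i);
    }
    return sum;
}
----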

An additional problem is that there is no clear common subset of accelerated dot product operations between vendors, which makes standardising on a solution somewhat tricky.

This proposal aims to enable these fast paths for machine learning algorithms with minimal difficulty.


== Solution Space

There are two main ways in which applications could gain access to these fast paths:

 . Rely on compiler pattern matching to optimise standard integer operations into dot products
 . Add dedicated dot product operations

The first is more or less a "do nothing" approach: it puts the burden on implementations to detect these cases, with variable success rates.
Adding dedicated dot product operations is less error-prone, but means that machine learning content needs to be updated to use the new operations.
In the long run, the latter is likely to be much more reliable for new applications, so this proposal aims to add new operations.

The question then becomes _which_ dedicated dot product operations should be exposed if there is no common subset of accelerated operations.
The choices are:

 . Multiple extensions advertising different operations
 . One extension with the superset of operations, all of them optional
 . One extension with all operations available, emulating those that are not accelerated

Most existing ML backends targeting SPIR-V compile to SPIR-V once and expect the code to work everywhere within their target market: they pick a single expression of the ML operations at the macro level and compile to that.
To run this code everywhere, only option 3 works directly; faced with option 1 or 2, such backends could only keep emulating the functions as they do today, perhaps picking up optimisations in extreme cases only.

Newer backends such as those using https://www.tensorflow.org/mlir[MLIR] are looking at generating platform-specific optimised IR, which can be done in part by expressing the macro-level operations differently.
Backends like this could use information about the accelerated operations to determine which SPIR-V operations to target, so options 1 and 2 are well suited to them.
Option 3 would also work, but such backends would need additional information in order to make optimisation decisions.

In order to satisfy both types of backend, this proposal works along the lines of option 3, while providing platform-specific information to allow optimising compilers to make useful choices.
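
As a rough illustration of how an optimising backend might consume that platform-specific information, the C sketch below picks one of several kernel variants from a pair of acceleration flags.
The `DotProductCaps` struct, the `KernelVariant` enum, and `choose_kernel_variant` are hypothetical names invented for this example; in a real application the flags would be copied from the corresponding `...Accelerated` members of the properties structure this extension defines.

[source,c]
----
#include <stdbool.h>

/* Hypothetical mirror of two acceleration properties, kept separate from
 * the Vulkan headers so the sketch stands alone. */
typedef struct DotProductCaps {
    bool packed_4x8_signed_accelerated;  /* integerDotProduct4x8BitPackedSignedAccelerated */
    bool vector_8bit_signed_accelerated; /* integerDotProduct8BitSignedAccelerated */
} DotProductCaps;

/* Hypothetical kernel variants a backend might have compiled. */
typedef enum KernelVariant {
    KERNEL_PACKED_4X8,  /* dot products on packed 32-bit operands */
    KERNEL_VECTOR_8BIT, /* dot products on vectors of 8-bit integers */
    KERNEL_PORTABLE     /* the single expression a compile-once backend ships */
} KernelVariant;

/* Prefer a variant whose dot product form is reported as accelerated.
 * Every form remains usable, so the portable variant is always valid. */
static KernelVariant choose_kernel_variant(const DotProductCaps *caps)
{
    if (caps->packed_4x8_signed_accelerated) {
        return KERNEL_PACKED_4X8;
    }
    if (caps->vector_8bit_signed_accelerated) {
        return KERNEL_VECTOR_8BIT;
    }
    return KERNEL_PORTABLE;
}
----

A backend that compiles once and runs everywhere would simply ignore the flags and always ship the portable variant.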


== Proposal

=== API Features

The following features are exposed by this extension:

[source,c]
----
typedef struct VkPhysicalDeviceShaderIntegerDotProductFeaturesKHR {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           shaderIntegerDotProduct;
} VkPhysicalDeviceShaderIntegerDotProductFeaturesKHR;
----

`shaderIntegerDotProduct` is the core feature enabling this extension's functionality.


=== API Properties

The following properties are exposed by this extension:

[source,c]
----
typedef struct VkPhysicalDeviceShaderIntegerDotProductPropertiesKHR {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           integerDotProduct8BitUnsignedAccelerated;
    VkBool32           integerDotProduct8BitSignedAccelerated;
    VkBool32           integerDotProduct8BitMixedSignednessAccelerated;
    VkBool32           integerDotProduct4x8BitPackedUnsignedAccelerated;
    VkBool32           integerDotProduct4x8BitPackedSignedAccelerated;
    VkBool32           integerDotProduct4x8BitPackedMixedSignednessAccelerated;
    VkBool32           integerDotProduct16BitUnsignedAccelerated;
    VkBool32           integerDotProduct16BitSignedAccelerated;
    VkBool32           integerDotProduct16BitMixedSignednessAccelerated;
    VkBool32           integerDotProduct32BitUnsignedAccelerated;
    VkBool32           integerDotProduct32BitSignedAccelerated;
    VkBool32           integerDotProduct32BitMixedSignednessAccelerated;
    VkBool32           integerDotProduct64BitUnsignedAccelerated;
    VkBool32           integerDotProduct64BitSignedAccelerated;
    VkBool32           integerDotProduct64BitMixedSignednessAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating8BitUnsignedAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating8BitSignedAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating8BitMixedSignednessAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating4x8BitPackedUnsignedAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating4x8BitPackedSignedAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating4x8BitPackedMixedSignednessAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating16BitUnsignedAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating16BitSignedAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating16BitMixedSignednessAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating32BitUnsignedAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating32BitSignedAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating32BitMixedSignednessAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating64BitUnsignedAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating64BitSignedAccelerated;
    VkBool32           integerDotProductAccumulatingSaturating64BitMixedSignednessAccelerated;
} VkPhysicalDeviceShaderIntegerDotProductPropertiesKHR;
----

Each of these properties is a boolean that will be ename:VK_TRUE if the implementation provides a performance advantage for the corresponding SPIR-V instruction over application-provided code composed from elementary instructions and/or other dot product instructions.
This could be either because the implementation uses optimized machine code sequences whose generation from application-provided code cannot be guaranteed, or because it uses hardware features that cannot otherwise be targeted from application-provided code.

[NOTE]
====
Properties are written as `integerDotProduct<AccumulatingSaturating>{type bitwidth}{Unsigned|Signed|MixedSignedness}Accelerated`.
Each property corresponds to a SPIR-V opcode of the form `Op{U|S|SU}Dot<AccSat>KHR`, as defined in SPIR-V extension SPV_KHR_integer_dot_product.
The optional `<AccumulatingSaturating>` portion of the property corresponds to the `AccSat` instruction variants.
The `{type bitwidth}` portion refers to the bit width of the components of the input vectors and whether they use a packed format.
`{Unsigned|Signed|MixedSignedness}` in the property corresponds to `{U|S|SU}` in the instruction name.
====
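
The naming correspondence described in the note can be sketched as a small C helper that builds the opcode name from the signedness and the accumulating saturating flag.
The `Signedness` enum and `dot_opcode_name` function are hypothetical and exist only to illustrate the mapping; the bit width does not appear in the opcode name, so it is not a parameter.

[source,c]
----
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical enum mirroring the {Unsigned|Signed|MixedSignedness}
 * portion of a property name. */
typedef enum Signedness {
    INPUTS_UNSIGNED,
    INPUTS_SIGNED,
    INPUTS_MIXED_SIGNEDNESS
} Signedness;

/* Writes the name of the matching SPIR-V opcode into `out`, following
 * the pattern Op{U|S|SU}Dot<AccSat>KHR. */
static void dot_opcode_name(Signedness signedness, bool accumulating_saturating,
                            char *out, size_t out_size)
{
    static const char *const prefix[] = { "U", "S", "SU" };
    snprintf(out, out_size, "Op%sDot%sKHR",
             prefix[signedness],
             accumulating_saturating ? "AccSat" : "");
}
----

For example, any `integerDotProductAccumulatingSaturating...SignedAccelerated` property maps to `OpSDotAccSatKHR`.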

=== SPIR-V Changes

This proposal uses an existing SPIR-V extension: https://htmlpreview.github.io/?https://github.com/KhronosGroup/SPIRV-Registry/blob/master/extensions/KHR/SPV_KHR_integer_dot_product.html[SPV_KHR_integer_dot_product].
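
For readers unfamiliar with those instructions, the C sketch below models one reading of the accumulating saturating variant for unsigned packed 4x8-bit inputs with a 32-bit result: the component products are summed, and the final addition to the accumulator clamps instead of wrapping.
This is a host-side model under that assumption, with a hypothetical function name; the SPIR-V extension specification remains the authoritative definition of the instruction's behaviour.

[source,c]
----
#include <stdint.h>

/* Hypothetical model of an unsigned accumulating saturating dot product
 * over four 8-bit components packed into each 32-bit operand. */
static uint32_t model_udot_acc_sat_4x8(uint32_t a, uint32_t b, uint32_t accumulator)
{
    uint32_t dot = 0;
    for (unsigned i = 0; i < 4; ++i) {
        uint32_t ai = (a >> (8u * i)) & 0xFFu;
        uint32_t bi = (b >> (8u * i)) & 0xFFu;
        dot += ai * bi; /* at most 4 * 255 * 255 = 260100, cannot overflow */
    }

    /* Saturating accumulate: clamp to UINT32_MAX rather than wrap around. */
    if (accumulator > UINT32_MAX - dot) {
        return UINT32_MAX;
    }
    return accumulator + dot;
}
----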


== Examples

TODO