// Copyright 2021-2023 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

# VK_QCOM_image_processing
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:


This document proposes a new extension that adds shader built-in functions and
descriptor types for image processing.

## Problem Statement

GPUs commonly process images for a wide range of use-cases.  These include enhancement
of externally sourced images (e.g., camera image enhancement), post-processing of GPU-rendered
game content, image scaling, and image analysis (e.g., motion vector generation).  For common use-cases,
the existing texture built-ins combined with bilinear/bicubic filtering work well.  In other cases,
higher-order filtering kernels or advanced image algorithms are required.

While such algorithms could be implemented generically in shader code using existing texture
built-in functions, doing so requires many round-trips between the texture unit and shader unit.
The latest Adreno GPUs have dedicated HW shader instructions for such image processing tasks,
enabling advanced functionality with simplified shader code.  For some use-cases, significant
performance and power savings are possible using dedicated texture sampling instructions.

## Solution Space

Adreno GPUs have native support for multiple image processing instructions:

* High-order (up to 64x64 kernel) filters with user-supplied weights and sub-texel phasing support
* High-order (up to 64x64) box filtering with HW-computed weights and fractional box sizes
* Block matching of pixel regions (up to 64x64) across images

These capabilities are currently not exposed in Vulkan.  Exposing these instructions would
provide a significant increase in functionality beyond current SPIR-V texture built-ins.
Adreno GPUs exposing this extension perform the above algorithms fully inside the texture
unit, saving shader instruction cycles, memory bandwidth, and shader register space.

## Proposal

The extension exposes support for the following new SPIR-V instructions:

* `OpImageSampleWeightedQCOM`: This instruction performs a weighted texture sampling
operation involving two images: the _sampled image_ and the _weight image_.  An MxN region of texels in the
_sampled image_ is convolved with an MxN set of scalar weights provided in the _weight image_.  Large filter
sizes up to 64x64 taps enable important use-cases like edge-detection, feature extraction,
and anti-aliasing.
** `Sub-pixel Weighting`:  Frequently the texture coordinates will not align with a texel center in the _sampled image_, and in such cases the kernel weights can be adjusted to reflect the sub-texel sample location.  Sub-texel weighting is supported, where the texel is subdivided into PxP sub-texels, called "phases", with unique weights per phase.  Adreno GPUs support up to 32x32 phases.
** `Separable-filters`: Many common 2D image filtering kernels can be expressed as a mathematically equivalent 1D separable kernel.  Separable filters offer significant performance/power savings over their non-separable equivalents.  This instruction supports both separable and non-separable filtering kernels.
* `OpImageBoxFilterQCOM`: This instruction computes a weighted average of the texels within a screen-aligned box.  The operation is similar to bi-linear filtering, except the region of texels is not limited to 2x2. The instruction includes a `BoxSize` parameter, with fractional box sizes up to [64.0, 64.0].  Similar to bi-linear filtering, the implementation computes a weighted average for all texels covered by the box, with the weight for each texel proportional to the covered area. Large box sizes up to 64x64 enable important use-cases like bulk mipmap generation and high quality single-pass image down-scaling with arbitrary scaling ratios (e.g. thumbnail generation).
* `OpImageBlockMatchSADQCOM` and `OpImageBlockMatchSSDQCOM`: These instructions perform a block matching operation involving two images: the _target image_ and the _reference image_.  Each instruction takes two sets of integer texture coordinates and an integer `BlockSize` parameter.  An MxN region of texels in the _target image_ is compared with an MxN region in the _reference image_.  The instruction returns a per-component error metric describing the difference between the two regions.  The SAD variant returns the sum of the absolute differences and the SSD variant returns the sum of the squared differences.

Each of the image processing instructions operates only on 2D images.  The instructions
do not support sampling of mipmapped, multi-plane, multi-layer, multi-sampled, or depth/stencil
images.  The new instructions can be used in any shader stage.

Exposing this functionality in Vulkan makes use of a corresponding SPIR-V extension, and the built-ins
will be exposed in high-level languages (e.g., GLSL) via related extensions.


### SPIR-V Built-in Functions

[cols="1,1,4*3",width="100%"]
|====
5+|*OpImageSampleWeightedQCOM* +
 +
Weighted sample operation. +
 +
_Result Type_ is the type of the result of the weighted sample operation.
 +
_Texture Sampled Image_ must be an object whose type is OpTypeSampledImage. The MS operand of the
underlying OpTypeImage must be 0.
 +
_Coordinate_ must be a vector of floating-point type, whose vector size is 2.
 +
_Weight Image_ must be an object whose type is OpTypeSampledImage decorated with WeightTextureQCOM. The MS operand of the
underlying OpTypeImage must be 0.
 +
1+|Capability: +
*TextureSampleWeightedQCOM*
| 5 | XXX | <id> _Result Type_ | <<ResultId,'Result <id>' >> | <id> _Texture Sampled Image_ | <id> _Coordinate_ | <id> _Weight Image_
|====

[cols="1,1,4*3",width="100%"]
|====
5+|*OpImageBoxFilterQCOM* +
 +
Image box filter operation. +
 +
_Result Type_ is the type of the result of the image box filter operation.
 +
_Texture Sampled Image_ must be an object whose type is OpTypeSampledImage. The MS operand of the
underlying OpTypeImage must be 0.
 +
_Coordinate_ must be a vector of floating-point type, whose vector size is 2.
 +
_Box Size_ must be a vector of floating-point type, whose vector size is 2.
 +
1+|Capability: +
*TextureBoxFilterQCOM*
| 5 | XXX | <id> _Result Type_ | <<ResultId,'Result <id>' >> | <id> _Texture Sampled Image_ | <id> _Coordinate_ | <id> _Box Size_
|====

[cols="1,1,6*3",width="100%"]
|====
7+|*OpImageBlockMatchSADQCOM* +
 +
Image block match sum of absolute differences. +
 +
_Result Type_ is the type of the result of image block match sum of absolute differences
 +
_Target Sampled Image_ must be an object whose type is OpTypeSampledImage decorated with BlockMatchTextureQCOM. The MS operand of the
underlying OpTypeImage must be 0.
 +
_Target Coordinate_ must be a vector of integer type, whose vector size is 2 and signedness is 0.
 +
_Reference Sampled Image_ must be an object whose type is OpTypeSampledImage decorated with BlockMatchTextureQCOM. The MS operand of the
underlying OpTypeImage must be 0.
 +
_Reference Coordinate_ must be a vector of integer type, whose vector size is 2 and signedness is 0.
 +
_Block Size_ must be a vector of integer type, whose vector size is 2 and signedness is 0.
 +
1+|Capability: +
*TextureBlockMatchQCOM*
| 7 | XXX | <id> _Result Type_ | <<ResultId,'Result <id>' >> | <id> _Target Sampled Image_ | <id> _Target Coordinate_ | <id> _Reference Sampled Image_ | <id> _Reference Coordinate_ | <id> _Block Size_
|====

[cols="1,1,6*3",width="100%"]
|====
7+|*OpImageBlockMatchSSDQCOM* +
 +
Image block match sum of square differences. +
 +
_Result Type_ is the type of the result of image block match sum of square differences
 +
_Target Sampled Image_ must be an object whose type is OpTypeSampledImage decorated with BlockMatchTextureQCOM. The MS operand of the
underlying OpTypeImage must be 0.
 +
_Target Coordinate_ must be a vector of integer type, whose vector size is 2 and signedness is 0.
 +
_Reference Sampled Image_ must be an object whose type is OpTypeSampledImage decorated with BlockMatchTextureQCOM. The MS operand of the
underlying OpTypeImage must be 0.
 +
_Reference Coordinate_ must be a vector of integer type, whose vector size is 2 and signedness is 0.
 +
_Block Size_ must be a vector of integer type, whose vector size is 2 and signedness is 0.
 +
1+|Capability: +
*TextureBlockMatchQCOM*
| 7 | XXX | <id> _Result Type_ | <<ResultId,'Result <id>' >> | <id> _Target Sampled Image_ | <id> _Target Coordinate_ | <id> _Reference Sampled Image_ | <id> _Reference Coordinate_ | <id> _Block Size_
|====

The extension adds two new SPIR-V decorations:
--
[options="header"]
|====
2+^| Decoration 2+^| Extra Operands     ^| Enabling Capabilities
| 4487 | *WeightTextureQCOM* +
Apply to a texture used as 'Weight Image' in OpImageSampleWeightedQCOM.  Behavior is defined by the runtime environment.
2+| | *TextureSampleWeightedQCOM*
| 4488 | *BlockMatchTextureQCOM* +
Apply to textures used as 'Target Sampled Image' and 'Reference Sampled Image' in OpImageBlockMatchSSDQCOM/OpImageBlockMatchSADQCOM. +
Behavior is defined by the runtime environment.
2+| | *TextureBlockMatchQCOM*
|====
--

This functionality is gated behind 3 SPIR-V capabilities:

[options="header"]
|====
2+^| Capability ^| Implicitly declares
| XXXX | *TextureSampleWeightedQCOM* +
Add weighted sample operation. |
| XXXX | *TextureBoxFilterQCOM* +
Add box filter operation. |
| XXXX | *TextureBlockMatchQCOM* +
Add block matching operation (sum of absolute/square differences). |
|====


### High Level Language Exposure

The following summarizes how the built-ins are exposed in GLSL:

[source,c]
----
    +------------------------------------+--------------------------------------------+
    | Syntax                             | Description                                |
    +------------------------------------+--------------------------------------------+
    |   vec4 textureWeightedQCOM(        | Weighted sample operation multiplies       |
    |       sampler2D tex,               | a 2D kernel of filter weights with a       |
    |       vec2      P,                 | corresponding region of sampled texels and|
    |       sampler2DArray weight)       | sums the results to produce the output     |
    |                                    | value.                                     |
    +------------------------------------+--------------------------------------------+
    |   vec4 textureBoxFilterQCOM(       | Linear operation taking average of pixels  |
    |       sampler2D tex,               | within the spatial region described by     |
    |       vec2      P,                 | boxSize.  The box is centered at coordinate|
    |       vec2      boxSize)           | P and has width and height of boxSize.x    |
    |                                    | and boxSize.y.                             |
    +------------------------------------+--------------------------------------------+
    |   vec4 textureBlockMatchSADQCOM(   | Block matching operation measures the      |
    |       sampler2D target,            | correlation (or similarity) of the target  |
    |       uvec2     targetCoord,       | block and reference block.  TargetCoord    |
    |       sampler2D reference,         | and refCoord specify the bottom-left corner|
    |       uvec2     refCoord,          | of the block in target and reference       |
    |       uvec2     blockSize)         | images. The error metric is the Sum of     |
    |                                    | Absolute Differences (SAD).                |
    +------------------------------------+--------------------------------------------+
    |   vec4 textureBlockMatchSSDQCOM(   | Block matching operation measures the      |
    |       sampler2D target,            | correlation (or similarity) of the target  |
    |       uvec2     targetCoord,       | block and reference block.  TargetCoord    |
    |       sampler2D reference,         | and refCoord specify the bottom-left corner|
    |       uvec2     refCoord,          | of the block in target and reference       |
    |       uvec2     blockSize)         | images. The error metric is the Sum of     |
    |                                    | Square Differences (SSD).                  |
    +------------------------------------+--------------------------------------------+
----

### Features and Properties

Support for weighted sampling, box filtering, and block matching operations is
indicated by feature bits in a structure that extends
link:{refpage}VkPhysicalDeviceFeatures2.html[VkPhysicalDeviceFeatures2].

[source,c]
----
typedef struct VkPhysicalDeviceImageProcessingFeaturesQCOM {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           textureSampleWeighted;
    VkBool32           textureBoxFilter;
    VkBool32           textureBlockMatch;
} VkPhysicalDeviceImageProcessingFeaturesQCOM;
----

`textureSampleWeighted` indicates that the implementation supports SPIR-V modules
declaring the `TextureSampleWeightedQCOM` capability.
`textureBoxFilter` indicates that the implementation supports SPIR-V modules
declaring the `TextureBoxFilterQCOM` capability.
`textureBlockMatch` indicates that the implementation supports SPIR-V modules
declaring the `TextureBlockMatchQCOM` capability.

Implementation-specific properties are exposed in a structure that extends
link:{refpage}VkPhysicalDeviceProperties2.html[VkPhysicalDeviceProperties2].

[source,c]
----
typedef struct VkPhysicalDeviceImageProcessingPropertiesQCOM {
    VkStructureType    sType;
    void*              pNext;
    uint32_t           maxWeightFilterPhases;
    VkExtent2D         maxWeightFilterDimension;
    VkExtent2D         maxBlockMatchRegion;
    VkExtent2D         maxBoxFilterBlockSize;
} VkPhysicalDeviceImageProcessingPropertiesQCOM;
----

* `maxWeightFilterPhases` is the maximum number of sub-pixel phases supported for `OpImageSampleWeightedQCOM`.
* `maxWeightFilterDimension` is the largest supported filter size (width and height) for `OpImageSampleWeightedQCOM`.
* `maxBlockMatchRegion` is the largest supported region size (width and height) for `OpImageBlockMatchSSDQCOM` and `OpImageBlockMatchSADQCOM`.
* `maxBoxFilterBlockSize` is the largest supported BoxSize (width and height) for `OpImageBoxFilterQCOM`.
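As an illustration of how an application might use these limits, the following sketch checks requested sizes against them before creating any objects. It is not part of the extension: `Extent2D` and `ImageProcessingLimits` are hypothetical stand-ins for `VkExtent2D` and `VkPhysicalDeviceImageProcessingPropertiesQCOM` so that the code compiles without the Vulkan headers, and the lower bounds (non-empty extents, at least one phase, positive box size) are assumptions rather than rules stated in this proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for VkExtent2D and the properties structure, so the
 * sketch compiles without the Vulkan headers. */
typedef struct { uint32_t width, height; } Extent2D;

typedef struct {
    uint32_t maxWeightFilterPhases;
    Extent2D maxWeightFilterDimension;
    Extent2D maxBlockMatchRegion;
    Extent2D maxBoxFilterBlockSize;
} ImageProcessingLimits;

/* True when a non-empty extent fits inside the advertised maximum. */
static bool extent_fits(Extent2D e, Extent2D max)
{
    return e.width >= 1 && e.height >= 1 &&
           e.width <= max.width && e.height <= max.height;
}

/* Checks filterSize and numPhases for a weight image view. */
bool weight_filter_within_limits(const ImageProcessingLimits *limits,
                                 Extent2D filterSize, uint32_t numPhases)
{
    assert(limits);
    return extent_fits(filterSize, limits->maxWeightFilterDimension) &&
           numPhases >= 1 && numPhases <= limits->maxWeightFilterPhases;
}

/* Checks the block size passed to the SAD/SSD instructions. */
bool block_match_within_limits(const ImageProcessingLimits *limits,
                               Extent2D blockSize)
{
    assert(limits);
    return extent_fits(blockSize, limits->maxBlockMatchRegion);
}

/* BoxSize is fractional, so it is compared as floating point. */
bool box_filter_within_limits(const ImageProcessingLimits *limits,
                              float boxWidth, float boxHeight)
{
    assert(limits);
    return boxWidth > 0.0f && boxHeight > 0.0f &&
           boxWidth <= (float)limits->maxBoxFilterBlockSize.width &&
           boxHeight <= (float)limits->maxBoxFilterBlockSize.height;
}
```

In a real application the limits would be filled in from the properties structure returned through `VkPhysicalDeviceProperties2`.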

### VkSampler compatibility

VkSampler objects created for use with the built-ins added with this extension
must be created with `VK_SAMPLER_CREATE_IMAGE_PROCESSING_BIT_QCOM`.
Such samplers must not be used with the other existing `OpImage*` built-ins
unrelated to this extension.  In practice, this means an application must create
dedicated VkSamplers for use with this extension.

The `OpImageSampleWeightedQCOM` and `OpImageBoxFilterQCOM` built-ins
support samplers with `unnormalizedCoordinates` equal to `VK_TRUE` or
`VK_FALSE`.
The `OpImageBlockMatchSADQCOM` and `OpImageBlockMatchSSDQCOM` built-ins require
a sampler with `unnormalizedCoordinates` equal to `VK_TRUE`.

All built-ins added with this extension support samplers with `addressModeU`
and `addressModeV` equal to
`VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE` or `VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER`.
If `VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER` is used, the `borderColor` must be
opaque black.

All built-ins added with this extension support samplers with all
link:{refpage}VkSamplerReductionMode.html[VkSamplerReductionModes].

The other
link:{refpage}VkSamplerCreateInfo.html[VkSamplerCreateInfo] parameters
must be set to default values but generally have no effect on the built-ins.

### VkImage compatibility

When creating a VkImage for compatibility with the new built-ins, the driver needs
additional usage flags.  VkImages must be created with
`VK_IMAGE_USAGE_SAMPLE_WEIGHT_BIT_QCOM` when used as a _weight image_ with
`OpImageSampleWeightedQCOM`.  VkImages must be created with
`VK_IMAGE_USAGE_SAMPLE_BLOCK_MATCH_BIT_QCOM` when used as a
_reference image_ or _target image_ with `OpImageBlockMatchSADQCOM`
or `OpImageBlockMatchSSDQCOM`.

### Descriptor Types

This extension adds two new descriptor types:

[source,c]
----
VK_DESCRIPTOR_TYPE_BLOCK_MATCH_IMAGE_QCOM
VK_DESCRIPTOR_TYPE_SAMPLE_WEIGHT_IMAGE_QCOM
----

`VK_DESCRIPTOR_TYPE_SAMPLE_WEIGHT_IMAGE_QCOM` specifies a 2D image array descriptor
for a _weight image_ that can be used with `OpImageSampleWeightedQCOM`.  The corresponding
VkImageView must have been created with `VkImageViewSampleWeightCreateInfoQCOM` in the
`pNext` chain.

`VK_DESCRIPTOR_TYPE_BLOCK_MATCH_IMAGE_QCOM` specifies a 2D image descriptor for the
_reference image_ or _target image_ that can be used with `OpImageBlockMatchSADQCOM`
or `OpImageBlockMatchSSDQCOM`.


### VkFormat Support

Implementations will advertise format support for this extension
through the `linearTilingFeatures` or `optimalTilingFeatures` of
link:{refpage}VkFormatProperties3.html[VkFormatProperties3]:

[source,c]
----
VK_FORMAT_FEATURE_2_WEIGHT_IMAGE_BIT_QCOM
VK_FORMAT_FEATURE_2_WEIGHT_SAMPLED_IMAGE_BIT_QCOM
VK_FORMAT_FEATURE_2_BLOCK_MATCHING_BIT_QCOM
VK_FORMAT_FEATURE_2_BOX_FILTER_SAMPLED_BIT_QCOM
----

The SPIR-V `OpImageSampleWeightedQCOM` instruction takes two image parameters: the _weight image_ which holds weight values, and the _sampled image_ which holds the texels being sampled.

* `VK_FORMAT_FEATURE_2_WEIGHT_IMAGE_BIT_QCOM` specifies that the format is supported as a _weight image_ with `OpImageSampleWeightedQCOM`.
* `VK_FORMAT_FEATURE_2_WEIGHT_SAMPLED_IMAGE_BIT_QCOM` specifies that the format is supported as a _sampled image_ with `OpImageSampleWeightedQCOM`.

The SPIR-V `OpImageBlockMatchSADQCOM` and `OpImageBlockMatchSSDQCOM` instructions take two image parameters: the _target image_ and the _reference image_.

* `VK_FORMAT_FEATURE_2_BLOCK_MATCHING_BIT_QCOM` specifies that the format is supported as a _target image_ or _reference image_ with both `OpImageBlockMatchSADQCOM` and `OpImageBlockMatchSSDQCOM`.

The SPIR-V `OpImageBoxFilterQCOM` instruction takes one image parameter, the _sampled image_.

* `VK_FORMAT_FEATURE_2_BOX_FILTER_SAMPLED_BIT_QCOM` specifies that the format is supported as a _sampled image_ with `OpImageBoxFilterQCOM`.


### Weight Image Sampling

The SPIR-V `OpImageSampleWeightedQCOM` instruction takes 3 operands: _sampled image_,
_weight image_, and texture coordinates.  The instruction computes a weighted average
of an MxN region of texels in the _sampled image_, using a set of MxN weights in the
_weight image_.

To create a VkImageView for the _weight image_, the
link:{refpage}VkImageViewCreateInfo.html[VkImageViewCreateInfo] structure
is extended to provide weight filter parameters.

[source,c]
----
typedef struct VkImageViewSampleWeightCreateInfoQCOM {
    VkStructureType    sType;
    const void*        pNext;
    VkOffset2D         filterCenter;
    VkExtent2D         filterSize;
    uint32_t           numPhases;
} VkImageViewSampleWeightCreateInfoQCOM;
----

The texture coordinates provided to `OpImageSampleWeightedQCOM`,
combined with the `filterCenter` and `filterSize`, select a
region of texels in the _sampled image_:

[source,c]
----
// Let (u,v) be the 2D unnormalized coordinates passed to OpImageSampleWeightedQCOM.
// The lower-left texel of the region has integer texel coordinates (i0,j0):
i0 = floor(u) - filterCenter.x
j0 = floor(v) - filterCenter.y

// The upper-right texel of the region has integer texel coordinates (imax,jmax):
imax = i0 + filterSize.width - 1
jmax = j0 + filterSize.height - 1
----
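The region formulas above can be sketched in plain C as follows. This is only an illustration: the `TexelRegion` type and the function names are hypothetical, and the floor is computed by hand so that the sketch has no dependency on the math library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: inclusive integer texel bounds of the region. */
typedef struct { int32_t i0, j0, imax, jmax; } TexelRegion;

/* floor() for values that fit in int32_t, without needing <math.h>. */
static int32_t floor_i32(float x)
{
    int32_t t = (int32_t)x;          /* truncates toward zero */
    return ((float)t > x) ? t - 1 : t;
}

/* (u, v) are unnormalized coordinates; the remaining parameters mirror the
 * filterCenter and filterSize members of VkImageViewSampleWeightCreateInfoQCOM. */
TexelRegion weight_sample_region(float u, float v,
                                 int32_t filterCenterX, int32_t filterCenterY,
                                 uint32_t filterWidth, uint32_t filterHeight)
{
    assert(filterWidth > 0 && filterHeight > 0);

    TexelRegion r;
    r.i0 = floor_i32(u) - filterCenterX;
    r.j0 = floor_i32(v) - filterCenterY;
    r.imax = r.i0 + (int32_t)filterWidth - 1;
    r.jmax = r.j0 + (int32_t)filterHeight - 1;
    return r;
}
```

For example, a 3x3 filter with `filterCenter` equal to (1, 1) sampled at (10.7, 5.2) covers texels (9, 4) through (11, 6).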

If the sampler `reductionMode` is `VK_SAMPLER_REDUCTION_MODE_WEIGHTED_AVERAGE`, then the
value of each texel in the region is multiplied by the associated value from the _weight
image_, and the resulting products are summed for each component across all texels
in the region.  Note that since the weight values are application-defined,
their sum may be greater than 1.0 or less than 0.0; therefore the
filter output for UNORM formats may be greater than 1.0 or less than 0.0.

If the sampler `reductionMode` is `VK_SAMPLER_REDUCTION_MODE_MIN` or `VK_SAMPLER_REDUCTION_MODE_MAX`,
a component-wise minimum or maximum is computed for all texels in the region with non-zero
weights.
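A CPU reference for the three reduction modes might look like the sketch below. It is a simplified model, not the hardware behavior: it assumes a single-component float image, a single phase, clamp-to-edge addressing, and that weight texel (x, y) applies to region texel (i0 + x, j0 + y). The `Reduction` enum and `weighted_sample_ref` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the three sampler reduction modes. */
typedef enum { REDUCE_WEIGHTED_AVERAGE, REDUCE_MIN, REDUCE_MAX } Reduction;

static int32_t clamp_i32(int32_t x, int32_t lo, int32_t hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* image:   imgW x imgH texels, row-major, one component.
 * weights: fw x fh weights, row-major (one phase).
 * (i0, j0) is the lower-left texel of the region. */
float weighted_sample_ref(const float *image, int32_t imgW, int32_t imgH,
                          const float *weights, int32_t fw, int32_t fh,
                          int32_t i0, int32_t j0, Reduction mode)
{
    assert(image && weights && imgW > 0 && imgH > 0 && fw > 0 && fh > 0);

    float acc = 0.0f;
    int haveValue = 0;               /* set once a min/max candidate is seen */

    for (int32_t y = 0; y < fh; ++y) {
        for (int32_t x = 0; x < fw; ++x) {
            /* Clamp-to-edge for texels outside the image (assumption). */
            int32_t i = clamp_i32(i0 + x, 0, imgW - 1);
            int32_t j = clamp_i32(j0 + y, 0, imgH - 1);
            float texel = image[j * imgW + i];
            float w = weights[y * fw + x];

            if (mode == REDUCE_WEIGHTED_AVERAGE) {
                acc += texel * w;    /* sum of products; weights are not normalized */
            } else if (w != 0.0f) {  /* min/max only consider non-zero weights */
                if (!haveValue ||
                    (mode == REDUCE_MIN ? texel < acc : texel > acc)) {
                    acc = texel;
                }
                haveValue = 1;
            }
        }
    }
    return acc;
}
```

Because the weighted-average path is a plain sum of products, a 3x3 kernel of all-ones weights returns the sum of the nine texels rather than their mean, which matches the note above that the output can exceed 1.0.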

#### Sub-texel weighting

The _weight image_ can optionally provide sub-texel weights.  This feature
is enabled by setting `numPhases` to a value greater than 1.
In this case, the _weight image_ specifies `numPhases` unique sets of
`filterSize`.`width` x `filterSize`.`height` weights, one set for each phase.

Each texel in the _sampled image_ is subdivided
both horizontally and vertically into an NxN grid of sub-texel regions,
or "phases".
The number of horizontal and vertical subdivisions must be equal
and must be a power of two.  `numPhases` is the product
of the horizontal and vertical phase counts.

For example, `numPhases` equal to 4 means that each texel is divided into
two vertical phases and two horizontal phases, and that the weight image
defines 4 sets of weights, each with a width and height as specified by
`filterSize`.  The sub-texel location of the texture coordinate determines
which set of weights is used.
The maximum supported values for `numPhases` and `filterSize` are specified by the
`maxWeightFilterPhases` and `maxWeightFilterDimension` members of
`VkPhysicalDeviceImageProcessingPropertiesQCOM`, respectively.
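The constraints on `numPhases` can be checked with a few lines of C, sketched below. `phases_per_axis` follows directly from the rules above (equal horizontal and vertical counts, each a power of two). `phase_along_axis` is only an assumption about how a sub-texel position could map to a phase, namely by evenly dividing the fractional part of the coordinate; this proposal does not spell out the exact mapping. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the per-axis phase count n when numPhases == n * n and n is a
 * power of two; returns 0 when numPhases does not satisfy the constraints. */
uint32_t phases_per_axis(uint32_t numPhases)
{
    for (uint32_t n = 1; n <= 0x8000u; n <<= 1) {
        if (n * n == numPhases) {
            return n;
        }
        if (n * n > numPhases) {
            break;
        }
    }
    return 0;
}

/* ASSUMPTION: the phase along one axis is chosen by evenly dividing the
 * fractional part of the unnormalized coordinate into n intervals. */
uint32_t phase_along_axis(float coord, uint32_t n)
{
    assert(n > 0);

    int32_t whole = (int32_t)coord;
    if ((float)whole > coord) {
        whole -= 1;                  /* manual floor for negative values */
    }
    float frac = coord - (float)whole;          /* in [0, 1) */
    uint32_t phase = (uint32_t)(frac * (float)n);
    return phase < n ? phase : n - 1;           /* guard against rounding up */
}
```

With the limits quoted earlier in this document, `phases_per_axis(1024)` yields 32, matching the 32x32 phases mentioned in the Proposal section.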

#### Weight Image View Type

The `OpImageSampleWeightedQCOM` _weight image_ view created with
`VkImageViewSampleWeightCreateInfoQCOM` must have a `viewType` of
either `VK_IMAGE_VIEW_TYPE_1D_ARRAY`, which indicates separable
weight encoding, or `VK_IMAGE_VIEW_TYPE_2D_ARRAY`, which indicates
non-separable weight encoding, as described below.

The view type (1D array or 2D array) is the sole indication of whether
the weights are separable or non-separable; there is no other API state or
shader change that designates a separable versus non-separable weight image.

#### Non-Separable Weight Encoding

For non-separable weight filtering, the view type will be
`VK_IMAGE_VIEW_TYPE_2D_ARRAY`.  Each layer of the 2D array
corresponds to one phase of the filter.  The view's
`VkImageSubresourceRange::layerCount` must be equal to
`VkImageViewSampleWeightCreateInfoQCOM::numPhases`. The phases
are stored as layers in the 2D array, in horizontal-phase-major
order, left-to-right and top-to-bottom. Expressed as a formula,
the layer index for each filter phase is computed as:

[source,c]
----
layerIndex(horizPhase,vertPhase,horizPhaseCount) = (vertPhase * horizPhaseCount) + horizPhase
----
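As a small sketch, the layer formula translates directly into C. The function name and the precondition check are illustrative additions, not part of the extension.

```c
#include <assert.h>
#include <stdint.h>

/* Layer of the 2D array weight image that holds the weights for the phase
 * at (horizPhase, vertPhase), following the formula above. */
uint32_t layer_index(uint32_t horizPhase, uint32_t vertPhase,
                     uint32_t horizPhaseCount)
{
    assert(horizPhase < horizPhaseCount);
    return vertPhase * horizPhaseCount + horizPhase;
}
```

With `numPhases` equal to 4 (two phases per axis), the four phases land in layers 0 to 3, with both phases of the first row preceding those of the second.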


For each layer, the weights are specified by the values in texels [0, 0] to
[`filterSize.width`-1, `filterSize.height`-1].
While it is valid for the view's VkImage to have a width/height larger than `filterSize`,
image texels with integer coordinates greater than or equal to `filterSize`
are ignored by weight sampling.  The image property query instructions `OpImageQuerySize`,
`OpImageQuerySizeLod`, `OpImageQueryLevels`, and `OpImageQuerySamples` return undefined
values for a weight image descriptor.

#### Separable Weight Encoding

For separable weight filtering, the view type will be `VK_IMAGE_VIEW_TYPE_1D_ARRAY`.
Horizontal weights for all phases are packed in layer '0' and the vertical weights for
all phases are packed in layer '1'.  Within each layer, the weights are arranged into
groups of 4.  For each group, the weights are ordered by phase. Expressed as a
formula, the 1D texel offset for all weights and phases within each layer is computed as:

[source,c]
----
// Let horizontal weights have a weightIndex of [0, filterSize.width - 1]
// Let vertical weights have a weightIndex of [0, filterSize.height - 1]
// Let phaseCount be the number of phases in either the vertical or horizontal direction.

texelOffset(phaseIndex,weightIndex,phaseCount) = (phaseCount * 4 * (weightIndex / 4)) + (phaseIndex * 4) + (weightIndex % 4)
----
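The following sketch applies the offset formula to pack one layer of a separable weight image on the CPU. `texel_offset` is the formula itself; `separable_layer_texels` and `pack_separable_layer` are hypothetical helpers, and rounding the layer up to whole groups of 4 with zero-filled padding is an assumption that this proposal does not state explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* 1D texel offset of one weight within a layer, per the formula above. */
uint32_t texel_offset(uint32_t phaseIndex, uint32_t weightIndex,
                      uint32_t phaseCount)
{
    assert(phaseIndex < phaseCount);
    return phaseCount * 4u * (weightIndex / 4u) +
           phaseIndex * 4u +
           (weightIndex % 4u);
}

/* ASSUMPTION: a layer is sized to whole groups of 4 weights per phase. */
uint32_t separable_layer_texels(uint32_t weightCount, uint32_t phaseCount)
{
    return phaseCount * 4u * ((weightCount + 3u) / 4u);
}

/* src holds weights as src[phase * weightCount + weightIndex].
 * dst must have room for separable_layer_texels() floats. */
void pack_separable_layer(const float *src, uint32_t weightCount,
                          uint32_t phaseCount, float *dst)
{
    assert(src && dst);

    uint32_t total = separable_layer_texels(weightCount, phaseCount);
    for (uint32_t i = 0; i < total; ++i) {
        dst[i] = 0.0f;               /* zero the padding texels (assumption) */
    }
    for (uint32_t p = 0; p < phaseCount; ++p) {
        for (uint32_t w = 0; w < weightCount; ++w) {
            dst[texel_offset(p, w, phaseCount)] = src[p * weightCount + w];
        }
    }
}
```

For a 5-tap filter with two phases per axis, the layout is: weights 0 to 3 of phase 0, weights 0 to 3 of phase 1, then weight 4 of phase 0 at offset 8 and weight 4 of phase 1 at offset 12.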

### Box Filter Sampling

The SPIR-V `OpImageBoxFilterQCOM` instruction takes 3 operands: _sampled image_,
_box size_, and texture coordinates.  Note that _box size_ specifies a floating point
width and height in texels.  The instruction computes a weighted average of all texels
in the _sampled image_ that are covered (either partially or fully) by a box with
the specified size and centered at the specified texture coordinates.

For each texel covered by the box, a weight value is computed by the implementation.
The weight is proportional to the area of the texel covered.  Those texels that are
fully covered by the box receive a weight of 1.0.  Those texels that are partially
covered by the box receive a weight proportional to the covered area.  For example,
a texel that has one quarter of its area covered by the box will receive a
weight of 0.25.

If the sampler `reductionMode` is `VK_SAMPLER_REDUCTION_MODE_WEIGHTED_AVERAGE`, then the
value of each covered texel is multiplied by the weight, and the resulting
products are summed for each component across all covered texels.  The resulting sum
is then divided by the _box size_ area.
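The weighted-average case can be modeled on the CPU as in the sketch below. It assumes a single-component float image, unnormalized coordinates in which texel (i, j) covers the square from (i, j) to (i + 1, j + 1), and clamp-to-edge addressing; `box_filter_ref` and its helpers are hypothetical names, and the result is a reference model rather than a bit-exact description of the hardware.

```c
#include <assert.h>
#include <stdint.h>

static int32_t floor_i32(float x)
{
    int32_t t = (int32_t)x;          /* truncates toward zero */
    return ((float)t > x) ? t - 1 : t;
}

static int32_t clamp_i32(int32_t x, int32_t lo, int32_t hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Length of the overlap between the interval [lo, hi] and texel [t, t + 1]. */
static float overlap(float lo, float hi, int32_t t)
{
    float a = lo > (float)t ? lo : (float)t;
    float b = hi < (float)(t + 1) ? hi : (float)(t + 1);
    return b > a ? b - a : 0.0f;
}

/* Box of size (boxW, boxH) centered at (cx, cy), all in texel units. */
float box_filter_ref(const float *image, int32_t imgW, int32_t imgH,
                     float cx, float cy, float boxW, float boxH)
{
    assert(image && imgW > 0 && imgH > 0 && boxW > 0.0f && boxH > 0.0f);

    float x0 = cx - 0.5f * boxW, x1 = cx + 0.5f * boxW;
    float y0 = cy - 0.5f * boxH, y1 = cy + 0.5f * boxH;
    float sum = 0.0f;

    for (int32_t j = floor_i32(y0); (float)j < y1; ++j) {
        for (int32_t i = floor_i32(x0); (float)i < x1; ++i) {
            /* Weight is the covered fraction of the texel's area. */
            float w = overlap(x0, x1, i) * overlap(y0, y1, j);
            int32_t ci = clamp_i32(i, 0, imgW - 1);   /* clamp-to-edge */
            int32_t cj = clamp_i32(j, 0, imgH - 1);
            sum += w * image[cj * imgW + ci];
        }
    }
    return sum / (boxW * boxH);      /* divide by the box area */
}
```

A 1x1 box centered on the corner shared by four texels covers one quarter of each, so each receives a weight of 0.25, as in the example above.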

If the sampler `reductionMode` is `VK_SAMPLER_REDUCTION_MODE_MIN` or `VK_SAMPLER_REDUCTION_MODE_MAX`,
a component-wise minimum or maximum is computed for all texels covered by the box,
including texels that are partially covered.


### Block Matching Sampling


The SPIR-V `OpImageBlockMatchSADQCOM` and `OpImageBlockMatchSSDQCOM` instructions
each take 5 operands: _target image_, _target coordinates_, _reference image_,
_reference coordinates_, and _block size_.  Each instruction computes an error
metric that describes how closely a block of texels in the _target image_ matches
a corresponding block of texels in the _reference image_.  The error metric
is computed per-component.  `OpImageBlockMatchSADQCOM` computes the "Sum of Absolute
Differences" and `OpImageBlockMatchSSDQCOM` computes the "Sum of Squared Differences",
but otherwise both instructions are similar.

Both _target coordinates_ and _reference coordinates_ are integer texel coordinates
of the lower-left texel of the block to be matched in the _target image_ and
_reference image_ respectively.
The _block size_ provides the height and width in integer texels of the regions to
be matched.

Note that the coordinates and _block size_ may result in a region that extends
beyond the bounds of the _target image_ or _reference image_.  For the _target image_,
this is valid and the sampler `addressModeU` and `addressModeV` will determine
the value of such texels.  For the _reference image_, this will result in undefined
values being returned.  The application must guarantee that the _reference region_
does not extend beyond the bounds of the _reference image_.

For each texel in the regions, a difference value is computed by subtracting the
target value from the reference value.  `OpImageBlockMatchSADQCOM` computes the
absolute value of the difference; this is the _texel error_.  `OpImageBlockMatchSSDQCOM`
computes the square of the difference; this is the _texel error squared_.

If the sampler `reductionMode` is `VK_SAMPLER_REDUCTION_MODE_WEIGHTED_AVERAGE`, then the
_texel error_ or _texel error squared_ for each texel in the region is summed for each
component across all texels.
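The summed error metrics can be modeled with the CPU sketch below. It assumes single-component float images and clamp-to-edge addressing for target texels outside the image; the reference block is required to be fully in bounds, mirroring the rule above. `BlockMatchMetric` and `block_match_ref` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical selector for the two instructions. */
typedef enum { BLOCK_MATCH_SAD, BLOCK_MATCH_SSD } BlockMatchMetric;

static int32_t clamp_i32(int32_t x, int32_t lo, int32_t hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* (tx, ty) and (rx, ry) are the lower-left texels of the two blocks. */
float block_match_ref(const float *target, int32_t tW, int32_t tH,
                      int32_t tx, int32_t ty,
                      const float *reference, int32_t rW, int32_t rH,
                      int32_t rx, int32_t ry,
                      int32_t blockW, int32_t blockH,
                      BlockMatchMetric metric)
{
    assert(target && reference && tW > 0 && tH > 0);
    assert(blockW > 0 && blockH > 0);
    /* The reference block must not extend past the reference image. */
    assert(rx >= 0 && ry >= 0 && rx + blockW <= rW && ry + blockH <= rH);

    float sum = 0.0f;
    for (int32_t y = 0; y < blockH; ++y) {
        for (int32_t x = 0; x < blockW; ++x) {
            /* Target texels outside the image use clamp-to-edge (assumption). */
            int32_t ti = clamp_i32(tx + x, 0, tW - 1);
            int32_t tj = clamp_i32(ty + y, 0, tH - 1);
            float d = reference[(ry + y) * rW + (rx + x)] - target[tj * tW + ti];

            if (metric == BLOCK_MATCH_SAD) {
                sum += d < 0.0f ? -d : d;    /* texel error */
            } else {
                sum += d * d;                /* texel error squared */
            }
        }
    }
    return sum;
}
```

A motion search would typically evaluate this metric for many candidate target positions and keep the one with the smallest error.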

If the sampler `reductionMode` is `VK_SAMPLER_REDUCTION_MODE_MIN` or `VK_SAMPLER_REDUCTION_MODE_MAX`,
a component-wise minimum or maximum is computed for all texels in the region.
`OpImageBlockMatchSADQCOM` returns the minimum or maximum _texel error_ across
all texels.  `OpImageBlockMatchSSDQCOM` returns the square of the minimum or maximum
_texel error_.  Note that `OpImageBlockMatchSSDQCOM` does not return the minimum or maximum
of _texel error squared_.


## Expected Features and limits

Below are the properties, features, and formats that are expected to be advertised by Adreno drivers supporting this extension:

Features supported in VkPhysicalDeviceImageProcessingFeaturesQCOM:

[source,c]
----
    textureSampleWeighted   = TRUE
    textureBoxFilter        = TRUE
    textureBlockMatch       = TRUE
----

Properties reported in VkPhysicalDeviceImageProcessingPropertiesQCOM:

[source,c]
----
    maxWeightFilterPhases       = 1024
    maxWeightFilterDimension    = 64
    maxBlockMatchRegion         = 64
    maxBoxFilterBlockSize       = 64
----


Formats supported by the _sampled image_ parameter to `OpImageSampleWeightedQCOM` and `OpImageBoxFilterQCOM`:

[source,c]
----
    VK_FORMAT_R8_UNORM
    VK_FORMAT_R8_SNORM
    VK_FORMAT_R8G8_UNORM
    VK_FORMAT_R8G8B8A8_UNORM
    VK_FORMAT_R8G8B8A8_SNORM
    VK_FORMAT_A8B8G8R8_UNORM_PACK32
    VK_FORMAT_A8B8G8R8_SNORM_PACK32
    VK_FORMAT_A2B10G10R10_UNORM_PACK32
    VK_FORMAT_R16_SFLOAT
    VK_FORMAT_R16G16_SFLOAT
    VK_FORMAT_R16G16B16A16_SFLOAT
    VK_FORMAT_B10G11R11_UFLOAT_PACK32
    VK_FORMAT_E5B9G9R9_UFLOAT_PACK32
    VK_FORMAT_BC1_RGB_UNORM_BLOCK
    VK_FORMAT_BC1_RGB_SRGB_BLOCK
    VK_FORMAT_BC1_RGBA_UNORM_BLOCK
    VK_FORMAT_BC1_RGBA_SRGB_BLOCK
    VK_FORMAT_BC2_SRGB_BLOCK
    VK_FORMAT_BC3_UNORM_BLOCK
    VK_FORMAT_BC3_SRGB_BLOCK
    VK_FORMAT_BC4_UNORM_BLOCK
    VK_FORMAT_BC4_SNORM_BLOCK
    VK_FORMAT_BC5_UNORM_BLOCK
    VK_FORMAT_BC5_SNORM_BLOCK
    VK_FORMAT_BC6H_UFLOAT_BLOCK
    VK_FORMAT_BC6H_SFLOAT_BLOCK
    VK_FORMAT_BC7_UNORM_BLOCK
    VK_FORMAT_BC7_SRGB_BLOCK
    VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK
    VK_FORMAT_ETC2_R8G8B8_SRGB_BLOCK
    VK_FORMAT_ETC2_R8G8B8A1_UNORM_BLOCK
    VK_FORMAT_ETC2_R8G8B8A1_SRGB_BLOCK
    VK_FORMAT_ETC2_R8G8B8A8_UNORM_BLOCK
    VK_FORMAT_ETC2_R8G8B8A8_SRGB_BLOCK
    VK_FORMAT_EAC_R11_UNORM_BLOCK
    VK_FORMAT_EAC_R11_SNORM_BLOCK
    VK_FORMAT_EAC_R11G11_UNORM_BLOCK
    VK_FORMAT_EAC_R11G11_SNORM_BLOCK
    VK_FORMAT_ASTC_4x4_UNORM_BLOCK
    VK_FORMAT_ASTC_4x4_SRGB_BLOCK
    VK_FORMAT_ASTC_5x4_UNORM_BLOCK
    VK_FORMAT_ASTC_5x4_SRGB_BLOCK
    VK_FORMAT_ASTC_5x5_UNORM_BLOCK
    VK_FORMAT_ASTC_5x5_SRGB_BLOCK
    VK_FORMAT_ASTC_6x5_UNORM_BLOCK
    VK_FORMAT_ASTC_6x5_SRGB_BLOCK
    VK_FORMAT_ASTC_6x6_UNORM_BLOCK
    VK_FORMAT_ASTC_6x6_SRGB_BLOCK
    VK_FORMAT_ASTC_8x5_UNORM_BLOCK
    VK_FORMAT_ASTC_8x5_SRGB_BLOCK
    VK_FORMAT_ASTC_8x6_SRGB_BLOCK
    VK_FORMAT_ASTC_8x8_UNORM_BLOCK
    VK_FORMAT_ASTC_8x8_SRGB_BLOCK
    VK_FORMAT_ASTC_10x5_UNORM_BLOCK
    VK_FORMAT_ASTC_10x5_SRGB_BLOCK
    VK_FORMAT_ASTC_10x6_UNORM_BLOCK
    VK_FORMAT_ASTC_10x6_SRGB_BLOCK
    VK_FORMAT_ASTC_10x8_UNORM_BLOCK
    VK_FORMAT_ASTC_10x8_SRGB_BLOCK
    VK_FORMAT_ASTC_10x10_UNORM_BLOCK
    VK_FORMAT_ASTC_10x10_SRGB_BLOCK
    VK_FORMAT_ASTC_12x10_UNORM_BLOCK
    VK_FORMAT_ASTC_12x10_SRGB_BLOCK
    VK_FORMAT_ASTC_12x12_UNORM_BLOCK
    VK_FORMAT_ASTC_12x12_SRGB_BLOCK
    VK_FORMAT_G8B8G8R8_422_UNORM
    VK_FORMAT_B8G8R8G8_422_UNORM
    VK_FORMAT_A4B4G4R4_UNORM_PACK16
    VK_FORMAT_ASTC_4x4_SFLOAT_BLOCK
    VK_FORMAT_ASTC_5x4_SFLOAT_BLOCK
    VK_FORMAT_ASTC_5x5_SFLOAT_BLOCK
    VK_FORMAT_ASTC_6x5_SFLOAT_BLOCK
    VK_FORMAT_ASTC_6x6_SFLOAT_BLOCK
    VK_FORMAT_ASTC_8x5_SFLOAT_BLOCK
    VK_FORMAT_ASTC_8x6_SFLOAT_BLOCK
    VK_FORMAT_ASTC_8x8_SFLOAT_BLOCK
    VK_FORMAT_ASTC_10x5_SFLOAT_BLOCK
    VK_FORMAT_ASTC_10x6_SFLOAT_BLOCK
    VK_FORMAT_ASTC_10x8_SFLOAT_BLOCK
    VK_FORMAT_ASTC_10x10_SFLOAT_BLOCK
    VK_FORMAT_ASTC_12x10_SFLOAT_BLOCK
    VK_FORMAT_ASTC_12x12_SFLOAT_BLOCK
----

Formats supported by the _weight image_ parameter to `OpImageSampleWeightedQCOM`:

[source,c]
----
    VK_FORMAT_R8_UNORM
    VK_FORMAT_R16_SFLOAT
----

Formats supported by the _target image_ or _reference image_ parameter to `OpImageBlockMatchSADQCOM` and `OpImageBlockMatchSSDQCOM`:

[source,c]
----
    VK_FORMAT_R8_UNORM
    VK_FORMAT_R8G8_UNORM
    VK_FORMAT_R8G8B8_UNORM
    VK_FORMAT_R8G8B8A8_UNORM
    VK_FORMAT_A8B8G8R8_UNORM_PACK32
    VK_FORMAT_A2B10G10R10_UNORM_PACK32
    VK_FORMAT_G8B8G8R8_422_UNORM
    VK_FORMAT_B8G8R8G8_422_UNORM
----


## Issues

### RESOLVED:  Should this be one extension or 3 extensions?

For simplicity, and since we expect this extension to be supported only on Adreno GPUs, we propose one extension with 3 feature bits.  The associated SPIR-V extension will have 3 capabilities.  The associated GLSL extension will have 3 extension strings.

### RESOLVED:  How does this interact with descriptor indexing?

The new built-ins added by this extension support descriptor arrays and
dynamic indexing, but only if the index is dynamically uniform.  The "update-after-bind"
functionality is fully supported.  Non-uniform dynamic indexing is not supported.  There are no
feature bits for an implementation to advertise support for dynamic indexing with the
shader built-ins added in this extension.

The new descriptor types for sample weight image and block match image count against
the `maxPerStageDescriptor[UpdateAfterBind]SampledImages` and
`maxDescriptorSet[UpdateAfterBind]SampledImages` limits.

### RESOLVED:  How does this extension interact with VK_EXT_robustness2?

These instructions do not support the `nullDescriptor` feature of `VK_EXT_robustness2`.  If any descriptor accessed by these
instructions is not bound, undefined results will occur.

### RESOLVED:  How does this interact with push descriptors?

The descriptors added by this extension can be updated using `vkCmdPushDescriptorSetKHR`.