// Copyright 2021-2023 The Khronos Group Inc.
//
// SPDX-License-Identifier: CC-BY-4.0

# VK_QCOM_image_processing
:toc: left
:refpage: https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/
:sectnums:


This document proposes a new extension that adds shader built-in functions and
descriptor types for image processing.

## Problem Statement

GPUs commonly process images for a wide range of use-cases. These include enhancement
of externally sourced images (e.g., camera image enhancement), post processing of GPU-rendered
game content, image scaling, and image analysis (e.g., motion vector generation). For common use-cases,
the existing texture built-ins combined with bilinear/bicubic filtering work well. In other cases,
higher-order filtering kernels or advanced image algorithms are required.

While such algorithms could be implemented in shader code generically using existing texture
built-in functions, doing so requires many round-trips between the texture unit and shader unit.
The latest Adreno GPUs have dedicated HW shader instructions for such image processing tasks,
enabling advanced functionality with simplified shader code. For some use-cases, significant
performance and power savings are possible using dedicated texture sampling instructions.

## Solution Space

Adreno GPUs have native support for multiple image processing instructions:

* High-order (up to 64x64 kernel) filters with user-supplied weights, and sub-texel phasing support
* High-order (up to 64x64) box filtering with HW-computed weights, and fractional box sizes
* Block Matching (up to 64x64) pixel regions across images

These capabilities are currently not exposed in Vulkan. Exposing these instructions would
provide a significant increase in functionality beyond current SPIR-V texture built-ins.
Adreno GPUs exposing this extension perform the above algorithms fully inside the texture
unit, saving shader instruction cycles, memory bandwidth, and shader register space.

## Proposal

The extension exposes support for four new SPIR-V instructions, grouped into three operations:

* `OpImageSampleWeightedQCOM`: This instruction performs a weighted texture sampling
operation involving two images: the _sampled image_ and the _weight image_. An MxN region of texels in the
_sampled image_ is convolved with an MxN set of scalar weights provided in the _weight image_. Large filter
sizes up to 64x64 taps enable important use-cases like edge-detection, feature extraction,
and anti-aliasing.
** `Sub-pixel Weighting`: Frequently the texture coordinates will not align with a texel center in the _sampled image_, and in such cases the kernel weights can be adjusted to reflect the sub-texel sample location. Sub-texel weighting is supported, where the texel is subdivided into PxP sub-texels, called "phases", with unique weights per-phase. Adreno GPUs support up to 32x32 phases.
** `Separable-filters`: Many common 2D image filtering kernels can be expressed as a mathematically equivalent 1D separable kernel. Separable filters offer significant performance/power savings over their non-separable equivalent. This instruction supports both separable and non-separable filtering kernels.
* `OpImageBoxFilterQCOM`: This instruction computes a weighted average of the texels within a screen-aligned box. The operation is similar to bi-linear filtering, except the region of texels is not limited to 2x2. The instruction includes a `BoxSize` parameter, with fractional box sizes up to [64.0, 64.0]. Similar to bi-linear filtering, the implementation computes a weighted average for all texels covered by the box, with the weight for each texel proportional to the covered area.
Large box sizes up to 64x64 enable important use-cases like bulk mipmap generation and high quality single-pass image down-scaling with arbitrary scaling ratios (e.g. thumbnail generation).
* `OpImageBlockMatchSADQCOM` and `OpImageBlockMatchSSDQCOM`: These instructions perform a block matching operation involving two images: the _target image_ and _reference image_. Each instruction takes two sets of integer texture coordinates, and an integer `BlockSize` parameter. An MxN region of texels in the _target image_ is compared with an MxN region in the _reference image_. The instruction returns a per-component error metric describing the difference between the two regions. SAD returns the sum of the absolute differences and SSD returns the sum of the squared differences.

Each of the image processing instructions operates only on 2D images. The instructions
do not support sampling of mipmapped, multi-plane, multi-layer, multi-sampled, or depth/stencil
images. The new instructions can be used in any shader stage.

Exposing this functionality in Vulkan makes use of a corresponding SPIR-V extension, and the built-ins
will be exposed in high-level languages (e.g., GLSL) via related extensions.


### SPIR-V Built-in Functions

[cols="1,1,4*3",width="100%"]
|====
5+|*OpImageSampleWeightedQCOM* +
 +
Weighted sample operation +
 +
_Result Type_ is the type of the result of weighted sample operation
 +
_Texture Sampled Image_ must be an object whose type is OpTypeSampledImage. The MS operand of the
underlying OpTypeImage must be 0.
 +
_Coordinate_ must be a vector of floating-point type, whose vector size is 2.
 +
_Weight Image_ must be an object whose type is OpTypeSampledImage decorated with WeightTextureQCOM. The MS operand of the
underlying OpTypeImage must be 0.
 +
1+|Capability: +
*TextureSampleWeightedQCOM*
| 5 | XXX | <id> _Result Type_ | <<ResultId,'Result <id>' >> | <id> _Texture Sampled Image_ | <id> _Coordinate_ | <id> _Weight Image_
|====

[cols="1,1,4*3",width="100%"]
|====
5+|*OpImageBoxFilterQCOM* +
 +
Image box filter operation. +
 +
_Result Type_ is the type of the result of image box filter operation
 +
_Texture Sampled Image_ must be an object whose type is OpTypeSampledImage. The MS operand of the
underlying OpTypeImage must be 0.
 +
_Coordinate_ must be a vector of floating-point type, whose vector size is 2.
 +
_Box Size_ must be a vector of floating-point type, whose vector size is 2 and signedness is 0.
 +
1+|Capability: +
*TextureBoxFilterQCOM*
| 5 | XXX | <id> _Result Type_ | <<ResultId,'Result <id>' >> | <id> _Texture Sampled Image_ | <id> _Coordinate_ | <id> _Box Size_
|====

[cols="1,1,6*3",width="100%"]
|====
7+|*OpImageBlockMatchSADQCOM* +
 +
Image block match sum of absolute differences. +
 +
_Result Type_ is the type of the result of image block match sum of absolute differences
 +
_Target Sampled Image_ must be an object whose type is OpTypeSampledImage decorated with BlockMatchTextureQCOM. The MS operand of the
underlying OpTypeImage must be 0.
 +
_Target Coordinate_ must be a vector of integer type, whose vector size is 2 and signedness is 0.
 +
_Reference Sampled Image_ must be an object whose type is OpTypeSampledImage decorated with BlockMatchTextureQCOM. The MS operand of the
underlying OpTypeImage must be 0.
 +
_Reference Coordinate_ must be a vector of integer type, whose vector size is 2 and signedness is 0.
 +
_Block Size_ must be a vector of integer type, whose vector size is 2 and signedness is 0.
 +
1+|Capability: +
*TextureBlockMatchQCOM*
| 7 | XXX | <id> _Result Type_ | <<ResultId,'Result <id>' >> | <id> _Target Sampled Image_ | <id> _Target Coordinate_ | <id> _Reference Sampled Image_ | <id> _Reference Coordinate_ | <id> _Block Size_
|====

[cols="1,1,6*3",width="100%"]
|====
7+|*OpImageBlockMatchSSDQCOM* +
 +
Image block match sum of square differences. +
 +
_Result Type_ is the type of the result of image block match sum of square differences
 +
_Target Sampled Image_ must be an object whose type is OpTypeSampledImage decorated with BlockMatchTextureQCOM. The MS operand of the
underlying OpTypeImage must be 0.
 +
_Target Coordinate_ must be a vector of integer type, whose vector size is 2 and signedness is 0.
 +
_Reference Sampled Image_ must be an object whose type is OpTypeSampledImage decorated with BlockMatchTextureQCOM. The MS operand of the
underlying OpTypeImage must be 0.
 +
_Reference Coordinate_ must be a vector of integer type, whose vector size is 2 and signedness is 0.
 +
_Block Size_ must be a vector of integer type, whose vector size is 2 and signedness is 0.
 +
1+|Capability: +
*TextureBlockMatchQCOM*
| 7 | XXX | <id> _Result Type_ | <<ResultId,'Result <id>' >> | <id> _Target Sampled Image_ | <id> _Target Coordinate_ | <id> _Reference Sampled Image_ | <id> _Reference Coordinate_ | <id> _Block Size_
|====

The extension adds two new SPIR-V decorations:
--
[options="header"]
|====
2+^| Decoration 2+^| Extra Operands ^| Enabling Capabilities
| 4487 | *WeightTextureQCOM* +
Apply to a texture used as 'Weight Image' in OpImageSampleWeightedQCOM. Behavior is defined by the runtime environment.
2+| | *TextureSampleWeightedQCOM*
| 4488 | *BlockMatchTextureQCOM* +
Apply to textures used as 'Target Sampled Image' and 'Reference Sampled Image' in OpImageBlockMatchSSDQCOM/OpImageBlockMatchSADQCOM. +
Behavior is defined by the runtime environment.
2+| | *TextureBlockMatchQCOM*
|====
--

This functionality is gated behind 3 SPIR-V capabilities:

[options="header"]
|====
2+^| Capability ^| Implicitly declares
| XXXX | *TextureSampleWeightedQCOM* +
Add weighted sample operation. |
|====
|====
2+^| Capability ^| Implicitly declares
| XXXX | *TextureBoxFilterQCOM* +
Add box filter operation. |
|====
|====
2+^| Capability ^| Implicitly declares
| XXXX | *TextureBlockMatchQCOM* +
Add block matching operation (sum of absolute/square differences). |
|====


### High Level Language Exposure

The following summarizes how the built-ins are exposed in GLSL:
[source,c]
----
    +------------------------------------+--------------------------------------------+
    | Syntax                             | Description                                |
    +------------------------------------+--------------------------------------------+
    | vec4 textureWeightedQCOM(          | Weighted sample operation multiplies       |
    |     sampler2D tex,                 | a 2D kernel of filter weights with a       |
    |     vec2      P,                   | corresponding region of sampled texels and |
    |     sampler2DArray weight)         | sums the results to produce the output     |
    |                                    | value.                                     |
    +------------------------------------+--------------------------------------------+
    | vec4 textureBoxFilterQCOM(         | Linear operation taking average of pixels  |
    |     sampler2D tex,                 | within the spatial region described by     |
    |     vec2      P,                   | boxSize. The box is centered at coordinate |
    |     vec2      boxSize)             | P and has width and height of boxSize.x    |
    |                                    | and boxSize.y.                             |
    +------------------------------------+--------------------------------------------+
    | vec4 textureBlockMatchSADQCOM(     | Block matching operation measures the      |
    |     sampler2D target,              | correlation (or similarity) of the target  |
    |     uvec2     targetCoord,         | block and reference block. targetCoord     |
    |     sampler2D reference,           | and refCoord specify the bottom-left corner|
    |     uvec2     refCoord,            | of the block in target and reference       |
    |     uvec2     blockSize)           | images. The error metric is the Sum of     |
    |                                    | Absolute Differences (SAD).                |
    +------------------------------------+--------------------------------------------+
    | vec4 textureBlockMatchSSDQCOM(     | Block matching operation measures the      |
    |     sampler2D target,              | correlation (or similarity) of the target  |
    |     uvec2     targetCoord,         | block and reference block. targetCoord     |
    |     sampler2D reference,           | and refCoord specify the bottom-left corner|
    |     uvec2     refCoord,            | of the block in target and reference       |
    |     uvec2     blockSize)           | images. The error metric is the Sum of     |
    |                                    | Square Differences (SSD).                  |
    +------------------------------------+--------------------------------------------+
----

### Features and Properties

Support for weighted sampling, box filtering, and block matching operations is
indicated by feature bits in a structure that extends
link:{refpage}VkPhysicalDeviceFeatures2.html[VkPhysicalDeviceFeatures2].

[source,c]
----
typedef struct VkPhysicalDeviceImageProcessingFeaturesQCOM {
    VkStructureType    sType;
    void*              pNext;
    VkBool32           textureSampleWeighted;
    VkBool32           textureBoxFilter;
    VkBool32           textureBlockMatch;
} VkPhysicalDeviceImageProcessingFeaturesQCOM;
----

`textureSampleWeighted` indicates that the implementation supports SPIR-V modules
declaring the `TextureSampleWeightedQCOM` capability.
`textureBoxFilter` indicates that the implementation supports SPIR-V modules
declaring the `TextureBoxFilterQCOM` capability.
`textureBlockMatch` indicates that the implementation supports SPIR-V modules
declaring the `TextureBlockMatchQCOM` capability.

Implementation-specific properties are exposed in a structure that extends
link:{refpage}VkPhysicalDeviceProperties2.html[VkPhysicalDeviceProperties2].

[source,c]
----
typedef struct VkPhysicalDeviceImageProcessingPropertiesQCOM {
    VkStructureType    sType;
    void*              pNext;
    uint32_t           maxWeightFilterPhases;
    VkExtent2D         maxWeightFilterDimension;
    VkExtent2D         maxBlockMatchRegion;
    VkExtent2D         maxBoxFilterBlockSize;
} VkPhysicalDeviceImageProcessingPropertiesQCOM;
----

`maxWeightFilterPhases` is the maximum number of sub-pixel phases supported for `OpImageSampleWeightedQCOM`.
`maxWeightFilterDimension` is the largest supported filter size (width and height) for `OpImageSampleWeightedQCOM`.
`maxBlockMatchRegion` is the largest supported region size (width and height) for `OpImageBlockMatchSSDQCOM` and `OpImageBlockMatchSADQCOM`.
`maxBoxFilterBlockSize` is the largest supported BoxSize (width and height) for `OpImageBoxFilterQCOM`.

### VkSampler compatibility

VkSampler objects created for use with the built-ins added with this extension
must be created with `VK_SAMPLER_CREATE_IMAGE_PROCESSING_BIT_QCOM`.
Such samplers must not be used with the other existing `OpImage*` built-ins
unrelated to this extension. In practice, this means an application must create
dedicated VkSamplers for use with this extension.

The `OpImageSampleWeightedQCOM` and `OpImageBoxFilterQCOM` built-ins
support samplers with `unnormalizedCoordinates` equal to `VK_TRUE` or
`VK_FALSE`.
The `OpImageBlockMatchSADQCOM` and `OpImageBlockMatchSSDQCOM` built-ins require
a sampler with `unnormalizedCoordinates` equal to `VK_TRUE`.

All built-ins added with this extension support samplers with `addressModeU`
and `addressModeV` equal to
`VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_EDGE` or `VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER`.
If `VK_SAMPLER_ADDRESS_MODE_CLAMP_TO_BORDER` is used, the `borderColor` must be
opaque black.

All built-ins added with this extension support samplers with all
link:{refpage}VkSamplerReductionMode.html[VkSamplerReductionModes].

The other
link:{refpage}VkSamplerCreateInfo.html[VkSamplerCreateInfo] parameters
must be set to default values but generally have no effect on the built-ins.

### VkImage compatibility

When creating a VkImage for compatibility with the new built-ins, the driver needs
additional usage flags. VkImages must be created with
`VK_IMAGE_USAGE_SAMPLE_WEIGHT_BIT_QCOM` when used as a _weight image_ with
`OpImageSampleWeightedQCOM`. VkImages must be created with
`VK_IMAGE_USAGE_SAMPLE_BLOCK_MATCH_BIT_QCOM` when used as a
_reference image_ or _target image_ with `OpImageBlockMatchSADQCOM`
or `OpImageBlockMatchSSDQCOM`.

### Descriptor Types
This extension adds two new descriptor types:
[source,c]
----
VK_DESCRIPTOR_TYPE_BLOCK_MATCH_IMAGE_QCOM
VK_DESCRIPTOR_TYPE_SAMPLE_WEIGHT_IMAGE_QCOM
----

`VK_DESCRIPTOR_TYPE_SAMPLE_WEIGHT_IMAGE_QCOM` specifies a 2D image array descriptor
for a _weight image_ that can be used with `OpImageSampleWeightedQCOM`. The corresponding
VkImageView must have been created with `VkImageViewSampleWeightCreateInfoQCOM` in the
`pNext` chain.

`VK_DESCRIPTOR_TYPE_BLOCK_MATCH_IMAGE_QCOM` specifies a 2D image descriptor for the
_reference image_ or _target image_ that can be used with `OpImageBlockMatchSADQCOM`
or `OpImageBlockMatchSSDQCOM`.


### VkFormat Support

Implementations will advertise format support for this extension
through the `linearTilingFeatures` or `optimalTilingFeatures` of
link:{refpage}VkFormatProperties3.html[VkFormatProperties3].

[source,c]
----
VK_FORMAT_FEATURE_2_WEIGHT_IMAGE_BIT_QCOM
VK_FORMAT_FEATURE_2_WEIGHT_SAMPLED_IMAGE_BIT_QCOM
VK_FORMAT_FEATURE_2_BLOCK_MATCHING_BIT_QCOM
VK_FORMAT_FEATURE_2_BOX_FILTER_SAMPLED_BIT_QCOM
----

The SPIR-V `OpImageSampleWeightedQCOM` instruction takes two image parameters: the _weight image_ which holds weight values, and the _sampled image_ which holds the texels being sampled.

* `VK_FORMAT_FEATURE_2_WEIGHT_IMAGE_BIT_QCOM` specifies that the format is supported as a _weight image_ with `OpImageSampleWeightedQCOM`.
* `VK_FORMAT_FEATURE_2_WEIGHT_SAMPLED_IMAGE_BIT_QCOM` specifies that the format is supported as a _sampled image_ with `OpImageSampleWeightedQCOM`.

The SPIR-V `OpImageBlockMatchSADQCOM` and `OpImageBlockMatchSSDQCOM` instructions take two image parameters: the _target image_ and the _reference image_.

* `VK_FORMAT_FEATURE_2_BLOCK_MATCHING_BIT_QCOM` specifies that the format is supported as a _target image_ or _reference image_ with both `OpImageBlockMatchSADQCOM` and `OpImageBlockMatchSSDQCOM`.

The SPIR-V `OpImageBoxFilterQCOM` instruction takes one image parameter, the _sampled image_.

* `VK_FORMAT_FEATURE_2_BOX_FILTER_SAMPLED_BIT_QCOM` specifies that the format is supported as a _sampled image_ with `OpImageBoxFilterQCOM`.


### Weight Image Sampling

The SPIR-V `OpImageSampleWeightedQCOM` instruction takes 3 operands: _sampled image_,
_weight image_, and texture coordinates. The instruction computes a weighted average
of an MxN region of texels in the _sampled image_, using a set of MxN weights in the
_weight image_.
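
To make the weighted operation concrete, the following is a minimal single-component CPU sketch of the weighted sum, not the hardware algorithm. The names `weighted_sample_ref` and `clamp_coord` are hypothetical helpers introduced only for illustration. The sketch assumes a single phase, a row-major image and weight array, clamp-to-edge addressing, and a region whose first texel `(i0, j0)` is supplied directly by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp an integer texel coordinate to [0, size - 1] (clamp-to-edge). */
static int clamp_coord(int c, int size)
{
    if (c < 0) return 0;
    if (c >= size) return size - 1;
    return c;
}

/*
 * Hypothetical CPU reference for one component of a weighted sample.
 *   image:    img_w x img_h texels, row-major
 *   weights:  filt_w x filt_h weights, row-major (a single phase)
 *   (i0, j0): integer coordinates of the first texel of the region
 * Returns the sum over the region of texel * weight.
 */
static float weighted_sample_ref(const float *image, int img_w, int img_h,
                                 const float *weights, int filt_w, int filt_h,
                                 int i0, int j0)
{
    assert(image != NULL && weights != NULL);
    assert(img_w > 0 && img_h > 0 && filt_w > 0 && filt_h > 0);

    float sum = 0.0f;
    for (int y = 0; y < filt_h; ++y) {
        for (int x = 0; x < filt_w; ++x) {
            /* Texels outside the image take the nearest edge value. */
            int tx = clamp_coord(i0 + x, img_w);
            int ty = clamp_coord(j0 + y, img_h);
            sum += image[ty * img_w + tx] * weights[y * filt_w + x];
        }
    }
    return sum;
}
```

With a 2x2 filter whose weights are all 0.25, this reduces to a plain average of four texels. Because the weights are supplied by the application, the function does not normalize the result.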

To create a VkImageView for the _weight image_, the
link:{refpage}VkImageViewCreateInfo.html[VkImageViewCreateInfo] structure
is extended to provide weight filter parameters.
[source,c]
----
typedef struct VkImageViewSampleWeightCreateInfoQCOM {
    VkStructureType    sType;
    const void*        pNext;
    VkOffset2D         filterCenter;
    VkExtent2D         filterSize;
    uint32_t           numPhases;
} VkImageViewSampleWeightCreateInfoQCOM;
----

The texture coordinates provided to `OpImageSampleWeightedQCOM`,
combined with the `filterCenter` and `filterSize`, select a
region of texels in the _sampled image_:

[source,c]
----
// Let (u,v) be 2D unnormalized coordinates passed to `OpImageSampleWeightedQCOM`.
// The lower-left texel of the region has integer texel coordinates (i0,j0):
i0 = floor(u) - filterCenter.x
j0 = floor(v) - filterCenter.y

// The upper-right texel of the region has integer coordinates (imax,jmax):
imax = i0 + filterSize.width - 1
jmax = j0 + filterSize.height - 1
----

If the sampler `reductionMode` is `VK_SAMPLER_REDUCTION_MODE_WEIGHTED_AVERAGE` then the
value of each texel in the region is multiplied by the associated value from the _weight
image_, and the resulting weighted values are summed for each component across all texels
in the region. Note that since the weight values are application-defined,
their sum may be greater than 1.0 or less than 0.0, so the
filter output for a UNORM format may also be greater than 1.0 or less than 0.0.

If the sampler `reductionMode` is `VK_SAMPLER_REDUCTION_MODE_MIN` or `VK_SAMPLER_REDUCTION_MODE_MAX`,
a component-wise minimum or maximum is computed, for all texels in the region with non-zero
weights.

#### Sub-texel weighting

The _weight image_ can optionally provide sub-texel weights. This feature
is enabled by setting `numPhases` to a value greater than
1.
In this case, the _weight image_ specifies `numPhases` unique sets of
`filterSize.width` x `filterSize.height` weights, one set for each phase.

Each texel in the _sampled image_ is subdivided
both horizontally and vertically into an NxN grid of sub-texel regions,
or "phases".
The number of horizontal and vertical subdivisions must be equal and
must be a power of two. `numPhases` is the product
of the horizontal and vertical phase counts.

For example, `numPhases` equal to 4 means that each texel is divided into
two vertical phases and two horizontal phases, and that the weight image
defines 4 sets of weights, each with a width and height as specified by
`filterSize`. The sub-texel location of the texture coordinate determines
which set of weights is used.
The maximum supported values for `numPhases` and `filterSize` are specified by
the `VkPhysicalDeviceImageProcessingPropertiesQCOM` members `maxWeightFilterPhases` and
`maxWeightFilterDimension` respectively.

#### Weight Image View Type

The `OpImageSampleWeightedQCOM` _weight image_ created with
`VkImageViewSampleWeightCreateInfoQCOM` must have a `viewType` of
either `VK_IMAGE_VIEW_TYPE_1D_ARRAY`, which indicates separable
weight encoding, or `VK_IMAGE_VIEW_TYPE_2D_ARRAY`, which indicates
non-separable weight encoding as described below.

The view type (1D array or 2D array) is the sole indication of whether
the weights are separable or non-separable; there is no other API state nor any
shader change to designate a separable versus non-separable weight image.

#### Non-Separable Weight Encoding

For non-separable weight filtering, the view type will be
`VK_IMAGE_VIEW_TYPE_2D_ARRAY`. Each layer of the 2D array
corresponds to one phase of the filter. The view's
`VkImageSubresourceRange::layerCount` must be equal to
`VkImageViewSampleWeightCreateInfoQCOM::numPhases`.
The phases
are stored as layers in the 2D array, in horizontal phase major
order, left-to-right and top-to-bottom. Expressed as a formula,
the layer index for each filter phase is computed as:

[source,c]
----
layerIndex(horizPhase,vertPhase,horizPhaseCount) = (vertPhase * horizPhaseCount) + horizPhase
----


For each layer, the weights are specified by the values in texels [0, 0] to
[`filterSize.width`-1, `filterSize.height`-1].
While it is valid for the view's VkImage to have a width/height larger than `filterSize`,
image texels with integer coordinates greater than or equal to `filterSize`
are ignored by weight sampling. Image property query instructions `OpImageQuerySize`,
`OpImageQuerySizeLod`, `OpImageQueryLevels`, and `OpImageQuerySamples` return undefined
values for a weight image descriptor.

#### Separable Weight Encoding

For separable weight filtering, the view type will be `VK_IMAGE_VIEW_TYPE_1D_ARRAY`.
Horizontal weights for all phases are packed in layer '0' and the vertical weights for
all phases are packed in layer '1'. Within each layer, the weights are arranged into
groups of 4. For each group, the weights are ordered by phase. Expressed as a
formula, the 1D texel offset for all weights and phases within each layer is computed as:

[source,c]
----
// Let horizontal weights have a weightIndex of [0, filterSize.width - 1]
// Let vertical weights have a weightIndex of [0, filterSize.height - 1]
// Let phaseCount be the number of phases in either the vertical or horizontal direction.

texelOffset(phaseIndex,weightIndex,phaseCount) = (phaseCount * 4 * (weightIndex / 4)) + (phaseIndex * 4) + (weightIndex % 4)
----

### Box Filter Sampling

The SPIR-V `OpImageBoxFilterQCOM` instruction takes 3 operands: _sampled image_,
_box size_, and texture coordinates. Note that _box size_ specifies a floating point
width and height in texels.
The instruction computes a weighted average of all texels
in the _sampled image_ that are covered (either partially or fully) by a box with
the specified size and centered at the specified texture coordinates.

For each texel covered by the box, a weight value is computed by the implementation.
The weight is proportional to the area of the texel covered. Those texels that are
fully covered by the box receive a weight of 1.0. Those texels that are partially
covered by the box receive a weight proportional to the covered area. For example,
a texel that has one quarter of its area covered by the box will receive a
weight of 0.25.

If the sampler `reductionMode` is `VK_SAMPLER_REDUCTION_MODE_WEIGHTED_AVERAGE` then the
value of each covered texel is multiplied by the weight, and the resulting weighted
values are summed for each component across all covered texels. The resulting sum
is then divided by the _box size_ area.

If the sampler `reductionMode` is `VK_SAMPLER_REDUCTION_MODE_MIN` or `VK_SAMPLER_REDUCTION_MODE_MAX`,
a component-wise minimum or maximum is computed, for all texels covered by the box,
including texels that are partially covered.


### Block Matching Sampling


The SPIR-V `OpImageBlockMatchSADQCOM` and `OpImageBlockMatchSSDQCOM` instructions
each take 5 operands: _target image_, _target coordinates_, _reference image_,
_reference coordinates_, and _block size_. Each instruction computes an error
metric that describes whether a block of texels in the _target image_ matches
a corresponding block of texels in the _reference image_. The error metric
is computed per-component. `OpImageBlockMatchSADQCOM` computes "Sum Of Absolute
Difference" and `OpImageBlockMatchSSDQCOM` computes "Sum of Squared Difference",
but otherwise both instructions are similar.
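
As a rough CPU-side illustration of the two metrics, the sketch below accumulates either the absolute or the squared per-texel difference over a block. It is a single-component reference under stated assumptions, not the hardware implementation: `block_match_ref` and `block_metric` are hypothetical names, both images are row-major float arrays, and both blocks are assumed to lie fully inside their images so no sampler addressing is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Which error metric to accumulate (hypothetical helper type). */
typedef enum { METRIC_SAD, METRIC_SSD } block_metric;

/*
 * Hypothetical single-component reference for block matching.
 * Compares a block_w x block_h block of `target` starting at (tx, ty)
 * with a same-sized block of `reference` starting at (rx, ry).
 * Both images are row-major with the given row widths. The caller
 * must keep both blocks inside their images.
 */
static float block_match_ref(const float *target, int target_w, int tx, int ty,
                             const float *reference, int ref_w, int rx, int ry,
                             int block_w, int block_h, block_metric metric)
{
    assert(target != NULL && reference != NULL);
    assert(block_w > 0 && block_h > 0);

    float sum = 0.0f;
    for (int y = 0; y < block_h; ++y) {
        for (int x = 0; x < block_w; ++x) {
            float t = target[(ty + y) * target_w + (tx + x)];
            float r = reference[(ry + y) * ref_w + (rx + x)];
            float d = r - t;
            /* SAD adds |d|, SSD adds d squared. */
            sum += (metric == METRIC_SAD) ? (d < 0.0f ? -d : d) : d * d;
        }
    }
    return sum;
}
```

A result of zero means the two blocks are identical for that component. Larger values indicate a poorer match, and SSD penalizes large individual differences more heavily than SAD.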

Both _target coordinates_ and _reference coordinates_ are integer texel coordinates
of the lower-left texel of the block to be matched in the _target image_ and
_reference image_ respectively.
The _block size_ provides the height and width in integer texels of the regions to
be matched.

Note that the coordinates and _block size_ may result in a region that extends
beyond the bounds of the _target image_ or _reference image_. For the _target image_,
this is valid and the sampler `addressModeU` and `addressModeV` will determine
the value of such texels. For the _reference image_, this will result in undefined
values being returned. The application must guarantee that the reference region
does not extend beyond the bounds of the _reference image_.

For each texel in the regions, a difference value is computed by subtracting the
target value from the reference value. `OpImageBlockMatchSADQCOM` computes the
absolute value of the difference; this is the _texel error_. `OpImageBlockMatchSSDQCOM`
computes the square of the difference; this is the _texel error squared_.

If the sampler `reductionMode` is `VK_SAMPLER_REDUCTION_MODE_WEIGHTED_AVERAGE` then the
_texel error_ or _texel error squared_ for each texel in the region is summed for each
component across all texels.

If the sampler `reductionMode` is `VK_SAMPLER_REDUCTION_MODE_MIN` or `VK_SAMPLER_REDUCTION_MODE_MAX`,
a component-wise minimum or maximum is computed, for all texels in the region.
`OpImageBlockMatchSADQCOM` returns the minimum or maximum _texel error_ across
all texels. `OpImageBlockMatchSSDQCOM` returns the minimum or maximum _texel error_
squared. Note that `OpImageBlockMatchSSDQCOM` does not return the minimum or maximum
of _texel error squared_.
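
Since a reference region that leaves the _reference image_ yields undefined values, an application may want to validate coordinates on the host before dispatching work. The following is a minimal sketch of such a check. `block_inside_image` is a hypothetical helper, not part of the extension, and it assumes unsigned integer texel coordinates with the block's first texel at `(x, y)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the block of size (block_w, block_h) whose first
 * texel is (x, y) lies entirely inside an image of size (img_w, img_h).
 * Hypothetical host-side validation helper.
 */
static bool block_inside_image(uint32_t x, uint32_t y,
                               uint32_t block_w, uint32_t block_h,
                               uint32_t img_w, uint32_t img_h)
{
    assert(block_w > 0 && block_h > 0);
    /* Compare against img - block to avoid unsigned overflow in x + block_w. */
    return block_w <= img_w && x <= img_w - block_w &&
           block_h <= img_h && y <= img_h - block_h;
}
```

The same test is not required for the _target image_, where out-of-bounds texels are resolved by the sampler address modes.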


## Expected Features and limits

Below are the properties, features, and formats that are expected to be advertised by Adreno drivers supporting this extension:

Features supported in VkPhysicalDeviceImageProcessingFeaturesQCOM:
[source,c]
----
    textureSampleWeighted = TRUE
    textureBoxFilter      = TRUE
    textureBlockMatch     = TRUE
----

Properties reported in VkPhysicalDeviceImageProcessingPropertiesQCOM:
[source,c]
----
    maxWeightFilterPhases    = 1024
    maxWeightFilterDimension = 64
    maxBlockMatchRegion      = 64
    maxBoxFilterBlockSize    = 64
----


Formats supported by _sampled image_ parameter to `OpImageSampleWeightedQCOM` and `OpImageBoxFilterQCOM`:
[source,c]
----
    VK_FORMAT_R8_UNORM
    VK_FORMAT_R8_SNORM
    VK_FORMAT_R8G8_UNORM
    VK_FORMAT_R8G8B8A8_UNORM
    VK_FORMAT_R8G8B8A8_SNORM
    VK_FORMAT_A8B8G8R8_UNORM_PACK32
    VK_FORMAT_A8B8G8R8_SNORM_PACK32
    VK_FORMAT_A2B10G10R10_UNORM_PACK32
    VK_FORMAT_R16_SFLOAT
    VK_FORMAT_R16G16_SFLOAT
    VK_FORMAT_R16G16B16A16_SFLOAT
    VK_FORMAT_B10G11R11_UFLOAT_PACK32
    VK_FORMAT_E5B9G9R9_UFLOAT_PACK32
    VK_FORMAT_BC1_RGB_UNORM_BLOCK
    VK_FORMAT_BC1_RGB_SRGB_BLOCK
    VK_FORMAT_BC1_RGBA_UNORM_BLOCK
    VK_FORMAT_BC1_RGBA_SRGB_BLOCK
    VK_FORMAT_BC2_SRGB_BLOCK
    VK_FORMAT_BC3_UNORM_BLOCK
    VK_FORMAT_BC3_SRGB_BLOCK
    VK_FORMAT_BC4_UNORM_BLOCK
    VK_FORMAT_BC4_SNORM_BLOCK
    VK_FORMAT_BC5_UNORM_BLOCK
    VK_FORMAT_BC5_SNORM_BLOCK
    VK_FORMAT_BC6H_UFLOAT_BLOCK
    VK_FORMAT_BC6H_SFLOAT_BLOCK
    VK_FORMAT_BC7_UNORM_BLOCK
    VK_FORMAT_BC7_SRGB_BLOCK
    VK_FORMAT_ETC2_R8G8B8_UNORM_BLOCK
    VK_FORMAT_ETC2_R8G8B8_SRGB_BLOCK
    VK_FORMAT_ETC2_R8G8B8A1_UNORM_BLOCK
    VK_FORMAT_ETC2_R8G8B8A1_SRGB_BLOCK
    VK_FORMAT_ETC2_R8G8B8A8_UNORM_BLOCK
    VK_FORMAT_ETC2_R8G8B8A8_SRGB_BLOCK
    VK_FORMAT_EAC_R11_UNORM_BLOCK
    VK_FORMAT_EAC_R11_SNORM_BLOCK
    VK_FORMAT_EAC_R11G11_UNORM_BLOCK
    VK_FORMAT_EAC_R11G11_SNORM_BLOCK
    VK_FORMAT_ASTC_4x4_UNORM_BLOCK
    VK_FORMAT_ASTC_4x4_SRGB_BLOCK
    VK_FORMAT_ASTC_5x4_UNORM_BLOCK
    VK_FORMAT_ASTC_5x4_SRGB_BLOCK
    VK_FORMAT_ASTC_5x5_UNORM_BLOCK
    VK_FORMAT_ASTC_5x5_SRGB_BLOCK
    VK_FORMAT_ASTC_6x5_UNORM_BLOCK
    VK_FORMAT_ASTC_6x5_SRGB_BLOCK
    VK_FORMAT_ASTC_6x6_UNORM_BLOCK
    VK_FORMAT_ASTC_6x6_SRGB_BLOCK
    VK_FORMAT_ASTC_8x5_UNORM_BLOCK
    VK_FORMAT_ASTC_8x5_SRGB_BLOCK
    VK_FORMAT_ASTC_8x6_SRGB_BLOCK
    VK_FORMAT_ASTC_8x8_UNORM_BLOCK
    VK_FORMAT_ASTC_8x8_SRGB_BLOCK
    VK_FORMAT_ASTC_10x5_UNORM_BLOCK
    VK_FORMAT_ASTC_10x5_SRGB_BLOCK
    VK_FORMAT_ASTC_10x6_UNORM_BLOCK
    VK_FORMAT_ASTC_10x6_SRGB_BLOCK
    VK_FORMAT_ASTC_10x8_UNORM_BLOCK
    VK_FORMAT_ASTC_10x8_SRGB_BLOCK
    VK_FORMAT_ASTC_10x10_UNORM_BLOCK
    VK_FORMAT_ASTC_10x10_SRGB_BLOCK
    VK_FORMAT_ASTC_12x10_UNORM_BLOCK
    VK_FORMAT_ASTC_12x10_SRGB_BLOCK
    VK_FORMAT_ASTC_12x12_UNORM_BLOCK
    VK_FORMAT_ASTC_12x12_SRGB_BLOCK
    VK_FORMAT_G8B8G8R8_422_UNORM
    VK_FORMAT_B8G8R8G8_422_UNORM
    VK_FORMAT_A4B4G4R4_UNORM_PACK16
    VK_FORMAT_ASTC_4x4_SFLOAT_BLOCK
    VK_FORMAT_ASTC_5x4_SFLOAT_BLOCK
    VK_FORMAT_ASTC_5x5_SFLOAT_BLOCK
    VK_FORMAT_ASTC_6x5_SFLOAT_BLOCK
    VK_FORMAT_ASTC_6x6_SFLOAT_BLOCK
    VK_FORMAT_ASTC_8x5_SFLOAT_BLOCK
    VK_FORMAT_ASTC_8x6_SFLOAT_BLOCK
    VK_FORMAT_ASTC_8x8_SFLOAT_BLOCK
    VK_FORMAT_ASTC_10x5_SFLOAT_BLOCK
    VK_FORMAT_ASTC_10x6_SFLOAT_BLOCK
    VK_FORMAT_ASTC_10x8_SFLOAT_BLOCK
    VK_FORMAT_ASTC_10x10_SFLOAT_BLOCK
    VK_FORMAT_ASTC_12x10_SFLOAT_BLOCK
    VK_FORMAT_ASTC_12x12_SFLOAT_BLOCK
----

Formats supported by _weight image_ parameter to `OpImageSampleWeightedQCOM`:
[source,c]
----
    VK_FORMAT_R8_UNORM
    VK_FORMAT_R16_SFLOAT
----

Formats supported by _target image_ or _reference image_ parameter to `OpImageBlockMatchSADQCOM` and `OpImageBlockMatchSSDQCOM`:
[source,c]
----
    VK_FORMAT_R8_UNORM
    VK_FORMAT_R8G8_UNORM
    VK_FORMAT_R8G8B8_UNORM
    VK_FORMAT_R8G8B8A8_UNORM
    VK_FORMAT_A8B8G8R8_UNORM_PACK32
    VK_FORMAT_A2B10G10R10_UNORM_PACK32
    VK_FORMAT_G8B8G8R8_422_UNORM
    VK_FORMAT_B8G8R8G8_422_UNORM
----


## Issues

### RESOLVED: Should this be one extension or 3 extensions?

For simplicity, and since we expect this extension to be supported only on Adreno GPUs, we propose one extension with 3 feature bits. The associated SPIR-V extension will have 3 capabilities. The associated GLSL extension will have 3 extension strings.

### RESOLVED: How does this interact with descriptor indexing?

The new built-ins added by this extension support descriptor arrays and
dynamic indexing, but only if the index is dynamically uniform. The "update-after-bind"
functionality is fully supported. Non-uniform dynamic indexing is not supported. There are no
feature bits for an implementation to advertise support for dynamic indexing with the
shader built-ins added in this extension.

The new descriptor types for sample weight image and block match image count against
the `maxPerStageDescriptor[UpdateAfterBind]SampledImages` and
`maxDescriptorSet[UpdateAfterBind]SampledImages` limits.

### RESOLVED: How does this extension interact with EXT_robustness2?

These instructions do not support the `nullDescriptor` feature of robustness2. If any descriptor accessed by these
instructions is not bound, undefined results will occur.

### RESOLVED: How does this interact with push descriptors?

The descriptors added by this extension can be updated using `vkCmdPushDescriptorSetKHR`.