1// Copyright 2017-2023 The Khronos Group Inc. 2// 3// SPDX-License-Identifier: CC-BY-4.0 4 5[appendix] 6[[memory-model]] 7= Memory Model 8 9[NOTE] 10.Note 11==== 12This memory model describes synchronizations provided by all 13implementations; however, some of the synchronizations defined require extra 14features to be supported by the implementation. 15ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] 16See slink:VkPhysicalDeviceVulkanMemoryModelFeatures. 17endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[] 18==== 19 20[[memory-model-agent]] 21== Agent 22 23_Operation_ is a general term for any task that is executed on the system. 24 25[NOTE] 26.Note 27==== 28An operation is by definition something that is executed. 29Thus if an instruction is skipped due to control flow, it does not 30constitute an operation. 31==== 32 33Each operation is executed by a particular _agent_. 34Possible agents include each shader invocation, each host thread, and each 35fixed-function stage of the pipeline. 36 37 38[[memory-model-memory-location]] 39== Memory Location 40 41A _memory location_ identifies unique storage for 8 bits of data. 42Memory operations access a _set of memory locations_ consisting of one or 43more memory locations at a time, e.g. an operation accessing a 32-bit 44integer in memory would read/write a set of four memory locations. 45Memory operations that access whole aggregates may: access any padding bytes 46between elements or members, but no padding bytes at the end of the 47aggregate. 48Two sets of memory locations _overlap_ if the intersection of their sets of 49memory locations is non-empty. 50A memory operation must: not affect memory at a memory location not within 51its set of memory locations. 52 53Memory locations for buffers and images are explicitly allocated in 54slink:VkDeviceMemory objects, and are implicitly allocated for SPIR-V 55variables in each shader invocation. 56 57ifdef::VK_KHR_workgroup_memory_explicit_layout[] 58Variables with code:Workgroup storage class that point to a block-decorated 59type share a set of memory locations. 60endif::VK_KHR_workgroup_memory_explicit_layout[] 61 62 63[[memory-model-allocation]] 64== Allocation 65 66The values stored in newly allocated memory locations are determined by a 67SPIR-V variable's initializer, if present, or else are undefined:. 68At the time an allocation is created there have been no 69<<memory-model-memory-operation,memory operations>> to any of its memory 70locations. 71The initialization is not considered to be a memory operation. 72 73[NOTE] 74.Note 75==== 76For tessellation control shader output variables, a consequence of 77initialization not being considered a memory operation is that some 78implementations may need to insert a barrier between the initialization of 79the output variables and any reads of those variables. 80==== 81 82 83[[memory-model-memory-operation]] 84== Memory Operation 85 86For an operation A and memory location M: 87 88 * [[memory-model-access-read]] A _reads_ M if and only if the data stored 89 in M is an input to A. 90 * [[memory-model-access-write]] A _writes_ M if and only if the data 91 output from A is stored to M. 92 * [[memory-model-access-access]] A _accesses_ M if and only if it either 93 reads or writes (or both) M. 94 95[NOTE] 96.Note 97==== 98A write whose value is the same as what was already in those memory 99locations is still considered to be a write and has all the same effects. 100==== 101 102 103[[memory-model-references]] 104== Reference 105 106A _reference_ is an object that a particular agent can: use to access a set 107of memory locations. 108On the host, a reference is a host virtual address. 109On the device, a reference is: 110 111 * The descriptor that a variable is bound to, for variables in Image, 112 Uniform, or StorageBuffer storage classes. 113 If the variable is an array (or array of arrays, etc.) then each element 114 of the array may: be a unique reference. 115ifdef::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[] 116 * The address range for a buffer in code:PhysicalStorageBuffer storage 117 class, where the base of the address range is queried with 118ifndef::VK_VERSION_1_2,VK_KHR_buffer_device_address[] 119 flink:vkGetBufferDeviceAddressEXT 120endif::VK_VERSION_1_2,VK_KHR_buffer_device_address[] 121ifdef::VK_VERSION_1_2,VK_KHR_buffer_device_address[] 122 flink:vkGetBufferDeviceAddress 123endif::VK_VERSION_1_2,VK_KHR_buffer_device_address[] 124 and the length of the range is the size of the buffer. 125endif::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[] 126ifdef::VK_KHR_workgroup_memory_explicit_layout[] 127 * A single common reference for all variables with code:Workgroup storage 128 class that point to a block-decorated type. 129 * The variable itself for non-block-decorated type variables in 130 code:Workgroup storage class. 131endif::VK_KHR_workgroup_memory_explicit_layout[] 132 * The variable itself for variables in other storage classes. 133 134Two memory accesses through distinct references may: require availability 135and visibility operations as defined 136<<memory-model-location-ordered,below>>. 137 138 139[[memory-model-program-order]] 140== Program-Order 141 142A _dynamic instance_ of an instruction is defined in SPIR-V 143(https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#DynamicInstance) 144as a way of referring to a particular execution of a static instruction. 145Program-order is an ordering on dynamic instances of instructions executed 146by a single shader invocation: 147 148 * (Basic block): If instructions A and B are in the same basic block, and 149 A is listed in the module before B, then the n'th dynamic instance of A 150 is program-ordered before the n'th dynamic instance of B. 151 * (Branch): The dynamic instance of a branch or switch instruction is 152 program-ordered before the dynamic instance of the OpLabel instruction 153 to which it transfers control. 154 * (Call entry): The dynamic instance of an code:OpFunctionCall instruction 155 is program-ordered before the dynamic instances of the 156 code:OpFunctionParameter instructions and the body of the called 157 function. 158 * (Call exit): The dynamic instance of the instruction following an 159 code:OpFunctionCall instruction is program-ordered after the dynamic 160 instance of the return instruction executed by the called function. 161 * (Transitive Closure): If dynamic instance A of any instruction is 162 program-ordered before dynamic instance B of any instruction and B is 163 program-ordered before dynamic instance C of any instruction then A is 164 program-ordered before C. 165 * (Complete definition): No other dynamic instances are program-ordered. 166 167For instructions executed on the host, the source language defines the 168program-order relation (e.g. as "`sequenced-before`"). 169 170 171ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] 172[[shader-call-related]] 173== Shader Call Related 174 175Shader-call-related is an equivalence relation on invocations defined as the 176symmetric and transitive closure of: 177 178 * A is shader-call-related to B if A is created by an 179 <<ray-tracing-repack,invocation repack>> instruction executed by B. 180 181 182[[shader-call-order]] 183== Shader Call Order 184 185Shader-call-order is a partial order on dynamic instances of instructions 186executed by invocations that are shader-call-related: 187 188 * (Program order): If dynamic instance A is program-ordered before B, then 189 A is shader-call-ordered before B. 190 * (Shader call entry): If A is a dynamic instance of an 191 <<ray-tracing-repack,invocation repack>> instruction and B is a dynamic 192 instance executed by an invocation that is created by A, then A is 193 shader-call-ordered before B. 194 * (Shader call exit): If A is a dynamic instance of an 195 <<ray-tracing-repack,invocation repack>> instruction, B is the next 196 dynamic instance executed by the same invocation, and C is a dynamic 197 instance executed by an invocation that is created by A, then C is 198 shader-call-ordered before B. 199 * (Transitive closure): If A is shader-call-ordered-before B and B is 200 shader-call-ordered-before C, then A is shader-call-ordered-before C. 201 * (Complete definition): No other dynamic instances are 202 shader-call-ordered. 203endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] 204 205 206[[memory-model-scope]] 207== Scope 208 209Atomic and barrier instructions include scopes which identify sets of shader 210invocations that must: obey the requested ordering and atomicity rules of 211the operation, as defined below. 212 213The various scopes are described in detail in <<shaders-scope, the Shaders 214chapter>>. 215 216 217[[memory-model-atomic-operation]] 218== Atomic Operation 219 220An _atomic operation_ on the device is any SPIR-V operation whose name 221begins with code:OpAtomic. 222An atomic operation on the host is any operation performed with an 223std::atomic typed object. 224 225Each atomic operation has a memory <<memory-model-scope,scope>> and a 226<<memory-model-memory-semantics,semantics>>. 227Informally, the scope determines which other agents it is atomic with 228respect to, and the <<memory-model-memory-semantics,semantics>> constrains 229its ordering against other memory accesses. 230Device atomic operations have explicit scopes and semantics. 231Each host atomic operation implicitly uses the code:CrossDevice scope, and 232uses a memory semantics equivalent to a C++ std::memory_order value of 233relaxed, acquire, release, acq_rel, or seq_cst. 234 235Two atomic operations A and B are _potentially-mutually-ordered_ if and only 236if all of the following are true: 237 238 * They access the same set of memory locations. 239 * They use the same reference. 240 * A is in the instance of B's memory scope. 241 * B is in the instance of A's memory scope. 242 * A and B are not the same operation (irreflexive). 243 244Two atomic operations A and B are _mutually-ordered_ if and only if they are 245potentially-mutually-ordered and any of the following are true: 246 247 * A and B are both device operations. 248 * A and B are both host operations. 249 * A is a device operation, B is a host operation, and the implementation 250 supports concurrent host- and device-atomics. 251 252[NOTE] 253.Note 254==== 255If two atomic operations are not mutually-ordered, and if their sets of 256memory locations overlap, then each must: be synchronized against the other 257as if they were non-atomic operations. 258==== 259 260 261[[memory-model-scoped-modification-order]] 262== Scoped Modification Order 263 264For a given atomic write A, all atomic writes that are mutually-ordered with 265A occur in an order known as A's _scoped modification order_. 266A's scoped modification order relates no other operations. 267 268[NOTE] 269.Note 270==== 271Invocations outside the instance of A's memory scope may: observe the values 272at A's set of memory locations becoming visible to it in an order that 273disagrees with the scoped modification order. 274==== 275 276[NOTE] 277.Note 278==== 279It is valid to have non-atomic operations or atomics in a different scope 280instance to the same set of memory locations, as long as they are 281synchronized against each other as if they were non-atomic (if they are not, 282it is treated as a <<memory-model-access-data-race,data race>>). 283That means this definition of A's scoped modification order could include 284atomic operations that occur much later, after intervening non-atomics. 285That is a bit non-intuitive, but it helps to keep this definition simple and 286non-circular. 287==== 288 289 290[[memory-model-memory-semantics]] 291== Memory Semantics 292 293Non-atomic memory operations, by default, may: be observed by one agent in a 294different order than they were written by another agent. 295 296Atomics and some synchronization operations include _memory semantics_, 297which are flags that constrain the order in which other memory accesses 298(including non-atomic memory accesses and 299<<memory-model-availability-visibility,availability and visibility 300operations>>) performed by the same agent can: be observed by other agents, 301or can: observe accesses by other agents. 302 303Device instructions that include semantics are code:OpAtomic*, 304code:OpControlBarrier, code:OpMemoryBarrier, and code:OpMemoryNamedBarrier. 305Host instructions that include semantics are some std::atomic methods and 306memory fences. 307 308SPIR-V supports the following memory semantics: 309 310 * Relaxed: No constraints on order of other memory accesses. 311 * Acquire: A memory read with this semantic performs an _acquire 312 operation_. 313 A memory barrier with this semantic is an _acquire barrier_. 314 * Release: A memory write with this semantic performs a _release 315 operation_. 316 A memory barrier with this semantic is a _release barrier_. 317 * AcquireRelease: A memory read-modify-write operation with this semantic 318 performs both an acquire operation and a release operation, and inherits 319 the limitations on ordering from both of those operations. 320 A memory barrier with this semantic is both a release and acquire 321 barrier. 322 323[NOTE] 324.Note 325==== 326SPIR-V does not support "`consume`" semantics on the device. 327==== 328 329The memory semantics operand also includes _storage class semantics_ which 330indicate which storage classes are constrained by the synchronization. 331SPIR-V storage class semantics include: 332 333 * UniformMemory 334 * WorkgroupMemory 335 * ImageMemory 336 * OutputMemory 337 338Each SPIR-V memory operation accesses a single storage class. 339Semantics in synchronization operations can include a combination of storage 340classes. 341 342The UniformMemory storage class semantic applies to accesses to memory in 343the 344ifdef::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[] 345PhysicalStorageBuffer, 346endif::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[] 347ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] 348code:ShaderRecordBufferKHR, 349endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] 350Uniform and StorageBuffer storage classes. 351The WorkgroupMemory storage class semantic applies to accesses to memory in 352the Workgroup storage class. 353The ImageMemory storage class semantic applies to accesses to memory in the 354Image storage class. 355The OutputMemory storage class semantic applies to accesses to memory in the 356Output storage class. 357 358[NOTE] 359.Note 360==== 361Informally, these constraints limit how memory operations can be reordered, 362and these limits apply not only to the order of accesses as performed in the 363agent that executes the instruction, but also to the order the effects of 364writes become visible to all other agents within the same instance of the 365instruction's memory scope. 366==== 367 368[NOTE] 369.Note 370==== 371Release and acquire operations in different threads can: act as 372synchronization operations, to guarantee that writes that happened before 373the release are visible after the acquire. 374(This is not a formal definition, just an Informative forward reference.) 375==== 376 377[NOTE] 378.Note 379==== 380The OutputMemory storage class semantic is only useful in tessellation 381control shaders, which is the only execution model where output variables 382are shared between invocations. 383==== 384 385The memory semantics operand can: also include availability and visibility 386flags, which apply availability and visibility operations as described in 387<<memory-model-availability-visibility,availability and visibility>>. 388The availability/visibility flags are: 389 390 * MakeAvailable: Semantics must: be Release or AcquireRelease. 391 Performs an availability operation before the release operation or 392 barrier. 393 * MakeVisible: Semantics must: be Acquire or AcquireRelease. 394 Performs a visibility operation after the acquire operation or barrier. 395 396The specifics of these operations are defined in 397<<memory-model-availability-visibility-semantics,Availability and Visibility 398Semantics>>. 399 400Host atomic operations may: support a different list of memory semantics and 401synchronization operations, depending on the host architecture and source 402language. 403 404 405[[memory-model-release-sequence]] 406== Release Sequence 407 408After an atomic operation A performs a release operation on a set of memory 409locations M, the _release sequence headed by A_ is the longest continuous 410subsequence of A's scoped modification order that consists of: 411 412 * the atomic operation A as its first element 413 * atomic read-modify-write operations on M by any agent 414 415[NOTE] 416.Note 417==== 418The atomics in the last bullet must: be mutually-ordered with A by virtue of 419being in A's scoped modification order. 420==== 421 422[NOTE] 423.Note 424==== 425This intentionally omits "`atomic writes to M performed by the same agent 426that performed A`", which is present in the corresponding C++ definition. 427==== 428 429 430[[memory-model-synchronizes-with]] 431== Synchronizes-With 432 433_Synchronizes-with_ is a relation between operations, where each operation 434is either an atomic operation or a memory barrier (aka fence on the host). 435 436If A and B are atomic operations, then A synchronizes-with B if and only if 437all of the following are true: 438 439 * A performs a release operation 440 * B performs an acquire operation 441 * A and B are mutually-ordered 442 * B reads a value written by A or by an operation in the release sequence 443 headed by A 444 445code:OpControlBarrier, code:OpMemoryBarrier, and code:OpMemoryNamedBarrier 446are _memory barrier_ instructions in SPIR-V. 447 448If A is a release barrier and B is an atomic operation that performs an 449acquire operation, then A synchronizes-with B if and only if all of the 450following are true: 451 452 * there exists an atomic write X (with any memory semantics) 453 * A is program-ordered before X 454 * X and B are mutually-ordered 455 * B reads a value written by X or by an operation in the release sequence 456 headed by X 457 ** If X is relaxed, it is still considered to head a hypothetical release 458 sequence for this rule 459 * A and B are in the instance of each other's memory scopes 460 * X's storage class is in A's semantics. 461 462If A is an atomic operation that performs a release operation and B is an 463acquire barrier, then A synchronizes-with B if and only if all of the 464following are true: 465 466 * there exists an atomic read X (with any memory semantics) 467 * X is program-ordered before B 468 * X and A are mutually-ordered 469 * X reads a value written by A or by an operation in the release sequence 470 headed by A 471 * A and B are in the instance of each other's memory scopes 472 * X's storage class is in B's semantics. 473 474If A is a release barrier and B is an acquire barrier, then A 475synchronizes-with B if all of the following are true: 476 477 * there exists an atomic write X (with any memory semantics) 478 * A is program-ordered before X 479 * there exists an atomic read Y (with any memory semantics) 480 * Y is program-ordered before B 481 * X and Y are mutually-ordered 482 * Y reads the value written by X or by an operation in the release 483 sequence headed by X 484 ** If X is relaxed, it is still considered to head a hypothetical release 485 sequence for this rule 486 * A and B are in the instance of each other's memory scopes 487 * X's and Y's storage class is in A's and B's semantics. 488 ** NOTE: X and Y must have the same storage class, because they are 489 mutually ordered. 490 491If A is a release barrier, B is an acquire barrier, and C is a control 492barrier (where A can: equal C, and B can: equal C), then A synchronizes-with 493B if all of the following are true: 494 495 * A is program-ordered before (or equals) C 496 * C is program-ordered before (or equals) B 497 * A and B are in the instance of each other's memory scopes 498 * A and B are in the instance of C's execution scope 499 500[NOTE] 501.Note 502==== 503This is similar to the barrier-barrier synchronization above, but with a 504control barrier filling the role of the relaxed atomics. 505==== 506 507ifdef::VK_EXT_fragment_shader_interlock[] 508 509Let F be an ordering of fragment shader invocations, such that invocation 510F~1~ is ordered before invocation F~2~ if and only if F~1~ and F~2~ overlap 511as described in <<shaders-scope-fragment-interlock,Fragment Shader 512Interlock>> and F~1~ executes the interlocked code before F~2~. 513 514If A is an code:OpEndInvocationInterlockEXT instruction and B is an 515code:OpBeginInvocationInterlockEXT instruction, then A synchronizes-with B 516if the agent that executes A is ordered before the agent that executes B in 517F. A and B are both considered to have code:FragmentInterlock memory scope 518and semantics of UniformMemory and ImageMemory, and A is considered to have 519Release semantics and B is considered to have Acquire semantics. 520 521[NOTE] 522.Note 523==== 524code:OpBeginInvocationInterlockEXT and code:OpBeginInvocationInterlockEXT do 525not perform implicit availability or visibility operations. 526Usually, shaders using fragment shader interlock will declare the relevant 527resources as `coherent` to get implicit 528<<memory-model-instruction-av-vis,per-instruction availability and 529visibility operations>>. 530==== 531 532endif::VK_EXT_fragment_shader_interlock[] 533 534ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] 535If A is a release barrier and B is an acquire barrier, then A 536synchronizes-with B if all of the following are true: 537 538 * A is shader-call-ordered-before B 539 * A and B are in the instance of each other's memory scopes 540 541endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] 542 543No other release and acquire barriers synchronize-with each other. 544 545 546[[memory-model-system-synchronizes-with]] 547== System-Synchronizes-With 548 549_System-synchronizes-with_ is a relation between arbitrary operations on the 550device or host. 551Certain operations system-synchronize-with each other, which informally 552means the first operation occurs before the second and that the 553synchronization is performed without using application-visible memory 554accesses. 555 556If there is an <<synchronization-dependencies-execution,execution 557dependency>> between two operations A and B, then the operation in the first 558synchronization scope system-synchronizes-with the operation in the second 559synchronization scope. 560 561[NOTE] 562.Note 563==== 564This covers all Vulkan synchronization primitives, including device 565operations executing before a synchronization primitive is signaled, wait 566operations happening before subsequent device operations, signal operations 567happening before host operations that wait on them, and host operations 568happening before flink:vkQueueSubmit. 569The list is spread throughout the synchronization chapter, and is not 570repeated here. 571==== 572 573System-synchronizes-with implicitly includes all storage class semantics and 574has code:CrossDevice scope. 575 576If A system-synchronizes-with B, we also say A is 577_system-synchronized-before_ B and B is _system-synchronized-after_ A. 578 579 580[[memory-model-non-private]] 581== Private vs. Non-Private 582 583By default, non-atomic memory operations are treated as _private_, meaning 584such a memory operation is not intended to be used for communication with 585other agents. 586Memory operations with the NonPrivatePointer/NonPrivateTexel bit set are 587treated as _non-private_, and are intended to be used for communication with 588other agents. 589 590More precisely, for private memory operations to be 591<<memory-model-location-ordered,Location-Ordered>> between distinct agents 592requires using system-synchronizes-with rather than shader-based 593synchronization. 594Private memory operations still obey program-order. 595 596Atomic operations are always considered non-private. 597 598 599[[memory-model-inter-thread-happens-before]] 600== Inter-Thread-Happens-Before 601 602Let SC be a non-empty set of storage class semantics. 603Then (using template syntax) operation A _inter-thread-happens-before_<SC> 604operation B if and only if any of the following is true: 605 606 * A system-synchronizes-with B 607 * A synchronizes-with B, and both A and B have all of SC in their 608 semantics 609 * A is an operation on memory in a storage class in SC or that has all of 610 SC in its semantics, B is a release barrier or release atomic with all 611 of SC in its semantics, and A is program-ordered before B 612 * A is an acquire barrier or acquire atomic with all of SC in its 613 semantics, B is an operation on memory in a storage class in SC or that 614 has all of SC in its semantics, and A is program-ordered before B 615 * A and B are both host operations and A inter-thread-happens-before B as 616 defined in the host language specification 617 * A inter-thread-happens-before<SC> some X and X 618 inter-thread-happens-before<SC> B 619 620 621[[memory-model-happens-before]] 622== Happens-Before 623 624Operation A _happens-before_ operation B if and only if any of the following 625is true: 626 627 * A is program-ordered before B 628 * A inter-thread-happens-before<SC> B for some set of storage classes SC 629 630_Happens-after_ is defined similarly. 631 632[NOTE] 633.Note 634==== 635Unlike C++, happens-before is not always sufficient for a write to be 636visible to a read. 637Additional <<memory-model-availability-visibility,availability and 638visibility>> operations may: be required for writes to be 639<<memory-model-visible-to,visible-to>> other memory accesses. 640==== 641 642[NOTE] 643.Note 644==== 645Happens-before is not transitive, but each of program-order and 646inter-thread-happens-before<SC> are transitive. 647These can be thought of as covering the "`single-threaded`" case and the 648"`multi-threaded`" case, and it is not necessary (and not valid) to form 649chains between the two. 650==== 651 652 653[[memory-model-availability-visibility]] 654== Availability and Visibility 655 656_Availability_ and _visibility_ are states of a write operation, which 657(informally) track how far the write has permeated the system, i.e. which 658agents and references are able to observe the write. 659Availability state is per _memory domain_. 660Visibility state is per (agent,reference) pair. 661Availability and visibility states are per-memory location for each write. 662 663Memory domains are named according to the agents whose memory accesses use 664the domain. 665Domains used by shader invocations are organized hierarchically into 666multiple smaller memory domains which correspond to the different 667<<shaders-scope, scopes>>. 668Each memory domain is considered the _dual_ of a scope, and vice versa. 669The memory domains defined in Vulkan include: 670 671 * _host_ - accessible by host agents 672 * _device_ - accessible by all device agents for a particular device 673 * _shader_ - accessible by shader agents for a particular device, 674 corresponding to the code:Device scope 675 * _queue family instance_ - accessible by shader agents in a single queue 676 family, corresponding to the code:QueueFamily scope. 677ifdef::VK_EXT_fragment_shader_interlock[] 678 * _fragment interlock instance_ - accessible by fragment shader agents 679 that <<shaders-scope-fragment-interlock,overlap>>, corresponding to the 680 code:FragmentInterlock scope. 681endif::VK_EXT_fragment_shader_interlock[] 682ifdef::VK_KHR_ray_tracing_pipeline[] 683 * _shader call instance_ - accessible by shader agents that are 684 <<shader-call-related,shader-call-related>>, corresponding to the 685 code:ShaderCallKHR scope. 686endif::VK_KHR_ray_tracing_pipeline[] 687 * _workgroup instance_ - accessible by shader agents in the same 688 workgroup, corresponding to the code:Workgroup scope. 689 * _subgroup instance_ - accessible by shader agents in the same subgroup, 690 corresponding to the code:Subgroup scope. 691 692The memory domains are nested in the order listed above, 693ifdef::VK_KHR_ray_tracing_pipeline[] 694except for shader call instance domain, 695endif::VK_KHR_ray_tracing_pipeline[] 696with memory domains later in the list nested in the domains earlier in the 697list. 698ifdef::VK_KHR_ray_tracing_pipeline[] 699The shader call instance domain is at an implementation-dependent location 700in the list, and is nested according to that location. 701The shader call instance domain is not broader than the queue family 702instance domain. 703endif::VK_KHR_ray_tracing_pipeline[] 704 705[NOTE] 706.Note 707==== 708Memory domains do not correspond to storage classes or device-local and 709host-local slink:VkDeviceMemory allocations, rather they indicate whether a 710write can be made visible only to agents in the same subgroup, same 711workgroup, 712ifdef::VK_EXT_fragment_shader_interlock[] 713overlapping fragment shader invocation, 714endif::VK_EXT_fragment_shader_interlock[] 715ifdef::VK_KHR_ray_tracing_pipeline[] 716shader-call-related ray tracing invocation, 717endif::VK_KHR_ray_tracing_pipeline[] 718in any shader invocation, or anywhere on the device, or host. 719The shader, queue family instance, 720ifdef::VK_EXT_fragment_shader_interlock[] 721fragment interlock instance, 722endif::VK_EXT_fragment_shader_interlock[] 723ifdef::VK_KHR_ray_tracing_pipeline[] 724shader call instance, 725endif::VK_KHR_ray_tracing_pipeline[] 726workgroup instance, and subgroup instance domains are only used for 727shader-based availability/visibility operations, in other cases writes can 728be made available from/visible to the shader via the device domain. 729==== 730 731_Availability operations_, _visibility operations_, and _memory domain 732operations_ alter the state of the write operations that happen-before them, 733and which are included in their _source scope_ to be available or visible to 734their _destination scope_. 735 736 * For an availability operation, the source scope is a set of 737 (agent,reference,memory location) tuples, and the destination scope is a 738 set of memory domains. 739 * For a memory domain operation, the source scope is a memory domain and 740 the destination scope is a memory domain. 741 * For a visibility operation, the source scope is a set of memory domains 742 and the destination scope is a set of (agent,reference,memory location) 743 tuples. 744 745How the scopes are determined depends on the specific operation. 746Availability and memory domain operations expand the set of memory domains 747to which the write is available. 748Visibility operations expand the set of (agent,reference,memory location) 749tuples to which the write is visible. 750 751Recall that availability and visibility states are per-memory location, and 752let W be a write operation to one or more locations performed by agent A via 753reference R. Let L be one of the locations written. 754(W,L) (the write W to L), is initially not available to any memory domain 755and only visible to (A,R,L). 756An availability operation AV that happens-after W and that includes (A,R,L) 757in its source scope makes (W,L) _available_ to the memory domains in its 758destination scope. 759 760A memory domain operation DOM that happens-after AV and for which (W,L) is 761available in the source scope makes (W,L) available in the destination 762memory domain. 763 764A visibility operation VIS that happens-after AV (or DOM) and for which 765(W,L) is available in any domain in the source scope makes (W,L) _visible_ 766to all (agent,reference,L) tuples included in its destination scope. 767 768If write W~2~ happens-after W, and their sets of memory locations overlap, 769then W will not be available/visible to all agents/references for those 770memory locations that overlap (and future AV/DOM/VIS ops cannot revive W's 771write to those locations). 772 773Availability, memory domain, and visibility operations are treated like 774other non-atomic memory accesses for the purpose of 775<<memory-model-memory-semantics,memory semantics>>, meaning they can be 776ordered by release-acquire sequences or memory barriers. 777 778An _availability chain_ is a sequence of availability operations to 779increasingly broad memory domains, where element N+1 of the chain is 780performed in the dual scope instance of the destination memory domain of 781element N and element N happens-before element N+1. 782An example is an availability operation with destination scope of the 783workgroup instance domain that happens-before an availability operation to 784the shader domain performed by an invocation in the same workgroup. 785An availability chain AVC that happens-after W and that includes (A,R,L) in 786the source scope makes (W,L) _available_ to the memory domains in its final 787destination scope. 788An availability chain with a single element is just the availability 789operation. 790 791Similarly, a _visibility chain_ is a sequence of visibility operations from 792increasingly narrow memory domains, where element N of the chain is 793performed in the dual scope instance of the source memory domain of element 794N+1 and element N happens-before element N+1. 795An example is a visibility operation with source scope of the shader domain 796that happens-before a visibility operation with source scope of the 797workgroup instance domain performed by an invocation in the same workgroup. 798A visibility chain VISC that happens-after AVC (or DOM) and for which (W,L) 799is available in any domain in the source scope makes (W,L) _visible_ to all 800(agent,reference,L) tuples included in its final destination scope. 801A visibility chain with a single element is just the visibility operation. 802 803 804[[memory-model-vulkan-availability-visibility]] 805== Availability, Visibility, and Domain Operations 806 807The following operations generate availability, visibility, and domain 808operations. 809When multiple availability/visibility/domain operations are described, they 810are system-synchronized-with each other in the order listed. 811 812An operation that performs a <<synchronization-dependencies-memory,memory 813dependency>> generates: 814 815 * If the source access mask includes ename:VK_ACCESS_HOST_WRITE_BIT, then 816 the dependency includes a memory domain operation from host domain to 817 device domain. 818 * An availability operation with source scope of all writes in the first 819 <<synchronization-dependencies-access-scopes,access scope>> of the 820 dependency and a destination scope of the device domain. 821 * A visibility operation with source scope of the device domain and 822 destination scope of the second access scope of the dependency. 823 * If the destination access mask includes ename:VK_ACCESS_HOST_READ_BIT or 824 ename:VK_ACCESS_HOST_WRITE_BIT, then the dependency includes a memory 825 domain operation from device domain to host domain. 826 827flink:vkFlushMappedMemoryRanges performs an availability operation, with a 828source scope of (agents,references) = (all host threads, all mapped memory 829ranges passed to the command), and destination scope of the host domain. 830 831flink:vkInvalidateMappedMemoryRanges performs a visibility operation, with a 832source scope of the host domain and a destination scope of 833(agents,references) = (all host threads, all mapped memory ranges passed to 834the command). 835 836flink:vkQueueSubmit performs a memory domain operation from host to device, 837and a visibility operation with source scope of the device domain and 838destination scope of all agents and references on the device. 839 840 841[[memory-model-availability-visibility-semantics]] 842== Availability and Visibility Semantics 843 844A memory barrier or atomic operation via agent A that includes MakeAvailable 845in its semantics performs an availability operation whose source scope 846includes agent A and all references in the storage classes in that 847instruction's storage class semantics, and all memory locations, and whose 848destination scope is a set of memory domains selected as specified below. 849The implicit availability operation is program-ordered between the barrier 850or atomic and all other operations program-ordered before the barrier or 851atomic. 852 853A memory barrier or atomic operation via agent A that includes MakeVisible 854in its semantics performs a visibility operation whose source scope is a set 855of memory domains selected as specified below, and whose destination scope 856includes agent A and all references in the storage classes in that 857instruction's storage class semantics, and all memory locations. 858The implicit visibility operation is program-ordered between the barrier or 859atomic and all other operations program-ordered after the barrier or atomic. 860 861The memory domains are selected based on the memory scope of the instruction 862as follows: 863 864 * code:Device scope uses the shader domain 865 * code:QueueFamily scope uses the queue family instance domain 866ifdef::VK_EXT_fragment_shader_interlock[] 867 * code:FragmentInterlock scope uses the fragment interlock instance domain 868endif::VK_EXT_fragment_shader_interlock[] 869ifdef::VK_KHR_ray_tracing_pipeline[] 870 * code:ShaderCallKHR scope uses the shader call instance domain 871endif::VK_KHR_ray_tracing_pipeline[] 872 * code:Workgroup scope uses the workgroup instance domain 873 * code:Subgroup uses the subgroup instance domain 874 * code:Invocation perform no availability/visibility operations. 875 876When an availability operation performed by an agent A includes a memory 877domain D in its destination scope, where D corresponds to scope instance S, 878it also includes the memory domains that correspond to each smaller scope 879instance S' that is a subset of S and that includes A. Similarly for 880visibility operations. 881 882 883[[memory-model-instruction-av-vis]] 884== Per-Instruction Availability and Visibility Semantics 885 886A memory write instruction that includes MakePointerAvailable, or an image 887write instruction that includes MakeTexelAvailable, performs an availability 888operation whose source scope includes the agent and reference used to 889perform the write and the memory locations written by the instruction, and 890whose destination scope is a set of memory domains selected by the Scope 891operand specified in <<memory-model-availability-visibility-semantics, 892Availability and Visibility Semantics>>. 893The implicit availability operation is program-ordered between the write and 894all other operations program-ordered after the write. 895 896A memory read instruction that includes MakePointerVisible, or an image read 897instruction that includes MakeTexelVisible, performs a visibility operation 898whose source scope is a set of memory domains selected by the Scope operand 899as specified in <<memory-model-availability-visibility-semantics, 900Availability and Visibility Semantics>>, and whose destination scope 901includes the agent and reference used to perform the read and the memory 902locations read by the instruction. 903The implicit visibility operation is program-ordered between read and all 904other operations program-ordered before the read. 905 906[NOTE] 907.Note 908==== 909Although reads with per-instruction visibility only perform visibility ops 910from the shader or 911ifdef::VK_EXT_fragment_shader_interlock[] 912fragment interlock instance or 913endif::VK_EXT_fragment_shader_interlock[] 914ifdef::VK_KHR_ray_tracing_pipeline[] 915shader call instance or 916endif::VK_KHR_ray_tracing_pipeline[] 917workgroup instance or subgroup instance domain, they will also see writes 918that were made visible via the device domain, i.e. those writes previously 919performed by non-shader agents and made visible via API commands. 920==== 921 922[NOTE] 923.Note 924==== 925It is expected that all invocations in a subgroup execute on the same 926processor with the same path to memory, and thus availability and visibility 927operations with subgroup scope can be expected to be "`free`". 928==== 929 930 931[[memory-model-location-ordered]] 932== Location-Ordered 933 934Let X and Y be memory accesses to overlapping sets of memory locations M, 935where X != Y. Let (A~X~,R~X~) be the agent and reference used for X, and 936(A~Y~,R~Y~) be the agent and reference used for Y. For now, let "`->`" 937denote happens-before and "`->^rcpo^`" denote the reflexive closure of 938program-ordered before. 939 940If D~1~ and D~2~ are different memory domains, then let DOM(D~1~,D~2~) be a 941memory domain operation from D~1~ to D~2~. 942Otherwise, let DOM(D,D) be a placeholder such that X->DOM(D,D)->Y if and 943only if X->Y. 944 945X is _location-ordered_ before Y for a location L in M if and only if any of 946the following is true: 947 948 * A~X~ == A~Y~ and R~X~ == R~Y~ and X->Y 949 ** NOTE: this case means no availability/visibility ops are required when 950 it is the same (agent,reference). 951 952 * X is a read, both X and Y are non-private, and X->Y 953 * X is a read, and X (transitively) system-synchronizes with Y 954 955 * If R~X~ == R~Y~ and A~X~ and A~Y~ access a common memory domain D (e.g. 956 are in the same workgroup instance if D is the workgroup instance 957 domain), and both X and Y are non-private: 958 ** X is a write, Y is a write, AVC(A~X~,R~X~,D,L) is an availability chain 959 making (X,L) available to domain D, and X->^rcpo^AVC(A~X~,R~X~,D,L)->Y 960 ** X is a write, Y is a read, AVC(A~X~,R~X~,D,L) is an availability chain 961 making (X,L) available to domain D, VISC(A~Y~,R~Y~,D,L) is a visibility 962 chain making writes to L available in domain D visible to Y, and 963 X->^rcpo^AVC(A~X~,R~X~,D,L)->VISC(A~Y~,R~Y~,D,L)->^rcpo^Y 964 ** If 965 slink:VkPhysicalDeviceVulkanMemoryModelFeatures::pname:vulkanMemoryModelAvailabilityVisibilityChains 966 is ename:VK_FALSE, then AVC and VISC must: each only have a single 967 element in the chain, in each sub-bullet above. 968 969 * Let D~X~ and D~Y~ each be either the device domain or the host domain, 970 depending on whether A~X~ and A~Y~ execute on the device or host: 971 ** X is a write and Y is a write, and 972 X->AV(A~X~,R~X~,D~X~,L)->DOM(D~X~,D~Y~)->Y 973 ** X is a write and Y is a read, and 974 X->AV(A~X~,R~X~,D~X~,L)->DOM(D~X~,D~Y~)->VIS(A~Y~,R~Y~,D~Y~,L)->Y 975 976[NOTE] 977.Note 978==== 979The final bullet (synchronization through device/host domain) requires 980API-level synchronization operations, since the device/host domains are not 981accessible via shader instructions. 982And "`device domain`" is not to be confused with "`device scope`", which 983synchronizes through the "`shader domain`". 984==== 985 986 987[[memory-model-access-data-race]] 988== Data Race 989 990Let X and Y be operations that access overlapping sets of memory locations 991M, where X != Y, and at least one of X and Y is a write, and X and Y are not 992mutually-ordered atomic operations. 993If there does not exist a location-ordered relation between X and Y for each 994location in M, then there is a _data race_. 995 996Applications must: ensure that no data races occur during the execution of 997their application. 998 999[NOTE] 1000.Note 1001==== 1002Data races can only occur due to instructions that are actually executed. 1003For example, an instruction skipped due to control flow must not contribute 1004to a data race. 1005==== 1006 1007 1008[[memory-model-visible-to]] 1009== Visible-To 1010 1011Let X be a write and Y be a read whose sets of memory locations overlap, and 1012let M be the set of memory locations that overlap. 1013Let M~2~ be a non-empty subset of M. Then X is _visible-to_ Y for memory 1014locations M~2~ if and only if all of the following are true: 1015 1016 * X is location-ordered before Y for each location L in M~2~. 1017 * There does not exist another write Z to any location L in M~2~ such that 1018 X is location-ordered before Z for location L and Z is location-ordered 1019 before Y for location L. 1020 1021If X is visible-to Y, then Y reads the value written by X for locations 1022M~2~. 1023 1024[NOTE] 1025.Note 1026==== 1027It is possible for there to be a write between X and Y that overwrites a 1028subset of the memory locations, but the remaining memory locations (M~2~) 1029will still be visible-to Y. 1030==== 1031 1032 1033[[memory-model-acyclicity]] 1034== Acyclicity 1035 1036_Reads-from_ is a relation between operations, where the first operation is 1037a write, the second operation is a read, and the second operation reads the 1038value written by the first operation. 1039_From-reads_ is a relation between operations, where the first operation is 1040a read, the second operation is a write, and the first operation reads a 1041value written earlier than the second operation in the second operation's 1042scoped modification order (or the first operation reads from the initial 1043value, and the second operation is any write to the same locations). 1044 1045Then the implementation must: guarantee that no cycles exist in the union of 1046the following relations: 1047 1048 * location-ordered 1049 * scoped modification order (over all atomic writes) 1050 * reads-from 1051 * from-reads 1052 1053[NOTE] 1054.Note 1055==== 1056This is a "`consistency`" axiom, which informally guarantees that sequences 1057of operations cannot violate causality. 1058==== 1059 1060 1061[[memory-model-scoped-modification-order-coherence]] 1062=== Scoped Modification Order Coherence 1063 1064Let A and B be mutually-ordered atomic operations, where A is 1065location-ordered before B. Then the following rules are a consequence of 1066acyclicity: 1067 1068 * If A and B are both reads and A does not read the initial value, then 1069 the write that A takes its value from must: be earlier in its own scoped 1070 modification order than (or the same as) the write that B takes its 1071 value from (no cycles between location-order, reads-from, and 1072 from-reads). 1073 * If A is a read and B is a write and A does not read the initial value, 1074 then A must: take its value from a write earlier than B in B's scoped 1075 modification order (no cycles between location-order, scope modification 1076 order, and reads-from). 1077 * If A is a write and B is a read, then B must: take its value from A or a 1078 write later than A in A's scoped modification order (no cycles between 1079 location-order, scoped modification order, and from-reads). 1080 * If A and B are both writes, then A must: be earlier than B in A's scoped 1081 modification order (no cycles between location-order and scoped 1082 modification order). 1083 * If A is a write and B is a read-modify-write and B reads the value 1084 written by A, then B comes immediately after A in A's scoped 1085 modification order (no cycles between scoped modification order and 1086 from-reads). 1087 1088 1089[[memory-model-shader-io]] 1090== Shader I/O 1091 1092If a shader invocation A in a shader stage other than code:Vertex performs a 1093memory read operation X from an object in storage class 1094ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] 1095code:CallableDataKHR, code:IncomingCallableDataKHR, code:RayPayloadKHR, 1096code:HitAttributeKHR, code:IncomingRayPayloadKHR, or 1097endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] 1098code:Input, then X is system-synchronized-after all writes to the 1099corresponding 1100ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] 1101code:CallableDataKHR, code:IncomingCallableDataKHR, code:RayPayloadKHR, 1102code:HitAttributeKHR, code:IncomingRayPayloadKHR, or 1103endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[] 1104code:Output storage variable(s) in the shader invocation(s) that contribute 1105to generating invocation A, and those writes are all visible-to X. 1106 1107[NOTE] 1108.Note 1109==== 1110It is not necessary for the upstream shader invocations to have completed 1111execution, they only need to have generated the output that is being read. 1112==== 1113 1114 1115[[memory-model-deallocation]] 1116== Deallocation 1117 1118ifndef::VKSC_VERSION_1_0[] 1119 1120A call to flink:vkFreeMemory must: happen-after all memory operations on all 1121memory locations in that slink:VkDeviceMemory object. 1122 1123[NOTE] 1124.Note 1125==== 1126Normally, device memory operations in a given queue are synchronized with 1127flink:vkFreeMemory by having a host thread wait on a fence signaled by that 1128queue, and the wait happens-before the call to flink:vkFreeMemory on the 1129host. 1130==== 1131 1132endif::VKSC_VERSION_1_0[] 1133 1134The deallocation of SPIR-V variables is managed by the system and 1135happens-after all operations on those variables. 1136 1137 1138[[memory-model-informative-descriptions]] 1139== Descriptions (Informative) 1140 1141This subsection offers more easily understandable consequences of the memory 1142model for app/compiler developers. 1143 1144Let SC be the storage class(es) specified by a release or acquire operation 1145or barrier. 1146 1147 * An atomic write with release semantics must not be reordered against any 1148 read or write to SC that is program-ordered before it (regardless of the 1149 storage class the atomic is in). 1150 1151 * An atomic read with acquire semantics must not be reordered against any 1152 read or write to SC that is program-ordered after it (regardless of the 1153 storage class the atomic is in). 1154 1155 * Any write to SC program-ordered after a release barrier must not be 1156 reordered against any read or write to SC program-ordered before that 1157 barrier. 1158 1159 * Any read from SC program-ordered before an acquire barrier must not be 1160 reordered against any read or write to SC program-ordered after the 1161 barrier. 1162 1163A control barrier (even if it has no memory semantics) must not be reordered 1164against any memory barriers. 1165 1166This memory model allows memory accesses with and without availability and 1167visibility operations, as well as atomic operations, all to be performed on 1168the same memory location. 1169This is critical to allow it to reason about memory that is reused in 1170multiple ways, e.g. across the lifetime of different shader invocations or 1171draw calls. 1172While GLSL (and legacy SPIR-V) applies the "`coherent`" decoration to 1173variables (for historical reasons), this model treats each memory access 1174instruction as having optional implicit availability/visibility operations. 1175GLSL to SPIR-V compilers should map all (non-atomic) operations on a 1176coherent variable to Make{Pointer,Texel}\{Available}\{Visible} flags in this 1177model. 1178 1179Atomic operations implicitly have availability/visibility operations, and 1180the scope of those operations is taken from the atomic operation's scope. 1181 1182 1183[[memory-model-tessellation-output-ordering]] 1184== Tessellation Output Ordering 1185 1186For SPIR-V that uses the Vulkan Memory Model, the code:OutputMemory storage 1187class is used to synchronize accesses to tessellation control output 1188variables. 1189For legacy SPIR-V that does not enable the Vulkan Memory Model via 1190code:OpMemoryModel, tessellation outputs can be ordered using a control 1191barrier with no particular memory scope or semantics, as defined below. 1192 1193Let X and Y be memory operations performed by shader invocations A~X~ and 1194A~Y~. 1195Operation X is _tessellation-output-ordered_ before operation Y if and only 1196if all of the following are true: 1197 1198 * There is a dynamic instance of an code:OpControlBarrier instruction C 1199 such that X is program-ordered before C in A~X~ and C is program-ordered 1200 before Y in A~Y~. 1201 * A~X~ and A~Y~ are in the same instance of C's execution scope. 1202 1203If shader invocations A~X~ and A~Y~ in the code:TessellationControl 1204execution model execute memory operations X and Y, respectively, on the 1205code:Output storage class, and X is tessellation-output-ordered before Y 1206with a scope of code:Workgroup, then X is location-ordered before Y, and if 1207X is a write and Y is a read then X is visible-to Y. 1208 1209 1210ifdef::VK_NV_cooperative_matrix[] 1211[[memory-model-cooperative-matrix]] 1212== Cooperative Matrix Memory Access 1213 1214For each dynamic instance of a cooperative matrix load or store instruction 1215(code:OpCooperativeMatrixLoadNV or code:OpCooperativeMatrixStoreNV), a 1216single implementation-dependent invocation within the instance of the 1217matrix's scope performs a non-atomic load or store (respectively) to each 1218memory location that is defined to be accessed by the instruction. 1219endif::VK_NV_cooperative_matrix[] 1220