1// Copyright 2017-2023 The Khronos Group Inc.
2//
3// SPDX-License-Identifier: CC-BY-4.0
4
5[appendix]
6[[memory-model]]
7= Memory Model
8
9[NOTE]
10.Note
11====
This memory model describes the synchronization guarantees provided by all
implementations; however, some of the guarantees defined here require extra
features to be supported by the implementation.
15ifdef::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
16See slink:VkPhysicalDeviceVulkanMemoryModelFeatures.
17endif::VK_VERSION_1_2,VK_KHR_vulkan_memory_model[]
18====
19
20[[memory-model-agent]]
21== Agent
22
23_Operation_ is a general term for any task that is executed on the system.
24
25[NOTE]
26.Note
27====
28An operation is by definition something that is executed.
29Thus if an instruction is skipped due to control flow, it does not
30constitute an operation.
31====
32
33Each operation is executed by a particular _agent_.
34Possible agents include each shader invocation, each host thread, and each
35fixed-function stage of the pipeline.
36
37
38[[memory-model-memory-location]]
39== Memory Location
40
41A _memory location_ identifies unique storage for 8 bits of data.
42Memory operations access a _set of memory locations_ consisting of one or
43more memory locations at a time, e.g. an operation accessing a 32-bit
44integer in memory would read/write a set of four memory locations.
45Memory operations that access whole aggregates may: access any padding bytes
46between elements or members, but no padding bytes at the end of the
47aggregate.
Two sets of memory locations _overlap_ if their intersection is non-empty.
50A memory operation must: not affect memory at a memory location not within
51its set of memory locations.
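
[NOTE]
.Note
====
As an informal illustration, a set of contiguous memory locations can be
modeled as a byte range, and the overlap test is then a range-intersection
test.
The following C++ sketch uses a hypothetical code:overlaps helper that is
not part of the Vulkan API:

[source,c++]
----
#include <cstdint>

// Hypothetical helper: model a set of contiguous memory locations as the
// byte range [base, base + size). Two such sets overlap if and only if
// their intersection is non-empty.
bool overlaps(uint64_t baseA, uint64_t sizeA, uint64_t baseB, uint64_t sizeB)
{
    return baseA < baseB + sizeB && baseB < baseA + sizeA;
}

// A 32-bit store at offset 4 accesses locations {4,5,6,7}; a 16-bit load
// at offset 6 accesses {6,7}; overlaps(4, 4, 6, 2) returns true.
----
====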
52
53Memory locations for buffers and images are explicitly allocated in
54slink:VkDeviceMemory objects, and are implicitly allocated for SPIR-V
55variables in each shader invocation.
56
57ifdef::VK_KHR_workgroup_memory_explicit_layout[]
58Variables with code:Workgroup storage class that point to a block-decorated
59type share a set of memory locations.
60endif::VK_KHR_workgroup_memory_explicit_layout[]
61
62
63[[memory-model-allocation]]
64== Allocation
65
66The values stored in newly allocated memory locations are determined by a
67SPIR-V variable's initializer, if present, or else are undefined:.
68At the time an allocation is created there have been no
69<<memory-model-memory-operation,memory operations>> to any of its memory
70locations.
71The initialization is not considered to be a memory operation.
72
73[NOTE]
74.Note
75====
76For tessellation control shader output variables, a consequence of
77initialization not being considered a memory operation is that some
78implementations may need to insert a barrier between the initialization of
79the output variables and any reads of those variables.
80====
81
82
83[[memory-model-memory-operation]]
84== Memory Operation
85
86For an operation A and memory location M:
87
88  * [[memory-model-access-read]] A _reads_ M if and only if the data stored
89    in M is an input to A.
90  * [[memory-model-access-write]] A _writes_ M if and only if the data
91    output from A is stored to M.
92  * [[memory-model-access-access]] A _accesses_ M if and only if it either
93    reads or writes (or both) M.
94
95[NOTE]
96.Note
97====
98A write whose value is the same as what was already in those memory
99locations is still considered to be a write and has all the same effects.
100====
101
102
103[[memory-model-references]]
104== Reference
105
106A _reference_ is an object that a particular agent can: use to access a set
107of memory locations.
108On the host, a reference is a host virtual address.
109On the device, a reference is:
110
  * The descriptor that a variable is bound to, for variables in the
    code:Image, code:Uniform, or code:StorageBuffer storage classes.
113    If the variable is an array (or array of arrays, etc.) then each element
114    of the array may: be a unique reference.
115ifdef::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
116  * The address range for a buffer in code:PhysicalStorageBuffer storage
117    class, where the base of the address range is queried with
118ifndef::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
119    flink:vkGetBufferDeviceAddressEXT
120endif::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
121ifdef::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
122    flink:vkGetBufferDeviceAddress
123endif::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
124    and the length of the range is the size of the buffer.
125endif::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
126ifdef::VK_KHR_workgroup_memory_explicit_layout[]
127  * A single common reference for all variables with code:Workgroup storage
128    class that point to a block-decorated type.
129  * The variable itself for non-block-decorated type variables in
130    code:Workgroup storage class.
131endif::VK_KHR_workgroup_memory_explicit_layout[]
132  * The variable itself for variables in other storage classes.
133
134Two memory accesses through distinct references may: require availability
135and visibility operations as defined
136<<memory-model-location-ordered,below>>.
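
ifdef::VK_VERSION_1_2,VK_KHR_buffer_device_address[]
[NOTE]
.Note
====
A minimal sketch of obtaining the base of such an address range, using a
hypothetical helper and assuming code:buffer was created with
ename:VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT:

[source,c++]
----
VkDeviceAddress getPhysicalStorageBufferBase(VkDevice device, VkBuffer buffer)
{
    // Query the base of the address range that serves as the reference for
    // a buffer accessed through the PhysicalStorageBuffer storage class.
    VkBufferDeviceAddressInfo addressInfo{};
    addressInfo.sType = VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO;
    addressInfo.buffer = buffer;

    // The reference spans [base, base + buffer size).
    return vkGetBufferDeviceAddress(device, &addressInfo);
}
----
====
endif::VK_VERSION_1_2,VK_KHR_buffer_device_address[]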
137
138
139[[memory-model-program-order]]
140== Program-Order
141
142A _dynamic instance_ of an instruction is defined in SPIR-V
143(https://registry.khronos.org/spir-v/specs/unified1/SPIRV.html#DynamicInstance)
144as a way of referring to a particular execution of a static instruction.
_Program-order_ is an ordering on dynamic instances of instructions executed
146by a single shader invocation:
147
148  * (Basic block): If instructions A and B are in the same basic block, and
149    A is listed in the module before B, then the n'th dynamic instance of A
150    is program-ordered before the n'th dynamic instance of B.
151  * (Branch): The dynamic instance of a branch or switch instruction is
    program-ordered before the dynamic instance of the code:OpLabel instruction
153    to which it transfers control.
154  * (Call entry): The dynamic instance of an code:OpFunctionCall instruction
155    is program-ordered before the dynamic instances of the
156    code:OpFunctionParameter instructions and the body of the called
157    function.
158  * (Call exit): The dynamic instance of the instruction following an
159    code:OpFunctionCall instruction is program-ordered after the dynamic
160    instance of the return instruction executed by the called function.
161  * (Transitive Closure): If dynamic instance A of any instruction is
162    program-ordered before dynamic instance B of any instruction and B is
163    program-ordered before dynamic instance C of any instruction then A is
164    program-ordered before C.
165  * (Complete definition): No other dynamic instances are program-ordered.
166
167For instructions executed on the host, the source language defines the
168program-order relation (e.g. as "`sequenced-before`").
169
170
171ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
172[[shader-call-related]]
173== Shader Call Related
174
175Shader-call-related is an equivalence relation on invocations defined as the
176symmetric and transitive closure of:
177
178  * A is shader-call-related to B if A is created by an
179    <<ray-tracing-repack,invocation repack>> instruction executed by B.
180
181
182[[shader-call-order]]
183== Shader Call Order
184
185Shader-call-order is a partial order on dynamic instances of instructions
186executed by invocations that are shader-call-related:
187
188  * (Program order): If dynamic instance A is program-ordered before B, then
189    A is shader-call-ordered before B.
190  * (Shader call entry): If A is a dynamic instance of an
191    <<ray-tracing-repack,invocation repack>> instruction and B is a dynamic
192    instance executed by an invocation that is created by A, then A is
193    shader-call-ordered before B.
194  * (Shader call exit): If A is a dynamic instance of an
195    <<ray-tracing-repack,invocation repack>> instruction, B is the next
196    dynamic instance executed by the same invocation, and C is a dynamic
197    instance executed by an invocation that is created by A, then C is
198    shader-call-ordered before B.
  * (Transitive closure): If A is shader-call-ordered before B and B is
    shader-call-ordered before C, then A is shader-call-ordered before C.
201  * (Complete definition): No other dynamic instances are
202    shader-call-ordered.
203endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
204
205
206[[memory-model-scope]]
207== Scope
208
209Atomic and barrier instructions include scopes which identify sets of shader
210invocations that must: obey the requested ordering and atomicity rules of
211the operation, as defined below.
212
213The various scopes are described in detail in <<shaders-scope, the Shaders
214chapter>>.
215
216
217[[memory-model-atomic-operation]]
218== Atomic Operation
219
220An _atomic operation_ on the device is any SPIR-V operation whose name
221begins with code:OpAtomic.
222An atomic operation on the host is any operation performed with an
223std::atomic typed object.
224
225Each atomic operation has a memory <<memory-model-scope,scope>> and a
226<<memory-model-memory-semantics,semantics>>.
227Informally, the scope determines which other agents it is atomic with
228respect to, and the <<memory-model-memory-semantics,semantics>> constrains
229its ordering against other memory accesses.
230Device atomic operations have explicit scopes and semantics.
231Each host atomic operation implicitly uses the code:CrossDevice scope, and
232uses a memory semantics equivalent to a C++ std::memory_order value of
233relaxed, acquire, release, acq_rel, or seq_cst.
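
[NOTE]
.Note
====
For example, the host-side mapping to C++ is direct.
A non-normative sketch, where code:hostAgent is a hypothetical host thread:

[source,c++]
----
#include <atomic>
#include <cstdint>

std::atomic<uint32_t> value{0};

void hostAgent()
{
    // Host atomics implicitly have CrossDevice scope; the C++ memory
    // order supplies the memory semantics.
    value.fetch_add(1, std::memory_order_relaxed);  // Relaxed
    value.store(1, std::memory_order_release);      // Release operation
    (void)value.load(std::memory_order_acquire);    // Acquire operation
    value.exchange(2, std::memory_order_acq_rel);   // AcquireRelease (RMW)
}
----
====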
234
235Two atomic operations A and B are _potentially-mutually-ordered_ if and only
236if all of the following are true:
237
238  * They access the same set of memory locations.
239  * They use the same reference.
240  * A is in the instance of B's memory scope.
241  * B is in the instance of A's memory scope.
242  * A and B are not the same operation (irreflexive).
243
244Two atomic operations A and B are _mutually-ordered_ if and only if they are
245potentially-mutually-ordered and any of the following are true:
246
247  * A and B are both device operations.
248  * A and B are both host operations.
249  * A is a device operation, B is a host operation, and the implementation
250    supports concurrent host- and device-atomics.
251
252[NOTE]
253.Note
254====
255If two atomic operations are not mutually-ordered, and if their sets of
256memory locations overlap, then each must: be synchronized against the other
257as if they were non-atomic operations.
258====
259
260
261[[memory-model-scoped-modification-order]]
262== Scoped Modification Order
263
264For a given atomic write A, all atomic writes that are mutually-ordered with
265A occur in an order known as A's _scoped modification order_.
266A's scoped modification order relates no other operations.
267
268[NOTE]
269.Note
270====
Invocations outside the instance of A's memory scope may: observe the values
at A's set of memory locations becoming visible to them in an order that
disagrees with the scoped modification order.
274====
275
276[NOTE]
277.Note
278====
279It is valid to have non-atomic operations or atomics in a different scope
280instance to the same set of memory locations, as long as they are
281synchronized against each other as if they were non-atomic (if they are not,
282it is treated as a <<memory-model-access-data-race,data race>>).
283That means this definition of A's scoped modification order could include
284atomic operations that occur much later, after intervening non-atomics.
285That is a bit non-intuitive, but it helps to keep this definition simple and
286non-circular.
287====
288
289
290[[memory-model-memory-semantics]]
291== Memory Semantics
292
293Non-atomic memory operations, by default, may: be observed by one agent in a
294different order than they were written by another agent.
295
296Atomics and some synchronization operations include _memory semantics_,
297which are flags that constrain the order in which other memory accesses
298(including non-atomic memory accesses and
299<<memory-model-availability-visibility,availability and visibility
300operations>>) performed by the same agent can: be observed by other agents,
301or can: observe accesses by other agents.
302
303Device instructions that include semantics are code:OpAtomic*,
304code:OpControlBarrier, code:OpMemoryBarrier, and code:OpMemoryNamedBarrier.
305Host instructions that include semantics are some std::atomic methods and
306memory fences.
307
308SPIR-V supports the following memory semantics:
309
310  * Relaxed: No constraints on order of other memory accesses.
311  * Acquire: A memory read with this semantic performs an _acquire
312    operation_.
313    A memory barrier with this semantic is an _acquire barrier_.
314  * Release: A memory write with this semantic performs a _release
315    operation_.
316    A memory barrier with this semantic is a _release barrier_.
317  * AcquireRelease: A memory read-modify-write operation with this semantic
318    performs both an acquire operation and a release operation, and inherits
319    the limitations on ordering from both of those operations.
320    A memory barrier with this semantic is both a release and acquire
321    barrier.
322
323[NOTE]
324.Note
325====
326SPIR-V does not support "`consume`" semantics on the device.
327====
328
329The memory semantics operand also includes _storage class semantics_ which
330indicate which storage classes are constrained by the synchronization.
331SPIR-V storage class semantics include:
332
333  * UniformMemory
334  * WorkgroupMemory
335  * ImageMemory
336  * OutputMemory
337
338Each SPIR-V memory operation accesses a single storage class.
339Semantics in synchronization operations can include a combination of storage
340classes.
341
The UniformMemory storage class semantic applies to accesses to memory in
the
ifdef::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
code:PhysicalStorageBuffer,
endif::VK_VERSION_1_2,VK_EXT_buffer_device_address,VK_KHR_buffer_device_address[]
ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:ShaderRecordBufferKHR,
endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
code:Uniform and code:StorageBuffer storage classes.
The WorkgroupMemory storage class semantic applies to accesses to memory in
the code:Workgroup storage class.
The ImageMemory storage class semantic applies to accesses to memory in the
code:Image storage class.
The OutputMemory storage class semantic applies to accesses to memory in the
code:Output storage class.
357
358[NOTE]
359.Note
360====
361Informally, these constraints limit how memory operations can be reordered,
362and these limits apply not only to the order of accesses as performed in the
363agent that executes the instruction, but also to the order the effects of
364writes become visible to all other agents within the same instance of the
365instruction's memory scope.
366====
367
368[NOTE]
369.Note
370====
371Release and acquire operations in different threads can: act as
372synchronization operations, to guarantee that writes that happened before
373the release are visible after the acquire.
(This is not a formal definition, just an informative forward reference.)
375====
376
377[NOTE]
378.Note
379====
380The OutputMemory storage class semantic is only useful in tessellation
381control shaders, which is the only execution model where output variables
382are shared between invocations.
383====
384
385The memory semantics operand can: also include availability and visibility
386flags, which apply availability and visibility operations as described in
387<<memory-model-availability-visibility,availability and visibility>>.
388The availability/visibility flags are:
389
390  * MakeAvailable: Semantics must: be Release or AcquireRelease.
391    Performs an availability operation before the release operation or
392    barrier.
393  * MakeVisible: Semantics must: be Acquire or AcquireRelease.
394    Performs a visibility operation after the acquire operation or barrier.
395
396The specifics of these operations are defined in
397<<memory-model-availability-visibility-semantics,Availability and Visibility
398Semantics>>.
399
400Host atomic operations may: support a different list of memory semantics and
401synchronization operations, depending on the host architecture and source
402language.
403
404
405[[memory-model-release-sequence]]
406== Release Sequence
407
408After an atomic operation A performs a release operation on a set of memory
409locations M, the _release sequence headed by A_ is the longest continuous
410subsequence of A's scoped modification order that consists of:
411
412  * the atomic operation A as its first element
413  * atomic read-modify-write operations on M by any agent
414
415[NOTE]
416.Note
417====
418The atomics in the last bullet must: be mutually-ordered with A by virtue of
419being in A's scoped modification order.
420====
421
422[NOTE]
423.Note
424====
425This intentionally omits "`atomic writes to M performed by the same agent
426that performed A`", which is present in the corresponding C++ definition.
427====
428
429
430[[memory-model-synchronizes-with]]
431== Synchronizes-With
432
433_Synchronizes-with_ is a relation between operations, where each operation
434is either an atomic operation or a memory barrier (aka fence on the host).
435
436If A and B are atomic operations, then A synchronizes-with B if and only if
437all of the following are true:
438
439  * A performs a release operation
440  * B performs an acquire operation
441  * A and B are mutually-ordered
442  * B reads a value written by A or by an operation in the release sequence
443    headed by A
444
445code:OpControlBarrier, code:OpMemoryBarrier, and code:OpMemoryNamedBarrier
446are _memory barrier_ instructions in SPIR-V.
447
448If A is a release barrier and B is an atomic operation that performs an
449acquire operation, then A synchronizes-with B if and only if all of the
450following are true:
451
452  * there exists an atomic write X (with any memory semantics)
453  * A is program-ordered before X
454  * X and B are mutually-ordered
455  * B reads a value written by X or by an operation in the release sequence
456    headed by X
457  ** If X is relaxed, it is still considered to head a hypothetical release
458     sequence for this rule
459  * A and B are in the instance of each other's memory scopes
460  * X's storage class is in A's semantics.
461
462If A is an atomic operation that performs a release operation and B is an
463acquire barrier, then A synchronizes-with B if and only if all of the
464following are true:
465
466  * there exists an atomic read X (with any memory semantics)
467  * X is program-ordered before B
468  * X and A are mutually-ordered
469  * X reads a value written by A or by an operation in the release sequence
470    headed by A
471  * A and B are in the instance of each other's memory scopes
472  * X's storage class is in B's semantics.
473
474If A is a release barrier and B is an acquire barrier, then A
475synchronizes-with B if all of the following are true:
476
477  * there exists an atomic write X (with any memory semantics)
478  * A is program-ordered before X
479  * there exists an atomic read Y (with any memory semantics)
480  * Y is program-ordered before B
481  * X and Y are mutually-ordered
482  * Y reads the value written by X or by an operation in the release
483    sequence headed by X
484  ** If X is relaxed, it is still considered to head a hypothetical release
485     sequence for this rule
486  * A and B are in the instance of each other's memory scopes
487  * X's and Y's storage class is in A's and B's semantics.
  ** NOTE: X and Y must have the same storage class, because they are
     mutually-ordered.
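
[NOTE]
.Note
====
On the host, this barrier-barrier case corresponds to C++ fence-fence
synchronization through relaxed atomics.
A non-normative sketch, where code:writer and code:reader are hypothetical
host threads:

[source,c++]
----
#include <atomic>

std::atomic<int> guard{0};      // accessed by the relaxed atomics X and Y
int payload = 0;

void writer()
{
    payload = 42;
    std::atomic_thread_fence(std::memory_order_release);  // release barrier A
    guard.store(1, std::memory_order_relaxed);            // atomic write X
}

void reader()
{
    if (guard.load(std::memory_order_relaxed) == 1) {         // atomic read Y
        std::atomic_thread_fence(std::memory_order_acquire);  // acquire barrier B
        // A is program-ordered before X, Y read the value written by X,
        // and Y is program-ordered before B, so A synchronizes-with B
        // and the write to payload is visible here.
        int v = payload;
        (void)v;
    }
}
----
====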
490
491If A is a release barrier, B is an acquire barrier, and C is a control
492barrier (where A can: equal C, and B can: equal C), then A synchronizes-with
493B if all of the following are true:
494
495  * A is program-ordered before (or equals) C
496  * C is program-ordered before (or equals) B
497  * A and B are in the instance of each other's memory scopes
498  * A and B are in the instance of C's execution scope
499
500[NOTE]
501.Note
502====
503This is similar to the barrier-barrier synchronization above, but with a
504control barrier filling the role of the relaxed atomics.
505====
506
507ifdef::VK_EXT_fragment_shader_interlock[]
508
509Let F be an ordering of fragment shader invocations, such that invocation
510F~1~ is ordered before invocation F~2~ if and only if F~1~ and F~2~ overlap
511as described in <<shaders-scope-fragment-interlock,Fragment Shader
512Interlock>> and F~1~ executes the interlocked code before F~2~.
513
514If A is an code:OpEndInvocationInterlockEXT instruction and B is an
515code:OpBeginInvocationInterlockEXT instruction, then A synchronizes-with B
516if the agent that executes A is ordered before the agent that executes B in
517F. A and B are both considered to have code:FragmentInterlock memory scope
518and semantics of UniformMemory and ImageMemory, and A is considered to have
519Release semantics and B is considered to have Acquire semantics.
520
521[NOTE]
522.Note
523====
code:OpBeginInvocationInterlockEXT and code:OpEndInvocationInterlockEXT do
not perform implicit availability or visibility operations.
526Usually, shaders using fragment shader interlock will declare the relevant
527resources as `coherent` to get implicit
528<<memory-model-instruction-av-vis,per-instruction availability and
529visibility operations>>.
530====
531
532endif::VK_EXT_fragment_shader_interlock[]
533
534ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
535If A is a release barrier and B is an acquire barrier, then A
536synchronizes-with B if all of the following are true:
537
  * A is shader-call-ordered before B
539  * A and B are in the instance of each other's memory scopes
540
541endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
542
543No other release and acquire barriers synchronize-with each other.
544
545
546[[memory-model-system-synchronizes-with]]
547== System-Synchronizes-With
548
549_System-synchronizes-with_ is a relation between arbitrary operations on the
550device or host.
551Certain operations system-synchronize-with each other, which informally
552means the first operation occurs before the second and that the
553synchronization is performed without using application-visible memory
554accesses.
555
556If there is an <<synchronization-dependencies-execution,execution
557dependency>> between two operations A and B, then the operation in the first
558synchronization scope system-synchronizes-with the operation in the second
559synchronization scope.
560
561[NOTE]
562.Note
563====
564This covers all Vulkan synchronization primitives, including device
565operations executing before a synchronization primitive is signaled, wait
566operations happening before subsequent device operations, signal operations
567happening before host operations that wait on them, and host operations
568happening before flink:vkQueueSubmit.
569The list is spread throughout the synchronization chapter, and is not
570repeated here.
571====
572
573System-synchronizes-with implicitly includes all storage class semantics and
574has code:CrossDevice scope.
575
576If A system-synchronizes-with B, we also say A is
577_system-synchronized-before_ B and B is _system-synchronized-after_ A.
578
579
580[[memory-model-non-private]]
581== Private vs. Non-Private
582
583By default, non-atomic memory operations are treated as _private_, meaning
584such a memory operation is not intended to be used for communication with
585other agents.
586Memory operations with the NonPrivatePointer/NonPrivateTexel bit set are
587treated as _non-private_, and are intended to be used for communication with
588other agents.
589
More precisely, making private memory operations
<<memory-model-location-ordered,location-ordered>> between distinct agents
requires using system-synchronizes-with rather than shader-based
synchronization.
594Private memory operations still obey program-order.
595
596Atomic operations are always considered non-private.
597
598
599[[memory-model-inter-thread-happens-before]]
600== Inter-Thread-Happens-Before
601
602Let SC be a non-empty set of storage class semantics.
603Then (using template syntax) operation A _inter-thread-happens-before_<SC>
604operation B if and only if any of the following is true:
605
606  * A system-synchronizes-with B
607  * A synchronizes-with B, and both A and B have all of SC in their
608    semantics
609  * A is an operation on memory in a storage class in SC or that has all of
610    SC in its semantics, B is a release barrier or release atomic with all
611    of SC in its semantics, and A is program-ordered before B
612  * A is an acquire barrier or acquire atomic with all of SC in its
613    semantics, B is an operation on memory in a storage class in SC or that
614    has all of SC in its semantics, and A is program-ordered before B
615  * A and B are both host operations and A inter-thread-happens-before B as
616    defined in the host language specification
617  * A inter-thread-happens-before<SC> some X and X
618    inter-thread-happens-before<SC> B
619
620
621[[memory-model-happens-before]]
622== Happens-Before
623
624Operation A _happens-before_ operation B if and only if any of the following
625is true:
626
627  * A is program-ordered before B
628  * A inter-thread-happens-before<SC> B for some set of storage classes SC
629
630_Happens-after_ is defined similarly.
631
632[NOTE]
633.Note
634====
635Unlike C++, happens-before is not always sufficient for a write to be
636visible to a read.
637Additional <<memory-model-availability-visibility,availability and
638visibility>> operations may: be required for writes to be
639<<memory-model-visible-to,visible-to>> other memory accesses.
640====
641
642[NOTE]
643.Note
644====
645Happens-before is not transitive, but each of program-order and
646inter-thread-happens-before<SC> are transitive.
647These can be thought of as covering the "`single-threaded`" case and the
648"`multi-threaded`" case, and it is not necessary (and not valid) to form
649chains between the two.
650====
651
652
653[[memory-model-availability-visibility]]
654== Availability and Visibility
655
656_Availability_ and _visibility_ are states of a write operation, which
657(informally) track how far the write has permeated the system, i.e. which
658agents and references are able to observe the write.
659Availability state is per _memory domain_.
660Visibility state is per (agent,reference) pair.
661Availability and visibility states are per-memory location for each write.
662
663Memory domains are named according to the agents whose memory accesses use
664the domain.
665Domains used by shader invocations are organized hierarchically into
666multiple smaller memory domains which correspond to the different
667<<shaders-scope, scopes>>.
668Each memory domain is considered the _dual_ of a scope, and vice versa.
669The memory domains defined in Vulkan include:
670
671  * _host_ - accessible by host agents
672  * _device_ - accessible by all device agents for a particular device
673  * _shader_ - accessible by shader agents for a particular device,
674    corresponding to the code:Device scope
675  * _queue family instance_ - accessible by shader agents in a single queue
676    family, corresponding to the code:QueueFamily scope.
677ifdef::VK_EXT_fragment_shader_interlock[]
678  * _fragment interlock instance_ - accessible by fragment shader agents
679    that <<shaders-scope-fragment-interlock,overlap>>, corresponding to the
680    code:FragmentInterlock scope.
681endif::VK_EXT_fragment_shader_interlock[]
682ifdef::VK_KHR_ray_tracing_pipeline[]
683  * _shader call instance_ - accessible by shader agents that are
684    <<shader-call-related,shader-call-related>>, corresponding to the
685    code:ShaderCallKHR scope.
686endif::VK_KHR_ray_tracing_pipeline[]
687  * _workgroup instance_ - accessible by shader agents in the same
688    workgroup, corresponding to the code:Workgroup scope.
689  * _subgroup instance_ - accessible by shader agents in the same subgroup,
690    corresponding to the code:Subgroup scope.
691
692The memory domains are nested in the order listed above,
693ifdef::VK_KHR_ray_tracing_pipeline[]
694except for shader call instance domain,
695endif::VK_KHR_ray_tracing_pipeline[]
696with memory domains later in the list nested in the domains earlier in the
697list.
698ifdef::VK_KHR_ray_tracing_pipeline[]
699The shader call instance domain is at an implementation-dependent location
700in the list, and is nested according to that location.
701The shader call instance domain is not broader than the queue family
702instance domain.
703endif::VK_KHR_ray_tracing_pipeline[]
704
705[NOTE]
706.Note
707====
Memory domains do not correspond to storage classes or to device-local and
host-local slink:VkDeviceMemory allocations; rather, they indicate whether a
write can be made visible only to agents in the same subgroup, same
711workgroup,
712ifdef::VK_EXT_fragment_shader_interlock[]
713overlapping fragment shader invocation,
714endif::VK_EXT_fragment_shader_interlock[]
715ifdef::VK_KHR_ray_tracing_pipeline[]
716shader-call-related ray tracing invocation,
717endif::VK_KHR_ray_tracing_pipeline[]
in any shader invocation, anywhere on the device, or on the host.
719The shader, queue family instance,
720ifdef::VK_EXT_fragment_shader_interlock[]
721fragment interlock instance,
722endif::VK_EXT_fragment_shader_interlock[]
723ifdef::VK_KHR_ray_tracing_pipeline[]
724shader call instance,
725endif::VK_KHR_ray_tracing_pipeline[]
workgroup instance, and subgroup instance domains are only used for
shader-based availability/visibility operations; in other cases, writes can
be made available from/visible to the shader via the device domain.
729====
730
_Availability operations_, _visibility operations_, and _memory domain
operations_ alter the state of the write operations that happen-before them
and that are included in their _source scope_, making those writes available
or visible to their _destination scope_.
735
736  * For an availability operation, the source scope is a set of
737    (agent,reference,memory location) tuples, and the destination scope is a
738    set of memory domains.
739  * For a memory domain operation, the source scope is a memory domain and
740    the destination scope is a memory domain.
741  * For a visibility operation, the source scope is a set of memory domains
742    and the destination scope is a set of (agent,reference,memory location)
743    tuples.
744
745How the scopes are determined depends on the specific operation.
746Availability and memory domain operations expand the set of memory domains
747to which the write is available.
748Visibility operations expand the set of (agent,reference,memory location)
749tuples to which the write is visible.
750
751Recall that availability and visibility states are per-memory location, and
752let W be a write operation to one or more locations performed by agent A via
753reference R. Let L be one of the locations written.
754(W,L) (the write W to L), is initially not available to any memory domain
755and only visible to (A,R,L).
756An availability operation AV that happens-after W and that includes (A,R,L)
757in its source scope makes (W,L) _available_ to the memory domains in its
758destination scope.
759
760A memory domain operation DOM that happens-after AV and for which (W,L) is
761available in the source scope makes (W,L) available in the destination
762memory domain.
763
764A visibility operation VIS that happens-after AV (or DOM) and for which
765(W,L) is available in any domain in the source scope makes (W,L) _visible_
766to all (agent,reference,L) tuples included in its destination scope.
767
768If write W~2~ happens-after W, and their sets of memory locations overlap,
769then W will not be available/visible to all agents/references for those
770memory locations that overlap (and future AV/DOM/VIS ops cannot revive W's
771write to those locations).
772
773Availability, memory domain, and visibility operations are treated like
774other non-atomic memory accesses for the purpose of
775<<memory-model-memory-semantics,memory semantics>>, meaning they can be
776ordered by release-acquire sequences or memory barriers.
777
778An _availability chain_ is a sequence of availability operations to
779increasingly broad memory domains, where element N+1 of the chain is
780performed in the dual scope instance of the destination memory domain of
781element N and element N happens-before element N+1.
782An example is an availability operation with destination scope of the
783workgroup instance domain that happens-before an availability operation to
784the shader domain performed by an invocation in the same workgroup.
785An availability chain AVC that happens-after W and that includes (A,R,L) in
786the source scope makes (W,L) _available_ to the memory domains in its final
787destination scope.
788An availability chain with a single element is just the availability
789operation.
790
791Similarly, a _visibility chain_ is a sequence of visibility operations from
792increasingly narrow memory domains, where element N of the chain is
793performed in the dual scope instance of the source memory domain of element
794N+1 and element N happens-before element N+1.
795An example is a visibility operation with source scope of the shader domain
796that happens-before a visibility operation with source scope of the
797workgroup instance domain performed by an invocation in the same workgroup.
798A visibility chain VISC that happens-after AVC (or DOM) and for which (W,L)
799is available in any domain in the source scope makes (W,L) _visible_ to all
800(agent,reference,L) tuples included in its final destination scope.
801A visibility chain with a single element is just the visibility operation.
802
803
804[[memory-model-vulkan-availability-visibility]]
805== Availability, Visibility, and Domain Operations
806
807The following operations generate availability, visibility, and domain
808operations.
809When multiple availability/visibility/domain operations are described, they
810are system-synchronized-with each other in the order listed.
811
812An operation that performs a <<synchronization-dependencies-memory,memory
813dependency>> generates:
814
815  * If the source access mask includes ename:VK_ACCESS_HOST_WRITE_BIT, then
816    the dependency includes a memory domain operation from host domain to
817    device domain.
818  * An availability operation with source scope of all writes in the first
819    <<synchronization-dependencies-access-scopes,access scope>> of the
820    dependency and a destination scope of the device domain.
821  * A visibility operation with source scope of the device domain and
822    destination scope of the second access scope of the dependency.
823  * If the destination access mask includes ename:VK_ACCESS_HOST_READ_BIT or
824    ename:VK_ACCESS_HOST_WRITE_BIT, then the dependency includes a memory
825    domain operation from device domain to host domain.
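
[NOTE]
.Note
====
An illustrative sketch of such a memory dependency, using a hypothetical
helper and assuming code:commandBuffer is in the recording state.
The source access mask includes host writes, so the dependency begins with
a host-to-device memory domain operation:

[source,c++]
----
void hostWriteToShaderReadDependency(VkCommandBuffer commandBuffer)
{
    // Memory dependency from host writes to compute shader reads. This
    // generates: a memory domain operation from host to device domain, an
    // availability operation for writes in the first access scope, and a
    // visibility operation into the second access scope.
    VkMemoryBarrier barrier{};
    barrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
    barrier.srcAccessMask = VK_ACCESS_HOST_WRITE_BIT;
    barrier.dstAccessMask = VK_ACCESS_SHADER_READ_BIT;

    vkCmdPipelineBarrier(commandBuffer,
                         VK_PIPELINE_STAGE_HOST_BIT,            // source stage
                         VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,  // destination stage
                         0,                                     // dependency flags
                         1, &barrier,                           // memory barriers
                         0, nullptr,                            // buffer barriers
                         0, nullptr);                           // image barriers
}
----
====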
826
827flink:vkFlushMappedMemoryRanges performs an availability operation, with a
828source scope of (agents,references) = (all host threads, all mapped memory
829ranges passed to the command), and destination scope of the host domain.
830
831flink:vkInvalidateMappedMemoryRanges performs a visibility operation, with a
832source scope of the host domain and a destination scope of
833(agents,references) = (all host threads, all mapped memory ranges passed to
834the command).
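
[NOTE]
.Note
====
A minimal sketch of these two commands, using a hypothetical helper and
assuming code:memory is a mapped, non-coherent slink:VkDeviceMemory
allocation whose mapping, source data, and size are supplied by the
application:

[source,c++]
----
#include <cstring>

void makeHostWritesAvailable(VkDevice device, VkDeviceMemory memory,
                             void* mappedPointer, const void* srcData,
                             size_t dataSize)
{
    std::memcpy(mappedPointer, srcData, dataSize);   // host write

    VkMappedMemoryRange range{};
    range.sType = VK_STRUCTURE_TYPE_MAPPED_MEMORY_RANGE;
    range.memory = memory;
    range.offset = 0;
    range.size = VK_WHOLE_SIZE;

    // Availability operation to the host domain.
    vkFlushMappedMemoryRanges(device, 1, &range);

    // Before the host later reads device writes through the same mapping,
    // a visibility operation is performed with:
    // vkInvalidateMappedMemoryRanges(device, 1, &range);
}
----
====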
835
836flink:vkQueueSubmit performs a memory domain operation from host to device,
837and a visibility operation with source scope of the device domain and
838destination scope of all agents and references on the device.
839
840
841[[memory-model-availability-visibility-semantics]]
842== Availability and Visibility Semantics
843
844A memory barrier or atomic operation via agent A that includes MakeAvailable
845in its semantics performs an availability operation whose source scope
846includes agent A and all references in the storage classes in that
847instruction's storage class semantics, and all memory locations, and whose
848destination scope is a set of memory domains selected as specified below.
849The implicit availability operation is program-ordered between the barrier
850or atomic and all other operations program-ordered before the barrier or
851atomic.
852
853A memory barrier or atomic operation via agent A that includes MakeVisible
854in its semantics performs a visibility operation whose source scope is a set
855of memory domains selected as specified below, and whose destination scope
856includes agent A and all references in the storage classes in that
857instruction's storage class semantics, and all memory locations.
858The implicit visibility operation is program-ordered between the barrier or
859atomic and all other operations program-ordered after the barrier or atomic.
860
861The memory domains are selected based on the memory scope of the instruction
862as follows:
863
864  * code:Device scope uses the shader domain
865  * code:QueueFamily scope uses the queue family instance domain
866ifdef::VK_EXT_fragment_shader_interlock[]
867  * code:FragmentInterlock scope uses the fragment interlock instance domain
868endif::VK_EXT_fragment_shader_interlock[]
869ifdef::VK_KHR_ray_tracing_pipeline[]
870  * code:ShaderCallKHR scope uses the shader call instance domain
871endif::VK_KHR_ray_tracing_pipeline[]
872  * code:Workgroup scope uses the workgroup instance domain
  * code:Subgroup scope uses the subgroup instance domain
  * code:Invocation scope performs no availability/visibility operations.
875
876When an availability operation performed by an agent A includes a memory
877domain D in its destination scope, where D corresponds to scope instance S,
878it also includes the memory domains that correspond to each smaller scope
879instance S' that is a subset of S and that includes A. Similarly for
880visibility operations.
881
882
883[[memory-model-instruction-av-vis]]
884== Per-Instruction Availability and Visibility Semantics
885
886A memory write instruction that includes MakePointerAvailable, or an image
887write instruction that includes MakeTexelAvailable, performs an availability
888operation whose source scope includes the agent and reference used to
889perform the write and the memory locations written by the instruction, and
890whose destination scope is a set of memory domains selected by the Scope
891operand specified in <<memory-model-availability-visibility-semantics,
892Availability and Visibility Semantics>>.
893The implicit availability operation is program-ordered between the write and
894all other operations program-ordered after the write.
895
896A memory read instruction that includes MakePointerVisible, or an image read
897instruction that includes MakeTexelVisible, performs a visibility operation
898whose source scope is a set of memory domains selected by the Scope operand
899as specified in <<memory-model-availability-visibility-semantics,
900Availability and Visibility Semantics>>, and whose destination scope
901includes the agent and reference used to perform the read and the memory
902locations read by the instruction.
The implicit visibility operation is program-ordered between the read and
all other operations program-ordered before the read.
905
906[NOTE]
907.Note
908====
909Although reads with per-instruction visibility only perform visibility ops
910from the shader or
911ifdef::VK_EXT_fragment_shader_interlock[]
912fragment interlock instance or
913endif::VK_EXT_fragment_shader_interlock[]
914ifdef::VK_KHR_ray_tracing_pipeline[]
915shader call instance or
916endif::VK_KHR_ray_tracing_pipeline[]
917workgroup instance or subgroup instance domain, they will also see writes
918that were made visible via the device domain, i.e. those writes previously
919performed by non-shader agents and made visible via API commands.
920====
921
922[NOTE]
923.Note
924====
925It is expected that all invocations in a subgroup execute on the same
926processor with the same path to memory, and thus availability and visibility
927operations with subgroup scope can be expected to be "`free`".
928====
929
930
931[[memory-model-location-ordered]]
932== Location-Ordered
933
934Let X and Y be memory accesses to overlapping sets of memory locations M,
935where X != Y. Let (A~X~,R~X~) be the agent and reference used for X, and
(A~Y~,R~Y~) be the agent and reference used for Y. In this section, let
"`->`" denote happens-before and "`->^rcpo^`" denote the reflexive closure of
938program-ordered before.
939
940If D~1~ and D~2~ are different memory domains, then let DOM(D~1~,D~2~) be a
941memory domain operation from D~1~ to D~2~.
942Otherwise, let DOM(D,D) be a placeholder such that X->DOM(D,D)->Y if and
943only if X->Y.
944
945X is _location-ordered_ before Y for a location L in M if and only if any of
946the following is true:
947
948  * A~X~ == A~Y~ and R~X~ == R~Y~ and X->Y
949  ** NOTE: this case means no availability/visibility ops are required when
950     it is the same (agent,reference).
951
952  * X is a read, both X and Y are non-private, and X->Y
953  * X is a read, and X (transitively) system-synchronizes with Y
954
955  * If R~X~ == R~Y~ and A~X~ and A~Y~ access a common memory domain D (e.g.
956    are in the same workgroup instance if D is the workgroup instance
957    domain), and both X and Y are non-private:
958  ** X is a write, Y is a write, AVC(A~X~,R~X~,D,L) is an availability chain
959     making (X,L) available to domain D, and X->^rcpo^AVC(A~X~,R~X~,D,L)->Y
960  ** X is a write, Y is a read, AVC(A~X~,R~X~,D,L) is an availability chain
961     making (X,L) available to domain D, VISC(A~Y~,R~Y~,D,L) is a visibility
962     chain making writes to L available in domain D visible to Y, and
963     X->^rcpo^AVC(A~X~,R~X~,D,L)->VISC(A~Y~,R~Y~,D,L)->^rcpo^Y
964  ** If
965     slink:VkPhysicalDeviceVulkanMemoryModelFeatures::pname:vulkanMemoryModelAvailabilityVisibilityChains
966     is ename:VK_FALSE, then AVC and VISC must: each only have a single
967     element in the chain, in each sub-bullet above.
968
969  * Let D~X~ and D~Y~ each be either the device domain or the host domain,
970    depending on whether A~X~ and A~Y~ execute on the device or host:
971  ** X is a write and Y is a write, and
972     X->AV(A~X~,R~X~,D~X~,L)->DOM(D~X~,D~Y~)->Y
973  ** X is a write and Y is a read, and
974     X->AV(A~X~,R~X~,D~X~,L)->DOM(D~X~,D~Y~)->VIS(A~Y~,R~Y~,D~Y~,L)->Y
975
976[NOTE]
977.Note
978====
979The final bullet (synchronization through device/host domain) requires
980API-level synchronization operations, since the device/host domains are not
981accessible via shader instructions.
982And "`device domain`" is not to be confused with "`device scope`", which
983synchronizes through the "`shader domain`".
984====
985
986
987[[memory-model-access-data-race]]
988== Data Race
989
990Let X and Y be operations that access overlapping sets of memory locations
991M, where X != Y, and at least one of X and Y is a write, and X and Y are not
992mutually-ordered atomic operations.
993If there does not exist a location-ordered relation between X and Y for each
994location in M, then there is a _data race_.
995
996Applications must: ensure that no data races occur during the execution of
997their application.
998
999[NOTE]
1000.Note
1001====
1002Data races can only occur due to instructions that are actually executed.
1003For example, an instruction skipped due to control flow must not contribute
1004to a data race.
1005====
1006
1007
1008[[memory-model-visible-to]]
1009== Visible-To
1010
1011Let X be a write and Y be a read whose sets of memory locations overlap, and
1012let M be the set of memory locations that overlap.
1013Let M~2~ be a non-empty subset of M. Then X is _visible-to_ Y for memory
1014locations M~2~ if and only if all of the following are true:
1015
1016  * X is location-ordered before Y for each location L in M~2~.
1017  * There does not exist another write Z to any location L in M~2~ such that
1018    X is location-ordered before Z for location L and Z is location-ordered
1019    before Y for location L.
1020
1021If X is visible-to Y, then Y reads the value written by X for locations
1022M~2~.
1023
1024[NOTE]
1025.Note
1026====
1027It is possible for there to be a write between X and Y that overwrites a
1028subset of the memory locations, but the remaining memory locations (M~2~)
1029will still be visible-to Y.
1030====
1031
1032
1033[[memory-model-acyclicity]]
1034== Acyclicity
1035
1036_Reads-from_ is a relation between operations, where the first operation is
1037a write, the second operation is a read, and the second operation reads the
1038value written by the first operation.
1039_From-reads_ is a relation between operations, where the first operation is
1040a read, the second operation is a write, and the first operation reads a
1041value written earlier than the second operation in the second operation's
1042scoped modification order (or the first operation reads from the initial
1043value, and the second operation is any write to the same locations).
1044
1045Then the implementation must: guarantee that no cycles exist in the union of
1046the following relations:
1047
1048  * location-ordered
1049  * scoped modification order (over all atomic writes)
1050  * reads-from
1051  * from-reads
1052
1053[NOTE]
1054.Note
1055====
1056This is a "`consistency`" axiom, which informally guarantees that sequences
1057of operations cannot violate causality.
1058====
1059
1060
1061[[memory-model-scoped-modification-order-coherence]]
1062=== Scoped Modification Order Coherence
1063
1064Let A and B be mutually-ordered atomic operations, where A is
1065location-ordered before B. Then the following rules are a consequence of
1066acyclicity:
1067
1068  * If A and B are both reads and A does not read the initial value, then
1069    the write that A takes its value from must: be earlier in its own scoped
1070    modification order than (or the same as) the write that B takes its
1071    value from (no cycles between location-order, reads-from, and
1072    from-reads).
1073  * If A is a read and B is a write and A does not read the initial value,
1074    then A must: take its value from a write earlier than B in B's scoped
    modification order (no cycles between location-order, scoped modification
1076    order, and reads-from).
1077  * If A is a write and B is a read, then B must: take its value from A or a
1078    write later than A in A's scoped modification order (no cycles between
1079    location-order, scoped modification order, and from-reads).
1080  * If A and B are both writes, then A must: be earlier than B in A's scoped
1081    modification order (no cycles between location-order and scoped
1082    modification order).
1083  * If A is a write and B is a read-modify-write and B reads the value
1084    written by A, then B comes immediately after A in A's scoped
1085    modification order (no cycles between scoped modification order and
1086    from-reads).
1087
1088
1089[[memory-model-shader-io]]
1090== Shader I/O
1091
1092If a shader invocation A in a shader stage other than code:Vertex performs a
1093memory read operation X from an object in storage class
1094ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
1095code:CallableDataKHR, code:IncomingCallableDataKHR, code:RayPayloadKHR,
1096code:HitAttributeKHR, code:IncomingRayPayloadKHR, or
1097endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
1098code:Input, then X is system-synchronized-after all writes to the
1099corresponding
1100ifdef::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
1101code:CallableDataKHR, code:IncomingCallableDataKHR, code:RayPayloadKHR,
1102code:HitAttributeKHR, code:IncomingRayPayloadKHR, or
1103endif::VK_KHR_ray_tracing_pipeline,VK_NV_ray_tracing[]
1104code:Output storage variable(s) in the shader invocation(s) that contribute
1105to generating invocation A, and those writes are all visible-to X.
1106
1107[NOTE]
1108.Note
1109====
It is not necessary for the upstream shader invocations to have completed
execution; they only need to have generated the output that is being read.
1112====
1113
1114
1115[[memory-model-deallocation]]
1116== Deallocation
1117
1118ifndef::VKSC_VERSION_1_0[]
1119
1120A call to flink:vkFreeMemory must: happen-after all memory operations on all
1121memory locations in that slink:VkDeviceMemory object.
1122
1123[NOTE]
1124.Note
1125====
1126Normally, device memory operations in a given queue are synchronized with
1127flink:vkFreeMemory by having a host thread wait on a fence signaled by that
1128queue, and the wait happens-before the call to flink:vkFreeMemory on the
1129host.
1130====
1131
1132endif::VKSC_VERSION_1_0[]
1133
1134The deallocation of SPIR-V variables is managed by the system and
1135happens-after all operations on those variables.
1136
1137
1138[[memory-model-informative-descriptions]]
1139== Descriptions (Informative)
1140
This subsection describes consequences of the memory model in terms that are
more easily understood by application and compiler developers.
1143
1144Let SC be the storage class(es) specified by a release or acquire operation
1145or barrier.
1146
1147  * An atomic write with release semantics must not be reordered against any
1148    read or write to SC that is program-ordered before it (regardless of the
1149    storage class the atomic is in).
1150
1151  * An atomic read with acquire semantics must not be reordered against any
1152    read or write to SC that is program-ordered after it (regardless of the
1153    storage class the atomic is in).
1154
1155  * Any write to SC program-ordered after a release barrier must not be
1156    reordered against any read or write to SC program-ordered before that
1157    barrier.
1158
1159  * Any read from SC program-ordered before an acquire barrier must not be
1160    reordered against any read or write to SC program-ordered after the
1161    barrier.
1162
1163A control barrier (even if it has no memory semantics) must not be reordered
1164against any memory barriers.
1165
1166This memory model allows memory accesses with and without availability and
1167visibility operations, as well as atomic operations, all to be performed on
1168the same memory location.
This is critical for reasoning about memory that is reused in multiple ways,
e.g. across the lifetime of different shader invocations or draw calls.
1172While GLSL (and legacy SPIR-V) applies the "`coherent`" decoration to
1173variables (for historical reasons), this model treats each memory access
1174instruction as having optional implicit availability/visibility operations.
GLSL to SPIR-V compilers should map all (non-atomic) operations on a
coherent variable to the MakePointerAvailable/MakeTexelAvailable and
MakePointerVisible/MakeTexelVisible flags in this model.
1178
1179Atomic operations implicitly have availability/visibility operations, and
1180the scope of those operations is taken from the atomic operation's scope.
1181
1182
1183[[memory-model-tessellation-output-ordering]]
1184== Tessellation Output Ordering
1185
1186For SPIR-V that uses the Vulkan Memory Model, the code:OutputMemory storage
1187class is used to synchronize accesses to tessellation control output
1188variables.
1189For legacy SPIR-V that does not enable the Vulkan Memory Model via
1190code:OpMemoryModel, tessellation outputs can be ordered using a control
1191barrier with no particular memory scope or semantics, as defined below.
1192
1193Let X and Y be memory operations performed by shader invocations A~X~ and
1194A~Y~.
1195Operation X is _tessellation-output-ordered_ before operation Y if and only
1196if all of the following are true:
1197
1198  * There is a dynamic instance of an code:OpControlBarrier instruction C
1199    such that X is program-ordered before C in A~X~ and C is program-ordered
1200    before Y in A~Y~.
1201  * A~X~ and A~Y~ are in the same instance of C's execution scope.
1202
1203If shader invocations A~X~ and A~Y~ in the code:TessellationControl
1204execution model execute memory operations X and Y, respectively, on the
1205code:Output storage class, and X is tessellation-output-ordered before Y
1206with a scope of code:Workgroup, then X is location-ordered before Y, and if
1207X is a write and Y is a read then X is visible-to Y.
1208
1209
1210ifdef::VK_NV_cooperative_matrix[]
1211[[memory-model-cooperative-matrix]]
1212== Cooperative Matrix Memory Access
1213
1214For each dynamic instance of a cooperative matrix load or store instruction
1215(code:OpCooperativeMatrixLoadNV or code:OpCooperativeMatrixStoreNV), a
1216single implementation-dependent invocation within the instance of the
1217matrix's scope performs a non-atomic load or store (respectively) to each
1218memory location that is defined to be accessed by the instruction.
1219endif::VK_NV_cooperative_matrix[]
1220