1Mechanisms for Coordination Between Garbage Collector and Mutator 2----------------------------------------------------------------- 3 4Most garbage collection work can proceed concurrently with the client or 5mutator Java threads. But in certain places, for example while tracing from 6thread stacks, the garbage collector needs to ensure that Java data processed 7by the collector is consistent and complete. At these points, the mutators 8should not hold references to the heap that are invisible to the garbage 9collector. And they should not be modifying the data that is visible to the 10collector. 11 12Logically, the collector and mutator share a reader-writer lock on the Java 13heap and associated data structures. Mutators hold the lock in reader or shared mode 14while running Java code or touching heap-related data structures. The collector 15holds the lock in writer or exclusive mode while it needs the heap data 16structures to be stable. However, this reader-writer lock has a very customized 17implementation that also provides additional facilities, such as the ability 18to exclude only a single thread, so that we can specifically examine its heap 19references. 20 21In order to ensure consistency of the Java data, the compiler inserts "suspend 22points", sometimes also called "safe points" into the code. These allow a thread 23to respond to external requests. 24 25Whenever a thread is runnable, i.e. whenever a thread logically holds the 26mutator lock in shared mode, it is expected to regularly execute such a suspend 27point, and check for pending requests. They are currently implemented by 28setting a flag in the thread structure[^1], which is then explicitly tested by the 29compiler-generated code. 30 31A thread responds to suspend requests only when it is "runnable", i.e. logically 32running Java code. When it runs native code, or is blocked in a kernel call, it 33logically releases the mutator lock. When the garbage collector needs mutator 34cooperation, and the thread is not runnable, it is assured that the mutator is 35not touching Java data, and hence the collector can safely perform the required 36action itself, on the mutator thread's behalf. 37 38Normally, when a thread makes a JNI call, it is not considered runnable while 39executing native code. This makes the transitions to and from running native JNI 40code somewhat expensive (see below). But these transitions are necessary to 41ensure that such code, which does not execute "suspend points", and can thus not 42cooperate with the GC, doesn't delay GC completion. `@FastNative` and 43`@CriticalNative` calls avoid these transitions, instead allowing the thread to 44remain "runnable", at the expense of potentially delaying GC operations for the 45duration of the call. 46 47Although we say that a thread is "suspended" when it is not running Java code, 48it may in fact still be running native code and touching data structures that 49are not considered "Java data". This distinction can be a fine line. For 50example, a Java thread blocked on a Java monitor will normally be "suspended" 51and blocked on a mutex contained in the monitor data structure. But it may wake 52up for reasons beyond ARTs control, which will normally result in touching the 53mutex. The monitor code must be quite careful to ensure that this does not cause 54problems, especially if the ART runtime was shut down in the interim and the 55monitor data structure has been reclaimed. 56 57Calls to change thread state 58---------------------------- 59 60When a thread changes between running Java and native code, it has to 61correspondingly change its state between "runnable" and one of several 62other states, all of which are considered to be "suspended" for our purposes. 63When a Java thread starts to execute native code, and may thus not respond 64promptly to suspend requests, it will normally create an object of type 65`ScopedThreadSuspension`. `ScopedThreadSuspension`'s constructor changes state to 66the "suspended" state given as an argument, logically releasing the mutator lock 67and promising to no longer touch Java data structures. It also handles any 68pending suspension requests that slid in just before it changed state. 69 70Conversely, `ScopedThreadSuspension`'s destructor waits until the GC has finished 71any actions it is currently performing on the thread's behalf and effectively 72released the mutator exclusive lock, and then returns to runnable state, 73re-acquiring the mutator lock. 74 75Occasionally a thread running native code needs to temporarily again access Java 76data structures, performing the above transitions in the opposite order. 77`ScopedObjectAccess` is a similar RAII object whose constructor and destructor 78perform those transitions in the reverse order from `ScopedThreadSuspension`. 79 80Mutator lock implementation 81--------------------------- 82 83The mutator lock is not implemented as a conventional mutex. But it plays by the 84rules of our normal static thread-safety analysis. Thus a function that is 85expected to be called in runnable state, with the ability to access Java data, 86should be annotated with `REQUIRES_SHARED(Locks::mutator_lock_)`. 87 88There is an explicit `mutator_lock_` object, of type `MutatorMutex`. `MutatorMutex` is 89seemingly a minor refinement of `ReaderWriterMutex`, but it is used entirely 90differently. It is acquired explicitly by clients that need to hold it 91exclusively, and in a small number of cases, it is acquired in shared mode, e.g. 92via `SharedTryLock()`, or by the GC itself. However, more commonly 93`MutatorMutex::TransitionFromSuspendedToRunnable()`, is used to logically acquire 94the mutator mutex, e.g. as part of `ScopedObjectAccess` construction. 95 96`TransitionFromSuspendedToRunnable()` does not physically acquire the 97`ReaderWriterMutex` in shared mode. Thus any thread acquiring the lock in exclusive mode 98must, in addition, explicitly arrange for mutator threads to be suspended via the 99thread suspension mechanism, and then make them runnable again on release. 100 101Logically the mutator lock is held in shared/reader mode if ***either*** the 102underlying reader-writer lock is held in shared mode, ***or*** if a mutator is in 103runnable state. These two ways of holding the mutator mutex are ***not*** 104equivalent: In particular, we rely on the garbage collector never actually 105entering a "runnable" state while active (see below). However, it often runs with 106the explicit mutator mutex in shared mode, thus blocking others from acquiring it 107in exclusive mode. 108 109Suspension and checkpoint API 110----------------------------- 111 112Suspend point checks enable three kinds of communication with mutator threads: 113 114**Checkpoints** 115: Checkpoint requests are used to get a thread to perform an action 116on our behalf. `RequestCheckpoint()` asks a specific thread to execute the closure 117supplied as an argument at its leisure. `RequestSynchronousCheckpoint()` in 118addition waits for the thread to complete running the closure, and handles 119suspended threads by running the closure on their behalf. In addition to these 120functions provided by `Thread`, `ThreadList` provides the `RunCheckpoint()` function 121that runs a checkpoint function on behalf of each thread, either by using 122`RequestCheckpoint()` to run it inside a running thread, or by ensuring that a 123suspended thread stays suspended, and then running the function on its behalf. 124`RunCheckpoint()` does not wait for completion of the function calls triggered by 125the resulting `RequestCheckpoint()` invocations. 126 127**Empty checkpoints** 128: ThreadList provides `RunEmptyCheckpoint()`, which waits until 129all threads have either passed a suspend point, or have been suspended. This 130ensures that no thread is still executing Java code inside the same 131suspend-point-delimited code interval it was executing before the call. For 132example, a read-barrier started before a `RunEmptyCheckpoint()` call will have 133finished before the call returns. 134 135**Thread suspension** 136: ThreadList provides a number of `SuspendThread...()` calls and 137a `SuspendAll()` call to suspend one or all threads until they are resumed by 138`Resume()` or `ResumeAll()`. The `Suspend...` calls guarantee that the target 139thread(s) are suspended (again, only in the sense of not running Java code) 140when the call returns. 141 142Deadlock freedom 143---------------- 144 145It is easy to deadlock while attempting to run checkpoints, or suspending 146threads. In particular, we need to avoid situations in which we cannot suspend 147a thread because it is blocked, directly, or indirectly, on the GC completing 148its task. Deadlocks are avoided as follows: 149 150**Mutator lock ordering** 151The mutator lock participates in the normal ART lock ordering hierarchy, as though it 152were a regular lock. See `base/locks.h` for the hierarchy. In particular, only 153locks at or below level `kPostMutatorTopLockLevel` may be acquired after 154acquiring the mutator lock, e.g. inside the scope of a `ScopedObjectAccess`. 155Similarly only locks at level strictly above `kMutatatorLock` may be held while 156acquiring the mutator lock, e.g. either by starting a `ScopedObjectAccess`, or 157ending a `ScopedThreadSuspension`. 158 159This ensures that code that uses purely mutexes and threads state changes cannot 160deadlock: Since we always wait on a lower-level lock, the holder of the 161lowest-level lock can always progress. An attempt to initiate a checkpoint or to 162suspend another thread must also be treated as an acquisition of the mutator 163lock: A thread that is waiting for a lock before it can respond to the request 164is itself holding the mutator lock, and can only be blocked on lower-level 165locks. And acquisition of those can never depend on acquiring the mutator 166lock. 167 168**Checkpoints** 169Running a checkpoint in a thread requires suspending that thread for the 170duration of the checkpoint, or running the checkpoint on the threads behalf 171while that thread is blocked from executing Java code. In the former case, the 172checkpoint code is run from `CheckSuspend`, which requires the mutator lock, 173so checkpoint code may only acquire mutexes at or below level 174`kPostMutatorTopLockLevel`. But that is not sufficient. 175 176No matter whether the checkpoint is run in the target thread, or on its behalf, 177the target thread is effectively suspended and prevented from running Java code. 178However the target may hold arbitrary Java monitors, which it can no longer 179release. This may also prevent higher level mutexes from getting released. Thus 180checkpoint code should only acquire mutexes at level `kPostMonitorLock` or 181below. 182 183 184**Waiting** 185This becomes much more problematic when we wait for something other than a lock. 186Waiting for something that may depend on the GC, while holding the mutator lock, 187can potentially lead to deadlock, since it will prevent the waiting thread from 188participating in GC checkpoints. Waiting while holding a lower-level lock like 189`thread_list_lock_` is similarly unsafe in general, since a runnable thread may 190not respond to checkpoints until it acquires `thread_list_lock_`. In general, 191waiting for a condition variable while holding an unrelated lock is problematic, 192and these are specific instances of that general problem. 193 194We do currently provide `WaitHoldingLocks`, and it is sometimes used with 195low-level locks held. But such code must somehow ensure that such waits 196eventually terminate without deadlock. 197 198One common use of WaitHoldingLocks is to wait for weak reference processing. 199Special rules apply to avoid deadlocks in this case: Such waits must start after 200weak reference processing is disabled; the GC may not issue further nonempty 201checkpoints or suspend requests until weak reference processing has been 202reenabled, and threads have been notified. Thus the waiting thread's inability 203to respond to nonempty checkpoints and suspend requests cannot directly block 204the GC. Non-GC checkpoint or suspend requests that target a thread waiting on 205reference processing will block until reference processing completes. 206 207Consider a case in which thread W1 waits on reference processing, while holding 208a low-level mutex M. Thread W2 holds the mutator lock and waits on M. We avoid a 209situation in which the GC needs to suspend or checkpoint W2 by briefly stopping 210the world to disable weak reference access. During the stop-the-world phase, W1 211cannot yet be waiting for weak-reference access. Thus there is no danger of 212deadlock while entering this phase. After this phase, there is no need for W2 to 213suspend or execute a nonempty checkpoint. If we replaced the stop-the-world 214phase by a checkpoint, W2 could receive the checkpoint request too late, and be 215unable to respond. 216 217Empty checkpoints can continue to occur during reference processing. Reference 218processing wait loops explicitly handle empty checkpoints, and an empty 219checkpoint request notifies the condition variable used to wait for reference 220processing, after acquiring `reference_processor_lock_`. This means that empty 221checkpoints do not preclude client threads from being in the middle of an 222operation that involves a weak reference access, while nonempty checkpoints do. 223 224**Suspending the GC** 225Under unusual conditions, the GC can run on any thread. This means that when 226thread *A* suspends thread *B* for some other reason, Thread *B* might be 227running the garbage collector and conceivably thus cause it to block. This 228would be very deadlock prone. If Thread *A* allocates while Thread *B* is 229suspended in the GC, and the allocation requires the GC's help to complete, we 230deadlock. 231 232Thus we ensure that the GC, together with anything else that can block GCs, 233cannot be blocked for thread suspension requests. This is accomplished by 234ensuring that it always appears to be in a suspended thread state. Since we 235only check for suspend requests when entering the runnable state, suspend 236requests go unnoticed until the GC completes. It may physically acquire and 237release the actual `mutator_lock_` in either shared or exclusive mode. 238 239Thread Suspension Mechanics 240--------------------------- 241 242Thread suspension is initiated by a registered thread, except that, for testing 243purposes, `SuspendAll` may be invoked with `self == nullptr`. We never suspend 244the initiating thread, explicitly exclusing it from `SuspendAll()`, and failing 245`SuspendThreadBy...()` requests to that effect. 246 247The suspend calls invoke `IncrementSuspendCount()` to increment the thread 248suspend count for each thread. That adds a "suspend barrier" (atomic counter) to 249the per-thread list of such counters to decrement. It normally sets the 250`kSuspendRequest` ("should enter safepoint handler") and `kActiveSuspendBarrier` 251("need to notify us when suspended") flags. 252 253After setting these two flags, we check whether the thread is suspended and 254`kSuspendRequest` is still set. Since the thread is already suspended, it cannot 255be expected to respond to "pass the suspend barrier" (decrement the atomic 256counter) in a timely fashion. Hence we do so on its behalf. This decrements 257the "barrier" and removes it from the thread's list of barriers to decrement, 258and clears `kActiveSuspendBarrier`. `kSuspendRequest` remains to ensure the 259thread doesn't prematurely return to runnable state. 260 261If `SuspendAllInternal()` does not immediately see a suspended state, then it is up 262to the target thread to decrement the suspend barrier. 263`TransitionFromRunnableToSuspended()` calls 264`TransitionToSuspendedAndRunCheckpoints()`, which changes the thread state 265and returns. `TransitionFromRunnableToSuspended()` then calls 266`CheckActiveSuspendBarriers()` to check for the `kActiveSuspendBarrier` flag 267and decrement the suspend barrier if set. 268 269The `suspend_count_lock_` is not consistently held in the target thread 270during this process. Thus correctness in resolving the race between a 271suspension-requesting thread and a target thread voluntarily suspending relies 272on first requesting suspension, and then checking whether the target is 273already suspended, The detailed correctness argument is given in a comment 274inside `SuspendAllInternal()`. This also ensures that the barrier cannot be 275decremented after the stack frame holding the barrier goes away. 276 277This relies on the fact that the two stores in the two threads to the state and 278kActiveSuspendBarrier flag are ordered with respect to the later loads. That's 279guaranteed, since they are all stored in a single `atomic<>`. Thus even relaxed 280accesses are OK. 281 282The actual suspend barrier representation still varies between `SuspendAll()` 283and `SuspendThreadBy...()`. The former relies on the fact that only one such 284barrier can be in use at a time, while the latter maintains a linked list of 285active suspend barriers for each target thread, relying on the fact that each 286one can appear on the list of only one thread, and we can thus use list nodes 287allocated in the stack frames of requesting threads. 288 289**Avoiding suspension cycles** 290 291Any thread can issue a `SuspendThreadByPeer()`, `SuspendThreadByThreadId()` or 292`SuspendAll()` request. But if Thread A increments Thread B's suspend count 293while Thread B increments Thread A's suspend count, and they then both suspend 294during a subsequent thread transition, we're deadlocked. 295 296For single-thread suspension requests, we refuse to initiate 297a suspend request from a registered thread that is also being asked to suspend 298(i.e. the suspend count is nonzero). Instead the requestor waits for that 299condition to change. This means that we cannot create a cycle in which each 300thread has asked to suspend the next one, and thus no thread can progress. The 301required atomicity of the requestor suspend count check with setting the suspend 302count of the target(s) target is ensured by holding `suspend_count_lock_`. 303 304For `SuspendAll()`, we enforce a requirement that at most one `SuspendAll()` 305request is running at one time. We also set the `kSuspensionImmune` thread flag 306to prevent a single thread suspension of a thread currently between 307`SuspendAll()` and `ResumeAll()` calls. Thus once a `SuspendAll()` call starts, 308it will complete before it can be affected by suspension requests from other 309threads. 310 311[^1]: In the most recent versions of ART, compiler-generated code loads through 312 the address at `tlsPtr_.suspend_trigger`. A thread suspension is requested 313 by setting this to null, triggering a `SIGSEGV`, causing that thread to 314 check for GC cooperation requests. The older mechanism instead sets an 315 appropriate `ThreadFlag` entry to request suspension or a checkpoint. Note 316 that the actual checkpoint function value is set, along with the flag, while 317 holding `suspend_count_lock_`. If the target thread notices that a 318 checkpoint is requested, it then acquires the `suspend_count_lock_` to read 319 the checkpoint function. 320