Stopping Safely

Writing well-behaved callbacks

Well-behaved callbacks are important to maintain two essential properties of the interface. The first is that unrelated tracing engines should not interfere with each other. If your engine's event callback does not return quickly, other engines won't get the event notification in a timely manner.

The second property is that tracing should be as noninvasive as possible to the normal operation of the system overall and of the traced thread in particular. That is, attached tracing engines should not perturb a thread's behavior, except to the extent that changing its user-visible state is explicitly what you want to do. (Some perturbation is unavoidable, primarily timing changes. These range from small delays due to the overhead of tracing to arbitrary pauses in user code execution when a user stops a thread with a debugger for examination.)

Even when you explicitly want to make the traced thread block, blocking directly in your callback has further unwanted effects. For example, the CLONE event callbacks are called when the new child thread has been created but has not yet started running; the child cannot be scheduled until the CLONE tracing callbacks return. (This allows engines tracing the parent to attach to the child.) If a CLONE event callback blocks the parent thread, it also prevents the child thread from running, even to process a SIGKILL. If you want both the parent and the child to block, use utrace_attach_task on the child and then use UTRACE_STOP on both threads.

A more crucial problem with blocking in callbacks is that it can prevent SIGKILL from working. A thread that is blocked due to UTRACE_STOP will still wake up and die immediately when sent a SIGKILL, as all threads should. Relying on the utrace infrastructure rather than on private synchronization calls in event callbacks is an important way to help keep tracing robustly noninvasive.
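The idea of returning a stop request instead of blocking can be illustrated with a small userspace model. This is only a sketch: every name in it (toy_engine, toy_action, toy_dispatch and the two callbacks) is hypothetical and is not part of the utrace API. It also assumes, as a simplification, that the dispatcher applies the most restrictive action returned by any engine.

```c
#include <stddef.h>

/* Toy model only: these names are hypothetical, not the utrace API. */
enum toy_action {
    TOY_STOP = 0,   /* lower value = more restrictive (an assumption of this model) */
    TOY_RESUME = 1
};

struct toy_engine {
    enum toy_action (*report)(struct toy_engine *self);
    int notified;   /* set once this engine has seen the event */
};

/* Well-behaved callback: note the event and return at once, asking for a stop. */
enum toy_action stop_quickly(struct toy_engine *self)
{
    self->notified = 1;
    return TOY_STOP;
}

/* An unrelated engine that only observes the event. */
enum toy_action just_observe(struct toy_engine *self)
{
    self->notified = 1;
    return TOY_RESUME;
}

/*
 * Deliver one event to every engine, then return the most restrictive
 * action any of them asked for. Because no callback blocks, every
 * engine is notified before the stop takes effect.
 */
enum toy_action toy_dispatch(struct toy_engine *engines, size_t n)
{
    enum toy_action result = TOY_RESUME;

    for (size_t i = 0; i < n; i++) {
        enum toy_action a = engines[i].report(&engines[i]);
        if (a < result)
            result = a;
    }
    return result;
}
```

In this model, dispatching to an array holding one stop_quickly engine and one just_observe engine notifies both and yields TOY_STOP. Had the first callback blocked instead of returning, the second engine would not have been notified until it finished.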

Using UTRACE_STOP

To control another thread and access its state, you must first stop it with UTRACE_STOP. This means that it is stopped and won't start running again while you access it. When a thread is not already stopped, utrace_control returns -EINPROGRESS and the engine must wait for an event callback, which arrives when the thread is ready to stop. The thread may be running on another CPU or may be blocked. When it is ready to be examined, it makes callbacks to engines that have set the UTRACE_EVENT(QUIESCE) event bit. To wake up a thread from an interruptible wait, use UTRACE_INTERRUPT.
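The request-then-wait sequence described above can be modeled in plain userspace C. This is a hedged sketch, not kernel code: toy_thread, toy_request_stop, toy_reach_safe_point and toy_examine are hypothetical names standing in for the stop request, the quiesce-style callback and the state access. Only the -EINPROGRESS return value is taken from the text; the -EAGAIN result for a premature examination is an assumption of the model.

```c
#include <errno.h>

/* Toy model only: hypothetical names, not the utrace API. */
struct toy_thread {
    int stopped;         /* 1 once the thread is stopped and safe to examine */
    int stop_requested;  /* 1 while a stop request is pending */
    int quiesce_reports; /* number of quiesce-style reports delivered */
    long user_reg;       /* stand-in for the thread's user-visible state */
};

/* Ask the thread to stop: 0 if it is already stopped, -EINPROGRESS otherwise. */
int toy_request_stop(struct toy_thread *t)
{
    if (t->stopped)
        return 0;
    t->stop_requested = 1;
    return -EINPROGRESS;
}

/*
 * Models the thread reaching a point where it is ready to stop.
 * If a stop is pending, it reports to the engine and then stops.
 */
void toy_reach_safe_point(struct toy_thread *t)
{
    if (!t->stop_requested)
        return;
    t->quiesce_reports++;
    t->stopped = 1;
    t->stop_requested = 0;
}

/* Read the thread's state, but only while it is stopped. */
int toy_examine(const struct toy_thread *t, long *out)
{
    if (!t->stopped)
        return -EAGAIN;   /* assumption of this model: caller must wait */
    *out = t->user_reg;
    return 0;
}
```

A caller in this model first gets -EINPROGRESS from toy_request_stop, cannot examine the thread yet, and succeeds only after toy_reach_safe_point has delivered the report. A second stop request on the already stopped thread returns 0.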

As long as some engine has used UTRACE_STOP and has not called utrace_control to resume the thread, the thread remains stopped. SIGKILL will wake it up, but it will not run user code. When the stop is cleared with utrace_control or a callback return value, the thread starts running again. (See also the section called “Tear-down Races”.)
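The rule that a thread stays stopped while any engine still holds a stop can be sketched as a bitmask of stop holders. Again this is a userspace toy under stated assumptions: toy_stop_state and its helper functions are hypothetical names, and the model reduces SIGKILL to a single flag that ends the stop without ever allowing user code to run.

```c
/* Toy model only: hypothetical names, not the utrace API. */
struct toy_stop_state {
    unsigned stop_holders; /* bit i set: engine i has asked for a stop */
    int killed;            /* 1 once a SIGKILL has been delivered */
};

/* Engine number `engine` asks for the thread to be stopped. */
void toy_stop(struct toy_stop_state *t, unsigned engine)
{
    t->stop_holders |= 1u << engine;
}

/* Engine number `engine` clears its own stop request. */
void toy_resume(struct toy_stop_state *t, unsigned engine)
{
    t->stop_holders &= ~(1u << engine);
}

/* Deliver SIGKILL: the thread wakes up in order to die. */
void toy_sigkill(struct toy_stop_state *t)
{
    t->killed = 1;
}

/* The thread is held stopped while any engine holds a stop and it is not killed. */
int toy_is_stopped(const struct toy_stop_state *t)
{
    return t->stop_holders != 0 && !t->killed;
}

/* User code runs only when no engine holds a stop and the thread is alive. */
int toy_runs_user_code(const struct toy_stop_state *t)
{
    return t->stop_holders == 0 && !t->killed;
}
```

With two engines holding a stop, one call to toy_resume leaves the thread stopped; it runs user code only after the second engine resumes it too. After toy_sigkill the thread is no longer stopped, yet toy_runs_user_code still reports 0.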