Ordinarily, synchronization issues for tracing engines are kept fairly
straightforward by using UTRACE_STOP. You ask a thread to stop, and
then once it makes the report_quiesce callback it cannot do anything
else that would result in another callback, until you let it with a
utrace_control call. This simple arrangement avoids complex and
error-prone code in each one of a tracing engine's event callbacks to
keep them serialized with the engine's other operations done on that
thread from another thread of control.

However, giving tracing engines complete power to keep a traced thread
stuck in place runs afoul of a more important kind of simplicity that
the kernel overall guarantees: nothing can prevent or delay SIGKILL
from making a thread die and release its resources. To preserve this
important property, SIGKILL is a special case: it can break
UTRACE_STOP like nothing else normally can. This includes both
explicit SIGKILL signals and the implicit SIGKILL sent to each other
thread in the same thread group by a thread doing an exec, or
processing a fatal signal, or making an exit_group system call. A
tracing engine can prevent a thread from beginning the exit or exec or
dying by signal (other than SIGKILL) if it is attached to that thread,
but once the operation begins, no tracing engine can prevent or delay
all other threads in the same thread group from dying.

The report_reap callback is always the final event in the life cycle
of a traced thread. Tracing engines can use this as the trigger to
clean up their own data structures. The report_death callback is
always the penultimate event a tracing engine might see; it's seen
unless the thread was already in the midst of dying when the engine
attached. Many tracing engines will have no interest in when a parent
reaps a dead process, and nothing they want to do with a zombie thread
once it dies; for them, the report_death callback is the natural place
to clean up data structures and detach. To facilitate writing such
engines robustly, given the asynchrony of SIGKILL, and without
error-prone manual implementation of synchronization schemes, the
utrace infrastructure provides some special guarantees about the
report_death and report_reap callbacks. It still takes some care to be
sure your tracing engine is robust to tear-down races, but these rules
make it reasonably straightforward and concise to handle a lot of
corner cases correctly.

The first sort of guarantee concerns the core data structures
themselves. struct utrace_engine is a reference-counted data
structure. While you hold a reference, an engine pointer will always
stay valid so that you can safely pass it to any utrace call. Each
call to utrace_attach_task or utrace_attach_pid returns an engine
pointer with a reference belonging to the caller. You own that
reference until you drop it using utrace_engine_put. There is an
implicit reference on the engine while it is attached. So if you drop
your only reference, and then use utrace_attach_task without
UTRACE_ATTACH_CREATE to look up that same engine, you will get the
same pointer with a new reference to replace the one you dropped, just
like calling utrace_engine_get. When an engine has been detached,
either explicitly with UTRACE_DETACH or implicitly after report_reap,
then any references you hold are all that keep the old engine pointer
alive.

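The reference rules above can be pictured with a small userspace model.
This is only a sketch of the counting described in the text, not kernel
code: struct toy_engine and the toy_* functions are hypothetical
stand-ins, and the real struct utrace_engine is not assumed to be laid
out this way.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for struct utrace_engine: only the counting is modeled. */
struct toy_engine {
    int refs;       /* all references, including the implicit one */
    bool attached;  /* true while the implicit "attached" reference exists */
};

/* Models a creating attach: one implicit reference plus one for the caller. */
static struct toy_engine *toy_attach(void)
{
    struct toy_engine *e = malloc(sizeof(*e));

    if (!e)
        return NULL;
    e->refs = 2;
    e->attached = true;
    return e;
}

/* Models utrace_engine_get, or a lookup attach that finds the same engine. */
static void toy_get(struct toy_engine *e)
{
    e->refs++;
}

/* Models utrace_engine_put. Returns true if this drop freed the engine. */
static bool toy_put(struct toy_engine *e)
{
    if (--e->refs > 0)
        return false;
    free(e);
    return true;
}

/* Models detaching: the implicit reference goes away exactly once. */
static bool toy_detach(struct toy_engine *e)
{
    if (!e->attached)
        return false;
    e->attached = false;
    return toy_put(e);
}
```

In this model, dropping the caller's reference while the engine is
still attached leaves the count at one, and after a detach only the
caller's own references keep the structure alive.
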
There is nothing a kernel module can do to keep a struct task_struct
alive outside of rcu_read_lock. When the task dies and is reaped by
its parent (or itself), that structure can be freed so that any
dangling pointers you have stored become invalid. utrace will not
prevent this, but it can help you detect it safely. By definition, a
task that has been reaped has had all its engines detached. All utrace
calls can be safely called on a detached engine if the caller holds a
reference on that engine pointer, even if the task pointer passed in
the call is invalid. All calls return -ESRCH for a detached engine,
which tells you that the task pointer you passed could be invalid now.
Since utrace_control and utrace_set_events do not block, you can call
them inside an rcu_read_lock section and, as long as they do not
return -ESRCH, be sure that the task pointer is still valid until
rcu_read_unlock. The infrastructure never holds task references of its
own. Though neither rcu_read_lock nor any other lock is held while
making a callback, it's always guaranteed that the struct task_struct
and the struct utrace_engine passed as arguments remain valid until
the callback function returns.

The common means for safely holding task pointers that is available to
kernel modules is to use struct pid, which permits put_pid from kernel
modules. When using that, the calls utrace_attach_pid,
utrace_control_pid, utrace_set_events_pid, and utrace_barrier_pid are
available.

The second guarantee is the serialization of DEATH and REAP event
callbacks for a given thread. The actual reaping by the parent
(release_task call) can occur simultaneously while the thread is still
doing the final steps of dying, including the report_death callback.
If a tracing engine has requested both DEATH and REAP event reports,
it's guaranteed that the report_reap callback will not be made until
after the report_death callback has returned. If the report_death
callback itself detaches from the thread, then the report_reap
callback will never be made. Thus it is safe for a report_death
callback to clean up data structures and detach.

The final sort of guarantee is that a tracing engine will know for sure
whether or not the report_death and/or report_reap callbacks will be
made for a certain thread. These tear-down races are disambiguated by
the error return values of utrace_set_events and utrace_control.
Normally utrace_control called with UTRACE_DETACH returns zero, and
this means that no more callbacks will be made. If the thread is in
the midst of dying, it returns -EALREADY to indicate that the
report_death callback may already be in progress; when you get this
error, you know that any cleanup your report_death callback does is
about to happen or has just happened. Note that if the report_death
callback does not detach, the engine remains attached until the thread
gets reaped. If the thread is in the midst of being reaped,
utrace_control returns -ESRCH to indicate that the report_reap
callback may already be in progress; this means the engine is
implicitly detached when the callback completes. This makes it
possible for a tracing engine that has decided asynchronously to
detach from a thread to safely clean up its data structures, knowing
that no report_death or report_reap callback will try to do the same.
utrace_control with UTRACE_DETACH also returns -ESRCH when the struct
utrace_engine has already been detached, but is still a valid pointer
because of its reference count. A tracing engine can use this to
safely synchronize its own independent multiple threads of control
with each other and with its event callbacks that detach.

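As a rough illustration, the three detach results described above can
be turned into a single decision about who owns the cleanup. The enum
and the function name here are hypothetical; only the return values
(zero, -EALREADY, -ESRCH) come from the text, and the sketch runs in
userspace against the errno.h constants rather than calling
utrace_control itself.

```c
#include <errno.h>

/* Hypothetical names for what an asynchronous detacher does next. */
enum detach_next_step {
    CLEAN_UP_NOW,                  /* 0: no more callbacks will be made */
    LEAVE_TO_REPORT_DEATH,         /* -EALREADY: report_death may be running */
    LEAVE_TO_REAP_OR_PRIOR_DETACH, /* -ESRCH: reaping, or already detached */
    OTHER_RESULT                   /* anything this sketch does not cover */
};

/* Classify the value returned by a UTRACE_DETACH request. */
static enum detach_next_step after_detach(int ret)
{
    switch (ret) {
    case 0:
        return CLEAN_UP_NOW;
    case -EALREADY:
        return LEAVE_TO_REPORT_DEATH;
    case -ESRCH:
        return LEAVE_TO_REAP_OR_PRIOR_DETACH;
    default:
        return OTHER_RESULT;
    }
}
```

Only the CLEAN_UP_NOW case lets the detaching thread free the engine's
private data itself; in the other two cases some callback or an
earlier detach is responsible for it.
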
In the same vein, utrace_set_events normally returns zero; if the
target thread was stopped before the call, then after a successful
call, only the event callbacks requested in the new flags will be
made. It fails with -EALREADY if you try to clear UTRACE_EVENT(DEATH)
when the report_death callback may already have begun, if you try to
clear UTRACE_EVENT(REAP) when the report_reap callback may already
have begun, or if you try to newly set UTRACE_EVENT(DEATH) or
UTRACE_EVENT(QUIESCE) when the target is already dead or dying. Like
utrace_control, it returns -ESRCH when the engine has already been
detached (including forcible detach on reaping). This lets the tracing
engine know for sure which event callbacks it will or won't see after
utrace_set_events has returned. By checking for errors, it can know
whether to clean up its data structures immediately or to let its
callbacks do the work.

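The failure rules for utrace_set_events can be restated as a small
predicate. This is a userspace model of the rules as worded above, not
the kernel implementation: the TOY_EV_* bit values, struct toy_state
and its fields, and the function name are all assumptions made for the
sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bit values standing in for UTRACE_EVENT(...) masks. */
#define TOY_EV_QUIESCE (1UL << 0)
#define TOY_EV_DEATH   (1UL << 1)
#define TOY_EV_REAP    (1UL << 2)

/* Hypothetical snapshot of the target, as seen at the time of the call. */
struct toy_state {
    bool detached;     /* engine already detached (including by reaping) */
    bool dying;        /* target is already dead or dying */
    bool death_begun;  /* report_death may already have begun */
    bool reap_begun;   /* report_reap may already have begun */
};

/* Result the text describes for changing old_mask into new_mask. */
static int toy_set_events_result(const struct toy_state *s,
                                 unsigned long old_mask,
                                 unsigned long new_mask)
{
    unsigned long cleared = old_mask & ~new_mask;
    unsigned long added = new_mask & ~old_mask;

    if (s->detached)
        return -ESRCH;
    if ((cleared & TOY_EV_DEATH) && s->death_begun)
        return -EALREADY;
    if ((cleared & TOY_EV_REAP) && s->reap_begun)
        return -EALREADY;
    if ((added & (TOY_EV_DEATH | TOY_EV_QUIESCE)) && s->dying)
        return -EALREADY;
    return 0;
}
```

The model leaves out the case of a callback running concurrently on a
thread that is not stopped; it covers only the tear-down races listed
in the paragraph above.
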
When a thread is safely stopped, calling utrace_control with
UTRACE_DETACH or calling utrace_set_events to disable some events
ensures synchronously that your engine won't get any more of the
callbacks that have been disabled (none at all when detaching). But
these can also be used while the thread is not stopped, when it might
be simultaneously making a callback to your engine. For this
situation, these calls return -EINPROGRESS when it's possible a
callback is in progress. If you are not prepared to have your old
callbacks still run, then you can synchronize to be sure all the old
callbacks are finished, using utrace_barrier. This is necessary if the
kernel module containing your callback code is going to be unloaded.
After using UTRACE_DETACH once, further calls to utrace_control with
the same engine pointer will return -ESRCH. In contrast, after getting
-EINPROGRESS from utrace_set_events, you can call utrace_set_events
again later, and if it returns zero, you know the old callbacks have
finished.

Unlike all other calls, utrace_barrier (and utrace_barrier_pid) will
accept any engine pointer you hold a reference on, even if
UTRACE_DETACH has already been used. After any utrace_control or
utrace_set_events call (these do not block), you can call
utrace_barrier to block until callbacks have finished. This returns
-ESRCH only if the engine is completely detached (finished all
callbacks). Otherwise it waits until the thread is definitely not in
the midst of a callback to this engine and then returns zero, but can
return -ERESTARTSYS if its wait is interrupted.
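
The detach-then-barrier sequence described above might be sketched as
follows. To keep the example runnable outside the kernel, the two
calls are passed in as function pointers: struct detach_ops, the
fake_* functions, and TOY_ERESTARTSYS are hypothetical stand-ins, and
the detach hook is assumed to behave like utrace_control with
UTRACE_DETACH while the barrier hook behaves like utrace_barrier.

```c
#include <errno.h>
#include <stddef.h>

struct task_struct;    /* opaque stand-ins, never dereferenced here */
struct utrace_engine;

/* ERESTARTSYS is kernel-internal, so the sketch defines its own copy. */
#define TOY_ERESTARTSYS 512

/* Hypothetical hooks standing in for the two utrace calls. */
struct detach_ops {
    int (*detach)(struct task_struct *, struct utrace_engine *);
    int (*barrier)(struct task_struct *, struct utrace_engine *);
};

/*
 * Request a detach and, only if a callback might still be running,
 * wait for it. Returns 0 once no callback can be in progress, or
 * passes any other result through for the caller to interpret.
 */
static int detach_and_sync(const struct detach_ops *ops,
                           struct task_struct *task,
                           struct utrace_engine *engine)
{
    int ret = ops->detach(task, engine);

    if (ret != -EINPROGRESS)
        return ret;
    ret = ops->barrier(task, engine);
    if (ret == -ESRCH)
        return 0;   /* completely detached: all callbacks finished */
    return ret;     /* 0, or -TOY_ERESTARTSYS if the wait was interrupted */
}

/* Test doubles: a busy thread, an idle thread, and two barrier results. */
static int fake_barrier_calls;

static int fake_detach_busy(struct task_struct *t, struct utrace_engine *e)
{
    (void)t; (void)e;
    return -EINPROGRESS;
}

static int fake_detach_idle(struct task_struct *t, struct utrace_engine *e)
{
    (void)t; (void)e;
    return 0;
}

static int fake_barrier_done(struct task_struct *t, struct utrace_engine *e)
{
    (void)t; (void)e;
    fake_barrier_calls++;
    return 0;
}

static int fake_barrier_interrupted(struct task_struct *t,
                                    struct utrace_engine *e)
{
    (void)t; (void)e;
    fake_barrier_calls++;
    return -TOY_ERESTARTSYS;
}
```

A caller that gets the interrupted result back still cannot assume the
old callbacks have finished, so a module preparing to unload would
need to retry the barrier before going away.
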