using systemtap better

Brendan Gregg, kind enough to engage with us systemtap developers occasionally, posted a long comment about his experiences with our tool. Since his observations appear in good faith, I’m happy to respond to many items in kind. I’ll be brief to avoid tl;dr.

A common refrain is problems with older versions and/or older distributions, which have been fixed for some time. This pattern is not obvious because Brendan’s article lacks version numbers throughout (with a handful of exceptions).

Installation. Brendan lists eight steps for installing systemtap on his machines. It is hard to verify all of them since his article does not list version numbers. Ubuntu kernel ddebs may be 610MB, but RHEL5/6/Fedora and clones are more like 200MB. Patching current systemtap releases is unnecessary, and ill-advised if coming from unofficial sources. Upgrading kernels solely for systemtap is unnecessary.

Manual Patching. Brendan’s experiences trying to hack around version-matching safety protections were misguided, considering he realized later that his downloaded debugging data didn’t match his kernel. The build-id error message should be improved to better explain what’s going on.

Execution. I don’t know what caused Brendan’s problem here. Lacking version numbers or more complete crash information, one can only speculate that it was one or another systemtap bug, both fixed three releases ago in 2010. Or it may be related to the version-matching safety protection hacking above.

Safe use on production systems. Systemtap has many satisfied users on production systems, so with proper care and testing, it is fine. There have also been problems, sometimes bugs in systemtap, and often in other parts of the system stack that we cannot control, which can make the overall experience less robust than we would like. My impression is that commercial support relationships make it more likely for the whole stack to be improved.

Most of the Time. The “ERROR: kernel read fault” message is a soft error in the sense that it represents a problem caught at run time. Sometimes this is a debuginfo quality issue (sensitive to the gcc version), sometimes a tapset fragility one. We’ll improve this aspect, for example via this bug.

Screenshot. Actually, the “write” and “bytes” warnings are proper, and the rest of the warning message makes it obvious that this was a typo: the intended variable was “write_bytes”. The mishmash of scripts on the wiki is not representative of the regularly tested set that comes with systemtap. Current versions may be seen here.

Uh-oh… It looks like this disktop.stp sample script is in need of an extra few words of clarification. It is “reading/writing disk” from the point of view of userspace. Other scripts in the io group use different metrics.

iostat-scsi.stp. The error message for missing guru mode needs to be improved. However, Brendan misread which fragment of the script actually requires -g. It’s not the %( %) stuff he quoted, which is for conditional processing. Guru mode is needed only for this embedded-C block:

%{
#include <linux/blkdev.h>
%}

function get_nr_sectors:long(rq:long) %{ /* pure */
THIS->__retvalue = blk_rq_sectors((const struct request *)(long)THIS->rq);
%}

vfsrlat.stp / WARNING. That error could not have come from the most recent release of the tool.

There’s More… Much of what Brendan was seeing may be explained by old versions or distributor packaging errors.

function("*") crashes. I agree, it is disappointing that this still is not stable. It’s only a small consolation, but as far as we know, this is not a systemtap problem but rather a kernel one. One may reproduce it easily with perf probe. Another way to confirm this is to run a script with -DSTP_ALIBI, which compiles out any probe handler-related code, so as to produce only a skeleton kernel module. If that fails, chances are high that a weakness in the underlying linux kernel facility (e.g., kprobes) is responsible. This has proven to be a tricky area for upstream linux developers to make fully robust. For this reason, we generally advise against using very broad wildcards like this one.
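As a rough illustration of the -DSTP_ALIBI check, here is a minimal sketch. The file name alibi.stp and the vfs_* wildcard are placeholders chosen for this example, not something from Brendan’s article; substitute the probe pattern that misbehaves on your system, and try this only on a machine you can afford to crash.

```systemtap
# alibi.stp (hypothetical file name). Run it, for example, as:
#   stap -DSTP_ALIBI alibi.stp
# With -DSTP_ALIBI the handler body below is compiled out, so the
# resulting module only inserts and removes the probes. A failure in
# this mode points at the kernel probing facility rather than at the
# handler code generated by systemtap.

global hits

probe kernel.function("vfs_*") {
  hits++   # not executed when built with -DSTP_ALIBI
}
```

Because the handlers are compiled out in this mode, a timer probe calling exit() would not fire either; stop the run with Ctrl-C once the probes have been active for a while.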

no profile-997. In linux, the standard system timer interrupt’s frequency is not variable. Using the perf-counter related probes, one may approximate odder profiling rates.
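A sketch of what such an approximation might look like follows. It assumes that the software cpu-clock perf event (type 1, config 0 in the kernel’s perf event numbering) takes its sampling period in nanoseconds, so that a period of 1003009 ns comes out at roughly 997 samples per second per CPU; check the stapprobes man page for your version before relying on this.

```systemtap
# Approximate a 997 Hz profile with a perf-counter probe.
# Assumption: for PERF_TYPE_SOFTWARE (1) / PERF_COUNT_SW_CPU_CLOCK (0)
# the sample() argument is a period in nanoseconds:
#   1000000000 / 997 is about 1003009

global samples

probe perf.type(1).config(0).sample(1003009) {
  samples[execname()]++
}

probe timer.s(5) {
  exit()
}

probe end {
  # ten busiest process names, by descending sample count
  foreach (name in samples- limit 10)
    printf("%-16s %d\n", name, samples[name])
}
```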

non-intuitive. In systemtap, one may formulate Brendan’s script in the exact same “two explicit probes” style. The @entry syntax provides an alternative for those who find it more natural / expressive.
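The two styles can be sketched side by side as follows. This is not Brendan’s script; vfs_read is only a stand-in target picked for illustration, and both variants measure the same per-call latency.

```systemtap
global start, lat_explicit, lat_entry

# Style 1: two explicit probes, with the entry timestamp kept per thread.
probe kernel.function("vfs_read") {
  start[tid()] = gettimeofday_ns()
}

probe kernel.function("vfs_read").return {
  t0 = start[tid()]
  if (t0) {
    lat_explicit <<< gettimeofday_ns() - t0
    delete start[tid()]
  }
}

# Style 2: a single return probe; @entry() evaluates its argument at
# function entry and makes the saved value available here.
probe kernel.function("vfs_read").return {
  lat_entry <<< gettimeofday_ns() - @entry(gettimeofday_ns())
}

probe timer.s(5) {
  exit()
}

probe end {
  # guard against empty aggregates, which cannot be averaged
  if (@count(lat_explicit))
    printf("explicit probes: %d calls, avg %d ns\n",
           @count(lat_explicit), @avg(lat_explicit))
  if (@count(lat_entry))
    printf("@entry:          %d calls, avg %d ns\n",
           @count(lat_entry), @avg(lat_entry))
}
```

The @entry form saves the bookkeeping of the start array and the cleanup after each call, which is the main reason some people find it more expressive.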

Incomplete documentation. We suffer from perhaps too many bits of documentation rather than too few. Sometimes new features don’t get added to every place where they’d be appropriate. In this case, @entry is documented in the stapprobes man page, alongside information about general context variables and function-return probes.

Brendan’s conclusion included:

I'm expecting a response from the SystemTap developers will be: "it's better on Red Hat Enterprise Linux".

To the extent this is true, it is because Red Hat takes this problem area more seriously than most other distributors, and goes to a great effort to make the experience as smooth as possible. Neither Red Hat nor the remainder of the systemtap community is in a position to change packaging policies and practices at other distributions. While this may seem foreign to vendors who are used to complete development ownership of a single product, in linux we ~~suffer and~~ benefit from greater diversity of a much greater community.