No mention of overhead. strace can bring down your production environment -- it ...

zwischenzug · on Feb 12, 2018

Whenever I've had to use it in prod (in heavy OLTP environments) the seriousness of the issue has always outweighed any performance concerns. Ditto tcpdump. Often it was used specifically to determine the cause of performance issues. In any case you generally only strace 1 process, and if your application stack depends on one process you're probably in other kinds of trouble... unless it's erlang :)

brendangregg · on Feb 12, 2018

It's not a choice between strace or nothing. It's a choice between strace, ftrace, perf, or eBPF -- and that's just the Linux builtins. Many low-overhead add-ons can also do syscall tracing (sysdig, LTTng).

I often run ftrace, perf, and eBPF on our production instances for syscall tracing. If I ran strace, the instance would suddenly be very slow, and it would trigger Hystrix (and other) timeouts and be removed from the ASG and auto terminated. Our environment is fault-tolerant, so yes, we can run strace -- you just don't get much output, and the load vanishes from the instance you are looking at.

Bromskloss · on Feb 12, 2018

Which of those alternatives are an option if the goal is to inspect and rewrite syscall arguments and return values, and do other things in between?

helper · on Feb 12, 2018

Last time I tried `perf trace` I realized how many things strace does that I take for granted. Things like file handle to filename resolution and pretty printing read() and write() buffers.

Do newer versions of `perf trace` expose these?
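
For what it's worth, the file-handle-to-filename part can be approximated outside the tracer on Linux by reading the symlinks under `/proc/<pid>/fd`, which as far as I know is roughly what strace's path decoding relies on. A minimal sketch, assuming a Linux `/proc` and permission to inspect the target process; `fd_to_path` and `list_open_files` are made-up helper names, not part of any tool:

```python
import os


def fd_to_path(pid, fd):
    """Return what /proc/<pid>/fd/<fd> points at, or None if unreadable."""
    try:
        return os.readlink(f"/proc/{pid}/fd/{fd}")
    except OSError:
        # The descriptor is closed, the process is gone, or we lack permission.
        return None


def list_open_files(pid):
    """Map each open descriptor number of `pid` to its link target."""
    targets = {}
    for name in os.listdir(f"/proc/{pid}/fd"):
        path = fd_to_path(pid, int(name))
        if path is not None:  # skip descriptors that vanished mid-scan
            targets[int(name)] = path
    return targets


if __name__ == "__main__":
    # Inspect our own process as a harmless demonstration.
    for fd, target in sorted(list_open_files(os.getpid()).items()):
        print(fd, target)
```

Sockets and pipes show up as things like `socket:[...]` or `pipe:[...]` rather than paths, and the result is only a snapshot: a descriptor can be closed and reused between the traced syscall and the lookup.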

Bromskloss · on Feb 12, 2018

> it is not safe

Are you referring to the fact that it makes syscalls slow, or is it about something else?

cynwoody · on Feb 12, 2018

Many thanks!

Yours is a very superior treatment of the strace topic.

shaklee3 · on Feb 12, 2018

Thanks. Really enjoy your blog.