Some of us at Bryan Cantrill's Solaris 10 presentation yesterday were wondering how DTrace can instrument kernel code with no extra speed overhead or, as he claimed, with zero probe effect (for probes that are not enabled), which is a much stronger claim than just no time penalty.
The detailed answer is in their paper "Dynamic Instrumentation of Production Systems" available here: http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf
Here's the most relevant paragraph, which explains how they do it for Function Boundary Tracing on SPARC (x86 is not as clean, surprise, surprise):
On SPARC, FBT [Function Boundary Tracing] works by replacing an instruction with an unconditional annulled branch-always (ba,a) instruction. The branch redirects control flow into an FBT-controlled trampoline, which prepares arguments and transfers control into DTrace. Upon return from DTrace, the replaced instruction is executed in the trampoline before transferring control back to the instrumented code path. This is a similar mechanism to that used by Kerninst[13] -- but it is at once less general (it instruments only function entry and return) and completely safe (it will never erroneously instrument code executed at TL>0).
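To make the patch-and-trampoline idea in that paragraph concrete, here is a rough sketch in Python. It is a toy model, not SPARC code and not anything from DTrace itself: the two-opcode "machine" and the names run, step, enable_probe and disable_probe are all made up for illustration. It only shows the shape of the mechanism: the instruction at the probe site is swapped for a branch, the trampoline calls the probe, executes the displaced instruction, and resumes at the next instruction. Disabling the probe puts the original instruction back, so the code is identical to what it was before.

```python
# Toy model of FBT-style patching. Everything here is hypothetical:
# the opcodes, the register file and the function names are invented
# for illustration and are not part of DTrace or SPARC.

def step(instr, regs):
    """Execute one ordinary (non-branch) toy instruction."""
    op, arg = instr
    if op == "add":
        regs["acc"] += arg
    elif op == "mul":
        regs["acc"] *= arg
    else:
        raise ValueError("unknown opcode: %r" % (op,))


def run(program, trampolines, acc=0):
    """Run the program; a 'ba,a' entry diverts control to a trampoline."""
    regs = {"acc": acc}
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "ba,a":
            # The trampoline returns the pc at which to resume.
            pc = trampolines[arg](regs)
        else:
            step(program[pc], regs)
            pc += 1
    return regs["acc"]


def enable_probe(program, trampolines, site, probe):
    """Replace the instruction at `site` with a branch to a trampoline."""
    saved = program[site]

    def trampoline(regs):
        probe(site, dict(regs))  # stand-in for "transfer control into DTrace"
        step(saved, regs)        # execute the displaced instruction
        return site + 1          # back to the instrumented code path

    trampolines[site] = trampoline
    program[site] = ("ba,a", site)
    return saved


def disable_probe(program, trampolines, site, saved):
    """Restore the original instruction; no trace of the probe remains."""
    program[site] = saved
    del trampolines[site]


program = [("add", 2), ("mul", 5), ("add", 1)]
original = list(program)
baseline = run(program, {})

fired = []
trampolines = {}
saved = enable_probe(program, trampolines, 0,
                     lambda site, regs: fired.append((site, regs["acc"])))
traced = run(program, trampolines)

disable_probe(program, trampolines, 0, saved)
```

In this sketch the traced run computes the same result as the baseline run, and after disable_probe the program list compares equal to the original. That is the toy analogue of a disabled probe costing nothing: the instruction stream is the unmodified one.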
How they did it for Statically-defined Tracing (section 4.2 in the paper) is also interesting. As they admit, it is not quite zero probe effect, because the probe sites can potentially increase register pressure for the compiler.
The paper is a good read, especially after the presentation.
Stuart Williams.