Measuring Function Execution Time with Perf
In this post, I will show you how to use perf to evaluate function or code block execution time. This requires a bit of effort as perf is essentially a sampling profiler, which collects execution statistics periodically and when specific events occur. As such, perf does not automatically execution time of specific blocks of code or functions.
Sample program
Let's first create a sample application with fairly predictable behavior so that we know what we are looking at in the perf data.
Our sample application has two functions: main
and wait
. Both functions have simple delay loops. The time spent in the delay loops is relative to ITERATION_COUNTER
. The wait
function spends the whole ITERATION_COUNTER
amount of time in the delay loop, while the main
spends half of that.
Both loops are run 4 times. Besides the loops, the main
function prints an execution log for each of the 4 execution times.
You can find the complete source below
We compile this by running
gcc -O0 -ggdb3 -fno-omit-frame-pointer simple.c -o simple
For the sake of simplicity, we disable the optimization flags. -ggdb3
ensures that we have all the debugging info we need and -fno-omit-frame-pointer
is there just as a reminder that it should be present when we apply some compiler optimization but still want to do profiling.
Assuming that majority of the time is spent in the delay loops, we should have roughly the following split for the execution times
function | execution time |
---|---|
main |
33% |
wait |
66% |
That is, basically, all the time is spent in the delay loops and wait
uses twice as much time as main
Basic profiling
It's time to run some very basic profiling to ensure that our program is behaving as we expect
perf record -F1000 -g ./simple
We sample the program at 1000Hz and include the callgraph (-g
). Next, we check how much time, relatively, was spent on each function
perf report -g ./simple
The result should be something akin to the following
Checking the "self" time reveals that we are indeed spending the time between main
and wait
as expected.
Computing execution times of the functions
Now, the point of the story was to calculate the execution times of the functions. So far we have verified that our application does what it's supposed to, we can start looking at some absolute numbers.
As mentioned before, we cannot get direct execution times of individual functions. What we can get are statistics from supported events and periodic samples.
We will exploit the event tracing by adding user-level probes to the entry and exit points of the desired functions, and then with some post-processing magic, calculate the execution times of those functions.
Before we can add the uprobes, we need to figure out the corresponding symbol names
perf probe --list -x ./simple
This should give you something like this
We are interested in the wait
and main
symbols.
Let's add the probes to the entry and exit points of these functions
sudo perf probe -x ./simple --add main --add main%return --add wait --add wait%return
Notice how just using the symbol name adds the probe to the entry point of the functions. Adding a %return
postfix puts the probe into the exit point of the function. Note that you probably need to run probe addition as a privileged user (sudo).
Now, we have the event probes, in place, and we are ready to profile these events
sudo perf record -g -e probe_simple:main -e probe_simple:main__return -e probe_simple:wait -e probe_simple:wait__return ./simple
This records only the events that we are interested in. We would be expecting something matching the call times of the functions
function | time called |
---|---|
main |
1 |
wait |
4 |
Let's make a quick sanity check
sudo perf report -g ./simple
You should be looking at something like the following
Looks good! Let's proceed to the post-processing phase, where we calculate the actual execution times.
This is the hard part. We need to create a script that goes through all the samples and calculates the relevant statistics. Luckily, perf helps you to get started. By running perf script -g python
, perf will create a script template with all events readily in-place.
We will use the pandas library to do some of the heavy liftings for us. Replace the contents of perf-script.py with
This script should be reasonably self-explanatory. When we run this script via perf script
the def probe_xxx
functions are called for each corresponding event sample. We store the start time of the function in the entry event and calculate the duration in the return event. In the end, trace_end
get called, where we combine all event durations into a single pandas data frame and print out the corresponding statistics.
The post-processing script is executed by running
perf script -s perf-script.py
The results should show the execution duration statistics for both functions
These statistics are in milliseconds. Let's do a final round of sanity-checking. We know that most of the time is spent in the delay loops and that the delay loop within main
is half the duration of wait
. How does this add up?
Time spent in main
\[
\text{main} = 4 \times \text{wait} + 0.5 \times 4 \times \text{wait} = 6 \times \text{wait} = 6 \times 227.89 = 1367.34
\]
This adds up quite nicely close to the execution time of main
.
Summary
While basic profiling with perf is a fairly straightforward process, getting the execution times of specific functions or code blocks and be somewhat involved. When you've done it a few times, you will get a hang of the process. In the end, you should have scripts that make it easy to repeat the measurements, and also you can easily extend the analysis to more advanced statistics.