This is a very good introduction to perf that I found online, along with some of my comments on the post. I generally recommend it if you are learning to use perf for the first time. 🙂
I like the general introduction on how to use perf very much.
I think perf report just worked for me (showing the original source code and the disassembly) without a separate “perf annotate” command. I did have to compile the program with the “-g” flag so that the debug symbols were available.
I also like the discussion of what “sampling” is and how the sampling frequency affects performance (overhead).
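To make the overhead point a bit more concrete for myself, here is a rough back-of-the-envelope sketch (mine, not from the post). It assumes every sample costs a fixed amount of CPU time; the 5 microsecond figure and the function name `sampling_overhead` are made up for illustration, and the real cost depends on the machine and on what perf collects per sample.

```python
def sampling_overhead(freq_hz, cost_per_sample_us):
    """Estimate the fraction of CPU time spent taking samples.

    freq_hz: samples per second (what `perf record -F` controls).
    cost_per_sample_us: assumed cost of one sample in microseconds
        (hypothetical value, not a measured number).
    """
    return freq_hz * cost_per_sample_us * 1e-6


if __name__ == "__main__":
    # Hypothetical per-sample cost; measure on your own system.
    assumed_cost_us = 5.0
    for freq in (99, 1000, 10000):
        fraction = sampling_overhead(freq, assumed_cost_us)
        print(f"{freq:>6} Hz -> {fraction:.2%} of one CPU")
```

Under this simple model the overhead grows linearly with the sampling frequency, which is why a higher frequency gives more detail but disturbs the program more.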
“Finally, why don’t cpu-clock profiles identify slow instructions? The answer has to do with the way cpu-clock events are attributed back to binary code when handling a timer interrupt. First, there is a delay between the time when a sampling interrupt is requested and when the sampling interrupt is honored and handled. Second, the program counter value that is captured in a sample is the program address at which execution will restart after interrupt handling is complete; it is not the program location where the interrupt is first asserted. The combination of these two factors is called skid and it affects the attribution and distribution of samples in the final profile. In the presence of skid, samples are attributed to the general neighborhood around performance culprits such as long latency memory load operations. The cpu-clock event cannot be precisely attributed to slow instructions, just the hot code region containing the culprit. This is why you shouldn’t conclude that the compare (cmp) instruction:”
This ties back to the “-e cycles” issue I listed in a previous post.
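To convince myself of the skid effect described in the quote, I find a toy simulation helpful. The sketch below is my own and not how perf or the hardware actually works: it assumes a four-instruction loop with made-up latencies, a timer that fires at a uniformly random cycle, and a fixed skid of one instruction (the sample is charged to the instruction where execution resumes). All names, such as `LOOP` and `sample_profile`, are hypothetical.

```python
import random
from collections import Counter

# Hypothetical loop body as (mnemonic, latency in cycles).
# The latencies are invented for illustration; the load is the real culprit.
LOOP = [
    ("mov (load)", 100),
    ("cmp", 1),
    ("jne", 1),
    ("add", 1),
]


def sample_profile(num_samples, skid=1, seed=0):
    """Return samples per instruction under a toy fixed-skid model."""
    rng = random.Random(seed)
    total_cycles = sum(latency for _, latency in LOOP)
    counts = Counter()
    for _ in range(num_samples):
        # The timer interrupt is requested at a random cycle of one iteration.
        t = rng.randrange(total_cycles)
        # Find the instruction that is in flight at that cycle.
        in_flight = 0
        elapsed = 0
        for i, (_, latency) in enumerate(LOOP):
            elapsed += latency
            if t < elapsed:
                in_flight = i
                break
        # The recorded address is where execution restarts,
        # modeled here as `skid` instructions later.
        reported = LOOP[(in_flight + skid) % len(LOOP)][0]
        counts[reported] += 1
    return counts


if __name__ == "__main__":
    for skid in (0, 1):
        print(f"skid = {skid}")
        for name, n in sample_profile(10_000, skid=skid).most_common():
            print(f"  {name:12s} {n}")
```

With a skid of one, nearly all samples land on the cmp that follows the slow load, even though the load is where the time is spent. That matches the warning in the quote: the profile points at the hot neighborhood, not at the exact slow instruction.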