Here is a post on how to locate a straggler task in OpenMP parallel programs using perf and gdb.
Often, the perf report of my parallel program looks like the following:
54.47% test.o libiomp5.so [.] _Z27__kmp_hyper_barrier_release12barrier_typeP8kmp_
This is a sign that the threads spend a large portion of the run spinning in a barrier with no work to do.
In this scenario, we need to find the straggler task: a piece of work, probably not parallelized, that one thread is still running while the others wait. Once we locate it, we can try to parallelize it or otherwise reduce its run time.
To do this, one trick is to run the program under gdb:
gdb --args ./executable
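Once it has started running, press Ctrl-C to interrupt it a couple of times (resuming with continue in between), and each time use a command such as
info threads
to see what all the threads are doing (thread apply all bt prints a full backtrace for every thread). In my case, most of the threads are often at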
__kmp_wait_template or some other barrier function, which indicates that they are waiting in a barrier. Then just locate the thread that is not in a barrier; the task it is executing is probably the straggler.
Credit goes to Vlad, who showed me this trick. Since then I have used it quite a few times to debug performance problems in my OpenMP programs.