I have a loop that has been parallelized with OpenMP, but due to the nature of the task it contains 4 critical sections.
What would be the best way to profile the speedup and find out which of the critical sections (or maybe the non-critical code!) takes up the most time inside the loop?
I use Ubuntu 10.04 with g++ 4.4.3.
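To make the setup concrete, here is a minimal sketch of the kind of loop in question, together with one crude way to time it by hand. It is not the real code: `run_loop`, `LoopTimings`, `now` and the section names `section_a` to `section_d` are hypothetical, and the arithmetic is only a stand-in for the actual work. Each thread accumulates its own timings and merges them once at the end, so the measurement adds no extra contention inside the loop.

```cpp
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
// Wall-clock seconds from the OpenMP runtime (build with -fopenmp).
static double now() { return omp_get_wtime(); }
#else
#include <ctime>
// Serial fallback so the sketch still builds without -fopenmp.
static double now() { return static_cast<double>(std::clock()) / CLOCKS_PER_SEC; }
#endif

// Hypothetical result type: times are summed over all threads.
struct LoopTimings {
    double critical[4];  // waiting for + executing each critical section
    double other;        // non-critical work
    long checksum;       // sum of the shared accumulators, for sanity checks
};

LoopTimings run_loop(int n) {
    LoopTimings t = LoopTimings();      // zero-initialise all fields
    long shared[4] = {0, 0, 0, 0};      // stand-ins for the shared state

    #pragma omp parallel
    {
        double local_crit[4] = {0.0, 0.0, 0.0, 0.0};
        double local_other = 0.0;

        #pragma omp for
        for (int i = 0; i < n; ++i) {
            double t0 = now();
            long v = (static_cast<long>(i) * i) % 7;  // stand-in for real work
            local_other += now() - t0;

            // The interval around each critical section covers both the
            // wait for the lock and the time spent inside it.
            t0 = now();
            #pragma omp critical(section_a)
            { shared[0] += v; }
            local_crit[0] += now() - t0;

            t0 = now();
            #pragma omp critical(section_b)
            { shared[1] += v; }
            local_crit[1] += now() - t0;

            t0 = now();
            #pragma omp critical(section_c)
            { shared[2] += v; }
            local_crit[2] += now() - t0;

            t0 = now();
            #pragma omp critical(section_d)
            { shared[3] += v; }
            local_crit[3] += now() - t0;
        }

        // Merge the per-thread totals once, after the loop.
        #pragma omp critical(merge_timings)
        {
            for (int k = 0; k < 4; ++k) t.critical[k] += local_crit[k];
            t.other += local_other;
        }
    }

    t.checksum = shared[0] + shared[1] + shared[2] + shared[3];
    return t;
}
```

Calling `run_loop(n)` and printing `critical[0]` through `critical[3]` next to `other` gives a rough per-section breakdown. With work items as tiny as these the timer calls themselves dominate, so the numbers only mean something when each section does substantial work.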