Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

It happened to me a few times to parallelize portion of programs with OpenMP just to notice that in the end, despite the good scalability, most of the foreseen speed-up was lost due to the poor performance of the single threaded case (if compared to the serial version).

The usual explanation that appears on the web for this behavior is that the code generated by compilers may be worse in the multi-threaded case. Anyhow I am not able to find anywhere a reference that explains why the assembly may be worse.

So, what I would like to ask to the compiler guys out there is:

May compiler optimizations be inhibited by multi-threading? In case, how could performance be affected?

If it could help narrowing down the question I am mainly interested in high-performance computing.

Disclaimer: As stated in the comments, part of the answers below may become obsolete in the future as they briefly discuss the way in which optimizations are handled by compilers at the time the question was posed.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
293 views
Welcome To Ask or Share your Answers For Others

1 Answer

Short from the explicit pragmas for OMP, compilers just don't know that code can be executed by multiple threads. So they can neither make that code more nor less efficient.

This has grave consequences in C++. It is particularly a problem to library authors, they cannot reasonably guess up front whether their code will be used in a program that uses threading. Very visible when you read the source of a common C-runtime and standard C++ library implementation. Such code tends to be peppered with little locks all over the place to ensure the code still operates correctly when it is used in threads. You pay for this, even if you don't actually use that code in a threaded way. A good example is std::shared_ptr<>. You pay for the atomic update of the reference count, even if the smart pointer is only ever used in one thread. And the standard doesn't provide a way to ask for non-atomic updates, a proposal to add the feature was rejected.

And it very much is detrimental the other way as well, there isn't anything the compiler can do to ensure your own code is thread-safe. It is entirely up to you to make it thread-safe. Hard to do and this goes wrong in subtle and very difficult to diagnose ways all the time.

Big problems, not simple to solve. Maybe that's a good thing, otherwise everybody could be a programmer ;)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...