Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have this simple example I was testing against and I noticed that gcc optimizations (-O3) seems not be as good as clang ones when operator new is involved. I was wondering what might be the issue and if it possible to force gcc to produce more optimized code somehow?

template<typename T>
T* create() { return new T(); }

int main() {
    auto result = 0;
    for (auto i = 0; i < 1000000; ++i) {
        result += (create<int>() != nullptr);
    }

    return result;
}


#clang3.6++ -O3 -s --std=c++11 test.cpp
#size a.out
   text    data     bss     dec     hex filename
   1324     616       8    1948     79c a.out
#time ./a.out 
real 0m0.002s
user 0m0.001s
sys  0m0.000s

#gcc4.9 -O3 -s --std=c++11 test.cpp
#size a.out
   text    data     bss     dec     hex filename
   1484     624       8    2116     844 a.out
#time ./a.out
real 0m0.045s
user 0m0.035s
sys  0m0.009s

Example above is just a simple version of the code I have been testing in the beginning, but it still illustrates the difference between gcc/clang. I checked the assembly code as well and there is not a huge difference in size, but definitely in performance. On the other hand maybe clang is doing something which is not allowed?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
145 views
Welcome To Ask or Share your Answers For Others

1 Answer

If we plug this code into godbolt we can see that clang optimizes away the code to this:

main:                                   # @main
movl    $1000000, %eax          # imm = 0xF4240
ret

while gcc does not perform this optimization. So the question is this a valid optimization? Does this follow the as-if rule, which is noted in the draft C++ standard section 1.9 Program execution which says(emphasis mine):

The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.5

where note 5 says:

This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced.

Since new could throw an exception which would have observable behavior since it would alter the return value of the program.

R.MartinhoFernandes argues that it is implementation detail when to throw an exception and therefore clang could decide this scenario would not cause an exception and therefore eliding the new call would not violate the as-if rule. This seems like a reasonable argument to me.

but as T.C. points out:

A replacement global operator new could have been defined in a different translation unit

Casey provided an example that shows even when clang sees there is a replacement it still performs this optimization even though there are lost side effects. So this seems like overly aggressive optimization.

Note, memory leaks are not undefined behavior.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...