I've thrown together some examples. I used GCC 4.4.4 in all of this.
Simple case, without -std=c++0x
First, I put together a very simple example with two classes that each accept a std::string.
#include <string>
#include <iostream>

struct A /* construct by reference */
{
    std::string s_;

    A (std::string const &s) : s_ (s)
    {
        std::cout << "A::<constructor>" << std::endl;
    }
    A (A const &a) : s_ (a.s_)
    {
        std::cout << "A::<copy constructor>" << std::endl;
    }
    ~A ()
    {
        std::cout << "A::<destructor>" << std::endl;
    }
};

struct B /* construct by value */
{
    std::string s_;

    B (std::string s) : s_ (s)
    {
        std::cout << "B::<constructor>" << std::endl;
    }
    B (B const &b) : s_ (b.s_)
    {
        std::cout << "B::<copy constructor>" << std::endl;
    }
    ~B ()
    {
        std::cout << "B::<destructor>" << std::endl;
    }
};

static A f () { return A ("string"); }
static A f2 () { A a ("string"); a.s_ = "abc"; return a; }
static B g () { return B ("string"); }
static B g2 () { B b ("string"); b.s_ = "abc"; return b; }

int main ()
{
    A a (f ());
    A a2 (f2 ());
    B b (g ());
    B b2 (g2 ());
    return 0;
}
The output of that program on stdout is as follows:
A::<constructor>
A::<constructor>
B::<constructor>
B::<constructor>
B::<destructor>
B::<destructor>
A::<destructor>
A::<destructor>
Conclusion
GCC was able to optimize each and every temporary A or B away.
This is consistent with the C++ FAQ. Basically, GCC may (and is willing to) generate code that constructs a, a2, b, b2 in place, even if a function is called that apparently returns by value. Thereby GCC can avoid many of the temporaries whose existence one might have "inferred" by looking at the code.
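To check the elision without reading it off a log, one can count copy-constructor calls instead of printing them. This is only a minimal sketch; `Counted`, `make_prvalue`, `make_named` and the two `copies_after_*` functions are hypothetical names that do not appear in the example above. Eliding the copy of a returned prvalue is guaranteed since C++17 and was merely permitted before; eliding the copy of a named local (NRVO) is permitted but never required, so the sketch does not promise a number for that case.

```cpp
#include <string>

// Hypothetical helper type: counts how often it is copy-constructed.
struct Counted
{
    static int copies;
    std::string s_;

    explicit Counted (std::string const &s) : s_ (s) {}
    Counted (Counted const &other) : s_ (other.s_) { ++copies; }
};
int Counted::copies = 0;

// Returns a prvalue, like f () and g () above.
Counted make_prvalue () { return Counted ("string"); }

// Returns a named local, like f2 () and g2 () above (an NRVO candidate).
Counted make_named () { Counted c ("string"); c.s_ = "abc"; return c; }

// Copies needed to initialize an object from make_prvalue ().
int copies_after_prvalue_return ()
{
    Counted::copies = 0;
    Counted a (make_prvalue ());
    (void) a;
    return Counted::copies;
}

// Copies needed to initialize an object from make_named ().
// Expected to be 0 when the compiler applies NRVO; the standard does not require that.
int copies_after_named_return ()
{
    Counted::copies = 0;
    Counted a (make_named ());
    (void) a;
    return Counted::copies;
}
```

With GCC at its default settings both functions should report zero copies, which matches the log above where no copy constructor line is printed.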
The next thing we want to see is how often std::string is actually copied in the above example. Let's replace std::string with something we can observe better.
Realistic case, without -std=c++0x
#include <string>
#include <iostream>

struct S
{
    std::string s_;

    S (std::string const &s) : s_ (s)
    {
        std::cout << " S::<constructor>" << std::endl;
    }
    S (S const &s) : s_ (s.s_)
    {
        std::cout << " S::<copy constructor>" << std::endl;
    }
    ~S ()
    {
        std::cout << " S::<destructor>" << std::endl;
    }
};

struct A /* construct by reference */
{
    S s_;

    A (S const &s) : s_ (s) /* expecting one copy here */
    {
        std::cout << "A::<constructor>" << std::endl;
    }
    A (A const &a) : s_ (a.s_)
    {
        std::cout << "A::<copy constructor>" << std::endl;
    }
    ~A ()
    {
        std::cout << "A::<destructor>" << std::endl;
    }
};

struct B /* construct by value */
{
    S s_;

    B (S s) : s_ (s) /* expecting two copies here */
    {
        std::cout << "B::<constructor>" << std::endl;
    }
    B (B const &b) : s_ (b.s_)
    {
        std::cout << "B::<copy constructor>" << std::endl;
    }
    ~B ()
    {
        std::cout << "B::<destructor>" << std::endl;
    }
};

/* expecting a total of one copy of S here */
static A f () { S s ("string"); return A (s); }
/* expecting a total of one copy of S here */
static A f2 () { S s ("string"); s.s_ = "abc"; A a (s); a.s_.s_ = "a"; return a; }
/* expecting a total of two copies of S here */
static B g () { S s ("string"); return B (s); }
/* expecting a total of two copies of S here */
static B g2 () { S s ("string"); s.s_ = "abc"; B b (s); b.s_.s_ = "b"; return b; }

int main ()
{
    A a (f ());
    std::cout << "" << std::endl;
    A a2 (f2 ());
    std::cout << "" << std::endl;
    B b (g ());
    std::cout << "" << std::endl;
    B b2 (g2 ());
    std::cout << "" << std::endl;
    return 0;
}
And the output, unfortunately, meets the expectation:
S::<constructor>
S::<copy constructor>
A::<constructor>
S::<destructor>
S::<constructor>
S::<copy constructor>
A::<constructor>
S::<destructor>
S::<constructor>
S::<copy constructor>
S::<copy constructor>
B::<constructor>
S::<destructor>
S::<destructor>
S::<constructor>
S::<copy constructor>
S::<copy constructor>
B::<constructor>
S::<destructor>
S::<destructor>
B::<destructor>
S::<destructor>
B::<destructor>
S::<destructor>
A::<destructor>
S::<destructor>
A::<destructor>
S::<destructor>
Conclusion
GCC was not able to optimize away the temporary S created by B's constructor. Using the default copy constructor of S did not change that. Changing f and g to be
static A f () { return A (S ("string")); } // still one copy
static B g () { return B (S ("string")); } // reduced to one copy!
did have the indicated effect. It appears that GCC is willing to construct the argument to B's constructor in place but hesitant to construct B's member in place.
Do note that still no temporary A or B is created. That means a, a2, b, b2 are still being constructed in place. Cool.
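The copy counts read off the log can also be checked with a counter. The following is a minimal sketch, not the program above: `Tracked`, `ByRef`, `ByValue` and the two `copies_from_*` templates are hypothetical names, with `Tracked` standing in for S, `ByRef` for A and `ByValue` for B. It assumes the compiler constructs a temporary argument directly in a by-value parameter, which is guaranteed since C++17 and was what GCC did in the experiment above.

```cpp
#include <string>

// Hypothetical stand-in for S: counts copy constructions instead of logging them.
struct Tracked
{
    static int copies;
    std::string s_;

    explicit Tracked (std::string const &s) : s_ (s) {}
    Tracked (Tracked const &other) : s_ (other.s_) { ++copies; }
};
int Tracked::copies = 0;

// Takes the argument by const reference, like A.
struct ByRef
{
    Tracked s_;
    ByRef (Tracked const &s) : s_ (s) {}
};

// Takes the argument by value, like B.
struct ByValue
{
    Tracked s_;
    ByValue (Tracked s) : s_ (s) {}
};

// Copies of Tracked made when T is constructed from a named object.
template <typename T> int copies_from_lvalue ()
{
    Tracked s ("string");
    Tracked::copies = 0;
    T t (s);
    (void) t;
    return Tracked::copies;
}

// Copies of Tracked made when T is constructed from a temporary.
template <typename T> int copies_from_temporary ()
{
    Tracked::copies = 0;
    T t (Tracked ("string"));
    (void) t;
    return Tracked::copies;
}
```

Under that assumption, `copies_from_lvalue<ByRef> ()` yields 1 and `copies_from_lvalue<ByValue> ()` yields 2, while both `copies_from_temporary` variants yield 1: the temporary is built in place as the parameter, but the copy from the parameter into the member remains.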
Let's now investigate how the new move semantics may influence the second example.
Realistic case, with -std=c++0x
Consider adding the following constructor to S:
S (S &&s) : s_ ()
{
    std::swap (s_, s.s_);
    std::cout << " S::<move constructor>" << std::endl;
}
And changing B's constructor to
B (S &&s) : s_ (std::move (s)) /* how many copies?? */
{
    std::cout << "B::<constructor>" << std::endl;
}
We get this output:
S::<constructor>
S::<copy constructor>
A::<constructor>
S::<destructor>
S::<constructor>
S::<copy constructor>
A::<constructor>
S::<destructor>
S::<constructor>
S::<move constructor>
B::<constructor>
S::<destructor>
S::<constructor>
S::<move constructor>
B::<constructor>
S::<destructor>
B::<destructor>
S::<destructor>
B::<destructor>
S::<destructor>
A::<destructor>
S::<destructor>
A::<destructor>
S::<destructor>
So, we were able to replace four copies with two moves by using pass by rvalue.
But we actually constructed a broken program.
Recall g and g2:
static B g () { S s ("string"); return B (s); }
static B g2 () { S s ("string"); s.s_ = "abc"; B b (s); /* s is zombie now */ b.s_.s_ = "b"; return b; }
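The marked location shows the problem. A move was done on an object that is not a temporary. That's because, in the C++0x draft that GCC 4.4 implements, rvalue references behave like lvalue references except that they may also bind to temporaries. (The final C++11 rules no longer let an rvalue reference bind to an lvalue, so B (s) would be rejected there as long as the rvalue-reference constructor is the only one.) So we must not forget to overload B's constructor with one that takes a constant lvalue reference.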
B (S const &s) : s_ (s)
{
    std::cout << "B::<constructor2>" << std::endl;
}
You will then notice that both g and g2 cause "constructor2" to be called, since the symbol s in either case is a better fit for a const reference than for an rvalue reference.
We can persuade the compiler to do a move in g in either of two ways:
static B g () { return B (S ("string")); }
static B g () { S s ("string"); return B (std::move (s)); }
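The overload selection described above, and the "zombie" state of a moved-from object, can be observed without any logging. This is a minimal sketch under C++11 rules; `Payload`, `Holder` and the three `from_*` functions are hypothetical names, with `Payload` mirroring S (including its swap-based move constructor) and `Holder` mirroring B with both constructor overloads.

```cpp
#include <string>
#include <utility>

// Hypothetical payload with the same swap-based move constructor as S.
struct Payload
{
    std::string s_;

    explicit Payload (std::string const &s) : s_ (s) {}
    Payload (Payload const &other) : s_ (other.s_) {}
    Payload (Payload &&other) : s_ () { std::swap (s_, other.s_); }
};

// Hypothetical holder that records which constructor overload was selected.
struct Holder
{
    Payload p_;
    std::string how_;

    Holder (Payload const &p) : p_ (p), how_ ("copy") {}
    Holder (Payload &&p) : p_ (std::move (p)), how_ ("move") {}
};

// A named object is an lvalue: the const-reference overload is selected.
std::string from_lvalue ()
{
    Payload p ("string");
    Holder h (p);
    return h.how_ + ":" + p.s_;     // the source still holds its value
}

// std::move casts the name to an rvalue: the rvalue-reference overload is selected.
std::string from_moved_lvalue ()
{
    Payload p ("string");
    Holder h (std::move (p));
    return h.how_ + ":" + p.s_;     // the source was swapped with an empty string
}

// A temporary is an rvalue as well.
std::string from_temporary ()
{
    Holder h (Payload ("string"));
    return h.how_ + ":" + h.p_.s_;  // the holder now owns the value
}
```

Here `from_lvalue ()` returns "copy:string", `from_moved_lvalue ()` returns "move:" because the source was emptied by the swap, and `from_temporary ()` returns "move:string".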
Conclusions
Do return-by-value. The code will be more readable than "fill a reference I give you" code, faster, and maybe even more exception safe.
Consider changing f to
static void f (A &result) { A tmp; /* ... */ result = tmp; } /* or */
static void f (A &result) { /* ... */ result = A (S ("string")); }
That will meet the strong guarantee only if A's assignment provides it. The copy into result cannot be skipped, nor can tmp be constructed in place of result, since result is not being constructed. Thus, it is slower than before, where no copying was necessary. C++0x compilers and move assignment operators would reduce the overhead, but it's still slower than returning by value.
Return-by-value provides the strong guarantee more easily. The object is constructed in place. If one part of that fails and other parts have already been constructed, normal unwinding will clean up and, as long as S's constructor fulfills the basic guarantee with regard to its own members and the strong guarantee with regard to global items, the whole return-by-value process actually provides the strong guarantee.
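The cost difference between the two styles can be made concrete by counting copies. This is a minimal sketch with hypothetical names (`Result`, `fill`, `make`, `copies_with_out_parameter`, `copies_with_return_by_value`); unlike A above, `Result` is given a default constructor so that the out-parameter variant has an object to assign into.

```cpp
#include <string>

// Hypothetical result type: counts copy constructions and copy assignments.
struct Result
{
    static int copies;
    std::string s_;

    Result () : s_ () {}
    explicit Result (std::string const &s) : s_ (s) {}
    Result (Result const &other) : s_ (other.s_) { ++copies; }
    Result &operator= (Result const &other)
    {
        s_ = other.s_;
        ++copies;
        return *this;
    }
};
int Result::copies = 0;

// "Fill a reference I give you": the assignment into result cannot be elided,
// because result already exists.
void fill (Result &result)
{
    Result tmp ("string");
    result = tmp;
}

// Return-by-value: the returned prvalue can be constructed in the caller's object.
Result make () { return Result ("string"); }

// Copies made when the caller default-constructs and then lets fill () assign.
int copies_with_out_parameter ()
{
    Result::copies = 0;
    Result r;
    fill (r);
    return Result::copies;
}

// Copies made when the caller initializes its object from make ().
int copies_with_return_by_value ()
{
    Result::copies = 0;
    Result r (make ());
    (void) r;
    return Result::copies;
}
```

The out-parameter variant pays for one copy assignment on top of the default construction, while the return-by-value variant needs no copy when the return value is elided (guaranteed for this prvalue case since C++17).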
Always pass by value if you're going to copy (onto the stack) anyway
As discussed in "Want speed? Pass by value.", the compiler may generate code that constructs, if possible, the caller's argument in place, eliminating the copy, which it cannot do when you take by reference and then copy manually. Principal example:
Do NOT write this (taken from the cited article):
T& T::operator=(T const& x) // x is a reference to the source
{
    T tmp(x);          // copy construction of tmp does the hard work
    swap(*this, tmp);  // trade our resources for tmp's
    return *this;      // our (old) resources get destroyed with tmp
}
but always prefer this:
T& T::operator=(T x)   // x is a copy of the source; hard work already done
{
    swap(*this, x);    // trade our resources for x's
    return *this;      // our (old) resources get destroyed with x
}
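The effect of the by-value signature can be checked by counting copies for an lvalue and for a temporary on the right-hand side. This is a minimal sketch; `Buffer` and the two `copies_assigning_*` functions are hypothetical names, and the member-wise `std::swap` stands in for the `swap(*this, x)` of the cited article. It assumes the temporary is constructed directly as the parameter, which is guaranteed since C++17 and commonly done by compilers before that.

```cpp
#include <string>
#include <utility>

// Hypothetical resource-owning type: counts copy constructions.
struct Buffer
{
    static int copies;
    std::string data_;

    explicit Buffer (std::string const &d) : data_ (d) {}
    Buffer (Buffer const &other) : data_ (other.data_) { ++copies; }

    // By-value parameter: an lvalue argument is copied into x, a temporary
    // argument can be constructed directly as x before the body runs.
    Buffer &operator= (Buffer x)
    {
        std::swap (data_, x.data_);  // trade our resources for x's
        return *this;                // our old resources are destroyed with x
    }
};
int Buffer::copies = 0;

// Copies made when assigning from a named object.
int copies_assigning_lvalue ()
{
    Buffer a ("a"), b ("b");
    Buffer::copies = 0;
    a = b;
    return Buffer::copies;
}

// Copies made when assigning from a temporary.
int copies_assigning_temporary ()
{
    Buffer a ("a");
    Buffer::copies = 0;
    a = Buffer ("temporary");
    return Buffer::copies;
}
```

Assigning from an lvalue costs the one copy that is needed anyway, and assigning from a temporary costs none. With the const-reference signature, the explicit `T tmp(x)` inside the body would copy in both cases.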
If you want to copy to a non-stack frame location pass by const reference pre C++0x and additionally pass by rvalue reference post C++0x
We already saw this. When in-place construction is impossible, pass by reference causes fewer copies than pass by value. And C++0x's move semantics may replace many copies with fewer and cheaper moves. But do keep in mind that moving will make a zombie out of the object that has been moved from. Moving is not copying. Just providing a constructor