This question came up while I was reading (the answers to) "So why is i = ++i + 1 well-defined in C++11?"
I gather that the subtle explanation is that (1) the expression ++i returns an lvalue but + takes prvalues as operands, so a conversion from lvalue to prvalue must be performed; this involves obtaining the current value of that lvalue (rather than one more than the old value of i) and must therefore be sequenced after the side effect from the increment (i.e., updating i); (2) the LHS of the assignment is also an lvalue, so its value computation does not involve fetching the current value of i; while this value computation is unsequenced w.r.t. the value computation of the RHS, this poses no problem; (3) the value computation of the assignment itself involves updating i (again), but is sequenced after the value computation of its RHS, and hence after the previous update to i; no problem.
Fine, so there is no UB there. Now my question is what if one changed the assignment operator from = to += (or a similar operator).
Does the evaluation of the expression
i += ++i + 1
lead to undefined behavior?
As I see it, the standard seems to contradict itself here. Since the LHS of += is still an lvalue (and its RHS still a prvalue), the same reasoning as above applies as far as (1) and (2) are concerned; there is no undefined behavior in the evaluation of the operands of +=. As for (3), the operation of the compound assignment += (more precisely the side effect of that operation; its value computation, if needed, is in any case sequenced after its side effect) now must both fetch the current value of i, and then (obviously sequenced after it, even if the standard does not say so explicitly, or otherwise the evaluation of such operators would always invoke undefined behavior) add the RHS and store the result back into i. Both these operations would have given undefined behavior if they were unsequenced w.r.t. the side effect of the ++, but as argued above (the side effect of the ++ is sequenced before the value computation of + giving the RHS of the += operator, which value computation is sequenced before the operation of that compound assignment), that is not the case.
But on the other hand the standard also says that E += F is equivalent to E = E + F, except that (the lvalue) E is evaluated only once. Now in our example the value computation of i (which is what E is here) as lvalue does not involve anything that needs to be sequenced w.r.t. other actions, so doing it once or twice makes no difference; our expression should be strictly equivalent to E = E + F. But here's the problem: it is pretty obvious that evaluating i = i + (++i + 1) would give undefined behaviour! What gives? Or is this a defect of the standard?
Added. I have slightly modified my discussion above, to do more justice to the proper distinction between side effects and value computations, and using "evaluation" (as does the standard) of an expression to encompass both. I think my main question is not just whether behavior is defined or not in this example, but how one must read the standard in order to decide this. Notably, should one take the equivalence of E op= F to E = E op F as the ultimate authority for the semantics of the compound assignment operation (in which case the example clearly has UB), or merely as an indication of what mathematical operation is involved in determining the value to be assigned (namely the one identified by op, with the lvalue-to-rvalue converted LHS of the compound assignment operator as left operand and its RHS as right operand)? The latter option makes it much harder to argue for UB in this example, as I have tried to explain. I admit that it is tempting to make the equivalence authoritative (so that compound assignments become a kind of second-class primitives, whose meaning is given by rewriting in terms of first-class primitives; thus the language definition would be simplified), but there are rather strong arguments against this:
- The equivalence is not absolute, because of the "E is evaluated only once" exception. Note that this exception is essential: without it, any use where the evaluation of E involves a side effect would be undefined behavior, for instance the fairly common a[i++] += b; usage. In fact I think no absolutely equivalent rewriting to eliminate compound assignments is possible; using a fictive ||| operator to designate unsequenced evaluations, one might try to define E op= F; (with int operands for simplicity) as equivalent to { int& L=E ||| int R=F; L = L op R; }, but then the example no longer has UB. In any case the standard gives us no rewriting recipe.
- The standard does not treat compound assignments as second-class primitives for which no separate definition of semantics is necessary. For instance in 5.17 (emphasis mine)
The assignment operator (=) and the compound assignment operators all group right-to-left. [...] In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. With respect to an indeterminately-sequenced function call, the operation of a compound assignment is a single evaluation.
- If the intention were to let compound assignments be mere shorthands for simple assignments, there would be no reason to include them explicitly in this description. The final phrase even directly contradicts what would be the case if the equivalence was taken to be authoritative.
If one admits that compound assignments have a semantics of their own, then the point arises that their evaluation involves (apart from the mathematical operation) more than just a side effect (the assignment) and a value computation (sequenced after the assignment), but also an unnamed operation of fetching the (previous) value of the LHS. This would normally be dealt with under the heading of "lvalue-to-rvalue conversion", but doing so here is hard to justify, since there is no operator present that takes the LHS as an rvalue operand (though there is one in the expanded "equivalent" form). It is precisely this unnamed operation whose potential unsequenced relation with the side effect of ++ would cause UB, but this unsequenced relation is nowhere explicitly stated in the standard, because the unnamed operation is not. It is hard to justify UB using an operation whose very existence is only implicit in the standard.