What, exactly, would it mean to have code generated for g()? If you were writing it yourself, what code would you write? Seriously, this is a real question. You have to decide what output you're expecting before you can start cajoling it from the compiler.
Anyway, let's look at what you have now. In f(),
void f() {
    char x = 1;
    doNotOptimize(&x); // x is NOT optimized away
}
you are taking the address of x, which prevents the optimizer from allocating it in a register. It has to be allocated in memory in order for it to have an address.
However, in g(),
void g() {
    char x = 1;
    doNotOptimize(x); // x is optimized away
}
x is just a local variable, and any sane optimizer will allocate it in a register or, as in this case, treat it as a constant. This is allowed, since you never take its address; you just use its value. So, for example, the compiler might generate code like this:
g():
    mov al, 1    // store 1 in BYTE-sized register AL
    ...
Or, as in this case, it might not generate any code at all and simply replace every use of the variable with its constant value.
Your doNotOptimize code,
template <typename T>
void doNotOptimize(T const& val) {
    asm volatile("" : : "g"(val) : "memory");
}
uses the g constraint for the val parameter, which says that it can be held in a general-purpose register, in memory, or as an immediate constant, whichever the optimizer finds most convenient. Since the value of val is a known constant once this call is inlined, the optimizer leaves it as a constant. Your "memory" clobber specifier has no effect here: the address of x is never taken, so x doesn't have to live in memory at all, and there is nothing there for the asm statement to read or modify.
So what can we do? Well, we can force the variable x to be allocated in memory, even though it doesn't need to be, by using the m constraint:
template <typename T>
void doNotOptimize(T const& val) {
    asm volatile("" : : "m"(val) : "memory");
}

void g() {
    char x = 1;
    doNotOptimize(x);
}
Now the compiler can't optimize the store of x away and is forced to emit the following code:
g():
    mov BYTE PTR [rsp-1], 1
    ret
Note that this is basically the same effect that declaring the x variable volatile would have.
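For comparison, here is a minimal sketch of what the volatile version might look like. The name gVolatile is made up for this example, and the read-back plus the assert are only there so the value can be checked; they add a volatile load that the asm-based version doesn't have.

```cpp
#include <cassert>

// Hypothetical counterpart to g(): no inline asm, just a volatile local.
char gVolatile() {
    volatile char x = 1;  // volatile store: the compiler has to emit it
    char seen = x;        // volatile load: also has to be emitted
    assert(seen == 1);    // nothing else can touch x, so this holds
    return seen;
}
```

With optimizations on, you'd expect to see a byte store to the stack much like the one above, followed by a load of the same byte. The exact instructions depend on the compiler and target.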
Remember the question I asked at the beginning? Is that the output you wanted?
Or maybe you want the compiler to emit that immediate-to-register move. If so, the r constraint will work, as will any of the x86-specific constraints that let you dictate a particular register. This forces the optimizer to put the value in a register, even though it doesn't need to:
g():
    mov eax, 1
    ret
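The source for that register variant isn't shown above, so here is a rough sketch of what it could look like. It assumes GCC or Clang extended asm; doNotOptimizeReg and gReg are names invented for this example, and the "r" constraint only makes sense for types that fit in a single general-purpose register.

```cpp
#include <cassert>

// Hypothetical register-only variant of doNotOptimize. The "r" constraint
// asks the compiler to have the value in a general-purpose register at the
// point of the (empty) asm statement. Assumes T fits in one register.
template <typename T>
inline void doNotOptimizeReg(T const& val) {
    asm volatile("" : : "r"(val) : "memory");
}

// Same shape as g(), but returns x so the result can be checked.
char gReg() {
    char x = 1;
    doNotOptimizeReg(x);  // the constant 1 has to be loaded into a register
    assert(x == 1);       // x is an input operand only; the asm cannot change it
    return x;
}
```

Which register gets picked is up to the compiler, so don't read too much into seeing eax rather than something else.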
I cannot, however, see what the point of either of these would be.
If you wanted to craft a microbenchmark that tested the overhead of calling a function with a single const-reference parameter, then a better option would be to ensure that the definition of the function being called is not visible to the optimizer. Then, it can't inline that function and has to arrange for the call to be made, including all necessary setup. This also works well if you're just studying how a compiler might emit that code. (Naturally, you can't use a template function, though. Well, unless you wanted to abuse C++11's extern templates.)
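Hiding the definition really means putting it in a separate translation unit (and not using link-time optimization), which doesn't fit in a single snippet. As a single-file approximation, and this is a swapped-in technique rather than the same thing, you can mark the function with the GCC/Clang noinline attribute. The names opaqueSink and gCall are invented for this sketch, and the compiler may still apply some interprocedural optimizations that a truly separate file would rule out.

```cpp
#include <cassert>

// Single-file stand-in for "definition not visible to the optimizer".
// noinline (GCC/Clang attribute) stops the call from being inlined; the
// empty asm statement keeps the body from looking side-effect free, so
// the call is not simply dropped.
__attribute__((noinline)) void opaqueSink(char const& val) {
    asm volatile("" : : "m"(val) : "memory");
}

// Same shape as g(), but returns x so the result can be checked.
char gCall() {
    char x = 1;
    opaqueSink(x);   // passing a reference means x needs an address
    assert(x == 1);  // opaqueSink only reads val
    return x;
}
```

In a real microbenchmark you'd more likely declare the function in a header and define it in its own .cpp file, which is what the paragraph above is describing.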