This is a well-worn subject, and SO contains oodles of good and bad advice on it.
Let me just tell you what I have found from much experience doing performance tuning.
There are tradeoff curves between performance and other things like memory and clarity, right? And you would expect that to get better performance you would have to give something up, right?
That is only true if the program is on the tradeoff curve. Most software, as first written, is miles away from the tradeoff curve. Most of the time, it is irrelevant and ignorant to talk about giving up one thing to get another.
The method I use is not measurement, it is diagnosis. I don't care how fast various routines are or how often they are called. I want to know precisely which instructions are causing slowness, and why.
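To make that concrete, here is a minimal sketch of the idea in Python (not any particular profiler, and `workload`/`slow_leaf` are made-up stand-ins): run the program in a worker thread and periodically snapshot its call stack. The stacks that show up most often point straight at the code responsible for the time.

```python
import collections
import sys
import threading
import time
import traceback

def sample_stacks(target, interval=0.01, duration=1.0):
    """Run `target` in a worker thread; periodically snapshot its call stack."""
    counts = collections.Counter()
    worker = threading.Thread(target=target)
    worker.start()
    deadline = time.monotonic() + duration
    while worker.is_alive() and time.monotonic() < deadline:
        time.sleep(interval)
        frame = sys._current_frames().get(worker.ident)
        if frame is not None:
            # Record the whole stack, innermost call last.
            stack = tuple(f"{entry.name}:{entry.lineno}"
                          for entry in traceback.extract_stack(frame))
            counts[stack] += 1
    worker.join()
    return counts

def slow_leaf():
    # Deliberate slug: this loop dominates the run time.
    total = 0
    for i in range(200_000):
        total += i * i
    return total

def workload():
    for _ in range(100):
        slow_leaf()

samples = sample_stacks(workload)
# The stacks that appear most often point straight at the slug.
for stack, n in samples.most_common(3):
    print(n, "->", stack[-1])
```

Note that the output is not "how fast is each routine" but "what was the program doing, and why", which is the diagnostic question.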
The dominant and primary cause of poor performance in good-sized software efforts (not little one-person projects) is galloping generality. Too many layers of abstraction are employed, each of which exacts a performance penalty. This is not usually a problem - until it is a problem - and then it's a killer.
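Here is a toy illustration of how those layers add up (the classes are invented for the example; real systems bury the indirection much deeper): three thin wrappers, each delegating one call, versus touching the data directly.

```python
import time

# Three thin wrapper layers, each adding one extra method call.
class Store:
    def __init__(self, data): self._data = data
    def get(self, i): return self._data[i]

class Cache:
    def __init__(self, store): self._store = store
    def get(self, i): return self._store.get(i)

class Facade:
    def __init__(self, cache): self._cache = cache
    def get(self, i): return self._cache.get(i)

def bench(fn, n=200_000):
    t0 = time.perf_counter()
    for i in range(n):
        fn(i % 100)
    return time.perf_counter() - t0

data = list(range(100))
layered = Facade(Cache(Store(data)))
t_direct = bench(lambda i: data[i])
t_layered = bench(lambda i: layered.get(i))
print(f"direct: {t_direct:.3f}s  layered: {t_layered:.3f}s")
```

Each layer looks free in isolation; it is the product of all of them, executed in inner loops, that kills you.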
So what I do is tackle one issue at a time. I call these "slugs", short for "slowness bugs". Each slug that I remove yields a speedup of anywhere from 1.1x to 10x, depending on how bad it is. Every slug that is removed makes the remaining ones take a larger fraction of the remaining time, so they become easier to find. In this way, all the "low-hanging fruit" can be quickly disposed of.
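The arithmetic behind "remaining slugs take a larger fraction" is worth seeing once, with made-up numbers:

```python
# Hypothetical program: 100 s total, containing two slugs.
total = 100.0
slug_a = 40.0   # a big slug
slug_b = 10.0   # a smaller slug, 10% of the time at first

print(f"slug B before: {slug_b / total:.0%}")      # 10% of run time
after_a = total - slug_a                           # remove slug A
print(f"speedup from A: {total / after_a:.2f}x")   # 1.67x
print(f"slug B now: {slug_b / after_a:.0%}")       # 17% of run time
```

Slug B did not get any bigger, but after removing slug A it is a larger share of what remains, so it stands out more clearly in the next round of samples.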
At that point, I know what is costing the time, but the fixes may be more difficult, such as a partial redesign of the software, possibly by removing extraneous data structures or using code generation. If it is possible to do this, it can set off a new round of slug-removal, until the program is not only many times faster than it was to begin with, but smaller and clearer as well.
I recommend getting experience like this for yourself, because then when you design software you will know what not to do, and you will make better (and simpler) designs to begin with. At the same time, you will find yourself at odds with less experienced colleagues who can't begin thinking about a design without conjuring up a dozen classes.
ADDED: Now, to try to answer your question, low-level optimization should be done when diagnosis says you have a hot-spot, i.e. some code at the bottom of the call stack appears on enough stack samples (10% or more) to be known to be costing significant time. AND if the hot-spot is in code you can edit. If you've got a hot-spot in "new", "delete", or string-compare, look higher up the stack for things to get rid of.
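The 10% rule is easy to apply once you have stack samples. A sketch (the function names in the sample data are invented; "string_compare" stands in for library code you cannot edit):

```python
from collections import Counter

def hot_spots(samples, threshold=0.10):
    """Return functions appearing on at least `threshold` of stack samples."""
    line_counts = Counter()
    for stack in samples:
        for fn in set(stack):   # count each function once per sample
            line_counts[fn] += 1
    n = len(samples)
    return {fn: c / n for fn, c in line_counts.items()
            if c / n >= threshold}

# Hypothetical samples: each tuple is one call stack, outermost call first.
samples = [
    ("main", "render", "string_compare"),
    ("main", "render", "string_compare"),
    ("main", "update", "string_compare"),
    ("main", "update", "physics"),
    ("main", "render", "draw"),
    ("main", "idle"),
    ("main", "render", "string_compare"),
    ("main", "update", "physics"),
    ("main", "render", "draw"),
    ("main", "idle"),
]

hot = hot_spots(samples)
for fn, frac in sorted(hot.items(), key=lambda kv: -kv[1]):
    print(f"{fn}: on {frac:.0%} of samples")
```

Here "string_compare" is the leaf on 40% of samples, but it is library code; its caller "render" is on 50% of samples and is code you can edit, which is exactly the "look higher up the stack" move.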
Hope that helps.