I think your best bet is to not implement one until profiling proves that the CRT allocator is fragmenting memory in a way that hurts the performance of your application. The CRT, core OS, and STL guys spend a lot of time thinking about memory management.
There's a good chance that your code will perform just fine under the existing allocators with no changes needed. There's certainly a better chance of that than there is of you getting a memory allocator right the first time. I've written memory allocators before for similar circumstances and it's a monstrous task to take on. Not so surprisingly, the version I inherited was rife with fragmentation problems.
The other advantage of waiting until a profile shows it's a problem is that you will also know if you've actually fixed anything. That's the most important part of a performance fix.
As long as you're using standard collection classes and algorithms (such as STL/Boost), it shouldn't be very hard to plug in a new allocator later in the cycle to fix the portions of your code base that do need fixing. It's very unlikely that you will need a hand-coded allocator for your entire program.
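To give a rough idea of what "plugging in" looks like, here is a minimal sketch, assuming C++11 or later. The names `CountingAllocator`, `alloc_stats`, `HotVector`, and `fill_hot_vector` are made up for illustration. The allocator simply forwards to `::operator new` and counts calls; a real replacement would put a pool or arena behind `allocate`/`deallocate` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical global counters, for illustration only (not thread-safe).
struct AllocStats {
    std::size_t allocations = 0;
    std::size_t bytes = 0;
};

inline AllocStats& alloc_stats() {
    static AllocStats stats;
    return stats;
}

// Minimal allocator: forwards to the default heap and records each request.
// Assumes T has no extended alignment requirement.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() noexcept = default;

    // Converting constructor so containers can rebind to internal node types.
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        alloc_stats().allocations += 1;
        alloc_stats().bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p);
    }
};

// The allocator is stateless, so all instances compare equal.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}

template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

// Only this alias needs to change when a different allocator is swapped in.
template <class T>
using HotVector = std::vector<T, CountingAllocator<T>>;

// Example of a "hot" code path using the aliased container.
std::size_t fill_hot_vector(std::size_t n) {
    HotVector<int> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
    }
    assert(v.size() == n);
    return v.size();
}
```

If the container type is hidden behind an alias like `HotVector`, switching allocators for one part of the code base is a one-line change, and the counters give you a cheap before/after number to go alongside the profiler.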