How to
There are several ways. I'll describe the simplest one.
Just create separate vertex buffers:
ID3D11Buffer* positions;
ID3D11Buffer* texcoords;
ID3D11Buffer* normals;
Create the input layout elements, incrementing the InputSlot member for each component:
{ "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
{ "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 1, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
{ "NORMAL", 0, DXGI_FORMAT_R32G32B32_FLOAT, 2, D3D11_APPEND_ALIGNED_ELEMENT, D3D11_INPUT_PER_VERTEX_DATA, 0 },
// The fourth field of each element is InputSlot: 0, 1, 2.
Bind the buffers to their slots (preferably all in one call):
ID3D11Buffer* vbs[] = { positions, texcoords, normals }; // array of pointers, one per slot
unsigned int strides[] = { /*strides go here*/ };
unsigned int offsets[] = { /*offsets go here*/ };
m_Context->IASetVertexBuffers(0, 3, vbs, strides, offsets);
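The stride and offset placeholders depend on your data. As a minimal sketch, assuming tightly packed float attributes that match the formats in the input layout, they could be derived as below. Float2, Float3, Vertex, VertexStreams and splitStreams are hypothetical names made up for illustration, not part of D3D11; the code only prepares the CPU-side arrays you would upload into the three buffers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical plain structs mirroring the DXGI formats in the layout.
struct Float2 { float x, y; };      // DXGI_FORMAT_R32G32_FLOAT
struct Float3 { float x, y, z; };   // DXGI_FORMAT_R32G32B32_FLOAT

// Hypothetical interleaved vertex, used here only as the source data.
struct Vertex {
    Float3 position;
    Float2 texcoord;
    Float3 normal;
};

// One tightly packed array per attribute: the contents of the
// positions, texcoords and normals buffers.
struct VertexStreams {
    std::vector<Float3> positions;
    std::vector<Float2> texcoords;
    std::vector<Float3> normals;
};

// Split interleaved vertices into one array per attribute.
VertexStreams splitStreams(const std::vector<Vertex>& vertices) {
    VertexStreams streams;
    streams.positions.reserve(vertices.size());
    streams.texcoords.reserve(vertices.size());
    streams.normals.reserve(vertices.size());
    for (const Vertex& v : vertices) {
        streams.positions.push_back(v.position);
        streams.texcoords.push_back(v.texcoord);
        streams.normals.push_back(v.normal);
    }
    assert(streams.positions.size() == vertices.size());
    return streams;
}

// Stride of a slot = size of one element in that slot's buffer.
constexpr unsigned int kStrides[3] = {
    static_cast<unsigned int>(sizeof(Float3)),  // slot 0: positions
    static_cast<unsigned int>(sizeof(Float2)),  // slot 1: texcoords
    static_cast<unsigned int>(sizeof(Float3)),  // slot 2: normals
};

// Each buffer is read from its first byte.
constexpr unsigned int kOffsets[3] = { 0, 0, 0 };
```

With 4-byte floats this gives strides of 12, 8 and 12 bytes and offsets of zero for all three slots.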
Draw as usual.
You don't need to change the HLSL code (the shader sees the same vertex inputs as it would with a single buffer).
Note that the code snippets were written on the fly and may contain mistakes.
Edit: you can improve this approach by grouping buffers by update rate: if texcoords and normals never change, merge them into one buffer.
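As a rough sketch of that grouping, assuming positions are the only attribute that changes: slot 0 holds the positions, and slot 1 holds texcoord and normal interleaved. StaticAttributes and ElementSketch are hypothetical names; ElementSketch is a trimmed-down stand-in for D3D11_INPUT_ELEMENT_DESC so the example compiles without the D3D11 headers.

```cpp
#include <cstddef>

struct Float2 { float x, y; };      // DXGI_FORMAT_R32G32_FLOAT
struct Float3 { float x, y, z; };   // DXGI_FORMAT_R32G32B32_FLOAT

// Attributes that never change after loading, interleaved in one buffer.
struct StaticAttributes {
    Float2 texcoord;
    Float3 normal;
};

// Hypothetical stand-in for the relevant fields of D3D11_INPUT_ELEMENT_DESC.
struct ElementSketch {
    const char*  semanticName;
    unsigned int inputSlot;
    unsigned int alignedByteOffset;
};

// Slot 0: frequently updated positions. Slot 1: static texcoord + normal.
constexpr ElementSketch kLayout[3] = {
    { "POSITION", 0, 0 },
    { "TEXCOORD", 1,
      static_cast<unsigned int>(offsetof(StaticAttributes, texcoord)) },
    { "NORMAL",   1,
      static_cast<unsigned int>(offsetof(StaticAttributes, normal)) },
};

// Two buffers instead of three: one stride per slot.
constexpr unsigned int kStrides[2] = {
    static_cast<unsigned int>(sizeof(Float3)),            // slot 0
    static_cast<unsigned int>(sizeof(StaticAttributes)),  // slot 1
};
```

Only the slot 0 buffer then needs to be rewritten when positions change; the slot 1 buffer can be created once and left alone.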
As for performance
It is all about locality of reference: the closer the data, the faster the access.
In most cases an interleaved buffer performs better (often by a wide margin) on the GPU side, i.e. when rendering, because all attributes of a vertex sit next to each other. Separate buffers, however, give faster CPU access: each array is contiguous, so each element sits right after the previous one.
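To put rough numbers on the CPU side of this, here is a small sketch that computes how many bytes of memory are walked over when rewriting only the positions of consecutive vertices. InterleavedVertex and bytesSpanned are hypothetical names, and the layout assumes tightly packed 4-byte floats; it illustrates the locality argument and is not a benchmark.

```cpp
#include <cstddef>

struct Float2 { float x, y; };
struct Float3 { float x, y, z; };

// Hypothetical interleaved layout: position, texcoord, normal per vertex.
struct InterleavedVertex {
    Float3 position;
    Float2 texcoord;
    Float3 normal;
};

// Bytes spanned in memory when rewriting `count` consecutive positions
// that are stored `stride` bytes apart.
constexpr std::size_t bytesSpanned(std::size_t count, std::size_t stride) {
    return count == 0 ? 0 : (count - 1) * stride + sizeof(Float3);
}

// Rewriting 1000 positions:
//   separate position buffer: stride 12, spans 12000 contiguous bytes
//   interleaved buffer:       stride 32, spans 31980 bytes, most of them
//                             texcoords and normals that are skipped over
static_assert(bytesSpanned(1000, sizeof(Float3)) == 12000, "separate");
static_assert(bytesSpanned(1000, sizeof(InterleavedVertex)) == 31980,
              "interleaved");
```

The GPU sees the opposite picture: fetching one whole vertex touches 32 adjacent bytes in the interleaved layout, but three separate locations with separate buffers.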
So overall, the performance question depends on how often you write to the buffers. If CPU writes are your limiting factor, stick with separate buffers. If not, go for a single interleaved one.
How will you know? There is only one way: profile. Profile both the CPU side and the GPU side (with the graphics debugger/profiler from your GPU vendor).
Other factors
The best practice is to limit CPU writes, so if you find that you are limited by buffer updates, you probably need to review your approach. Do you need to update the buffer every frame when running at 500 fps? The user won't see a difference if you reduce the buffer update rate to 30-60 times per second (that is, decouple buffer updates from frame updates). So if your update strategy is reasonable, you will likely never be CPU-limited, and the best approach is classic interleaving.
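One possible way to decouple buffer updates from the frame rate is a simple time accumulator. This is only a sketch: UpdateThrottle and shouldUpdate are hypothetical names, and the per-frame delta time is assumed to come from your own timer.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper that decides when a buffer update is due.
class UpdateThrottle {
public:
    explicit UpdateThrottle(double updatesPerSecond)
        : interval_(1.0 / updatesPerSecond) {
        assert(updatesPerSecond > 0.0);
    }

    // Call once per frame with the frame's delta time in seconds.
    // Returns true when at least one update interval has elapsed.
    bool shouldUpdate(double frameDeltaSeconds) {
        accumulated_ += frameDeltaSeconds;
        if (accumulated_ < interval_) {
            return false;
        }
        // Keep only the remainder, so a long frame triggers one update
        // instead of a burst of them.
        accumulated_ = std::fmod(accumulated_, interval_);
        return true;
    }

private:
    double interval_;
    double accumulated_ = 0.0;
};
```

In the render loop you would call shouldUpdate(dt) every frame and only map and rewrite the vertex buffer when it returns true; drawing still happens every frame.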
You can also consider redesigning your data pipeline, or even preparing the data offline (we call it "baking"), so that you don't need to cope with non-interleaved buffers at all. That would be quite reasonable too.
Reduce memory footprint or increase performance?
It's the memory-versus-performance tradeoff, the eternal question. Duplicate memory to take advantage of interleaving, or not?
The answer is... "that depends". Are you programming a new CryEngine, targeting top GPUs with gigabytes of memory? Or are you programming for embedded systems or mobile platforms, where memory is slow and limited? Is 1 megabyte of memory worth the hassle at all? Or do you have huge models, 100 MB each? We don't know.
It's all up to you to decide. But remember: there is no free lunch. If you find the memory savings worth the performance loss, go for it. Profile and compare to be sure.
Hope it helps somehow. Happy coding! =)