Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I want to send a list of values into a fragment shader. It is a possibly large (couple of thousand items long) list of single precision floats. The fragment shader needs random access to this list and I want to refresh the values from the CPU on each frame.

I'm considering my options on how this could be done:

  1. As a uniform variable of array type ("uniform float x[10];"). But there seem to be limits here: on my GPU, sending more than a few hundred values is very slow, and I'd also have to hard-code the upper limit in the shader when I would rather change it at runtime.

  2. As a texture with height 1 and width of my list, then refresh the data using glCopyTexSubImage2D.

  3. Other methods? I haven't kept up with all the changes in the GL-specification lately, perhaps there is some other method that is specifically designed for this purpose?

1 Answer

There are currently 4 ways to do this: standard 1D textures, buffer textures, uniform buffers, and shader storage buffers.

1D Textures

With this method, you use glTex(Sub)Image1D to fill a 1D texture with your data. Since your data is just an array of floats, your image format should be GL_R32F. You then access it in the shader with a simple texelFetch call. texelFetch takes texel coordinates (hence the name), and it shuts off all filtering. So you get exactly one texel.

Note: texelFetch requires GL 3.0+ (GLSL 1.30). If you want to use prior GL versions, you will need to pass the texture size to the shader and normalize the texture coordinate manually.
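To make the "normalize manually" step concrete, here is a minimal sketch of the arithmetic, written in C so it can be checked on the CPU. The helper name texel_center_coord is hypothetical; in practice the same expression would live in the shader, with the texture size passed in as a uniform and the texture set to nearest filtering so that no blending between neighbours occurs.

```c
#include <assert.h>

/* Normalized coordinate of the center of texel `index` in a 1D texture
   that is `size` texels wide. Hypothetical helper: the same expression
   would normally be written in GLSL inside the shader. */
float texel_center_coord(int index, int size)
{
    assert(size > 0 && index >= 0 && index < size);
    /* Adding 0.5 aims at the texel center rather than its left edge. */
    return ((float)index + 0.5f) / (float)size;
}
```

Sampling at the texel center, rather than at index / size, keeps the lookup away from the boundary between two texels.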

The main advantages here are compatibility and compactness. This will work on GL 2.1 hardware (using the normalized-coordinate approach from the note above). And you don't have to use GL_R32F formats; you could use GL_R16F half-floats. Or GL_R8 if your data is reasonable for a normalized byte. Size can mean a lot for overall performance.
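Whether a normalized byte is "reasonable" depends on how much precision you can give up. The following sketch shows the round trip for values in [0, 1], assuming you quantize on the CPU before uploading. The function names are hypothetical, and the rounding used here (scale by 255, round to nearest) is one common choice rather than the only possible one.

```c
#include <assert.h>
#include <stdint.h>

/* Quantize a float in [0, 1] to an 8-bit unsigned normalized value.
   Out-of-range inputs are clamped. Hypothetical helper. */
uint8_t float_to_unorm8(float x)
{
    assert(x == x); /* reject NaN */
    if (x < 0.0f) x = 0.0f;
    if (x > 1.0f) x = 1.0f;
    return (uint8_t)(x * 255.0f + 0.5f); /* round to nearest step */
}

/* What the shader would read back from a normalized 8-bit channel. */
float unorm8_to_float(uint8_t c)
{
    return (float)c / 255.0f;
}
```

With only 256 steps, the round-trip error is at most about 0.002, which is fine for things like weights or colors but not for general-purpose floats.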

The main disadvantage is the size limitation. You are limited to having a 1D texture of the max texture size. On GL 3.x-class hardware, this will be around 8,192, but is guaranteed to be no less than 4,096.

Uniform Buffer Objects

The way this works is that you declare a uniform block in your shader:

layout(std140) uniform MyBlock
{
  float myDataArray[size];
};

You then access that data in the shader just like an array.

Back in C/C++/etc code, you create a buffer object and fill it with floating-point data. Then, you can associate that buffer object with the MyBlock uniform block. More details can be found here.

The principal advantages of this technique are speed and semantics. Speed is due to how implementations treat uniform buffers compared to textures. Texture fetches are global memory accesses. Uniform buffer accesses are generally not; the uniform buffer data is usually loaded into the shader when the shader is initialized upon its use in rendering. From there, it is a local access, which is much faster.

Semantically, this is better because it isn't just a flat array. For your specific needs, if all you need is a float[], that doesn't matter. But if you have a more complex data structure, the semantics can be important. For example, consider an array of lights. Lights have a position and a color. If you use a texture, your code to get the position and color for a particular light looks like this:

vec4 position = texelFetch(myDataArray, 2*index);
vec4 color = texelFetch(myDataArray, 2*index + 1);

With uniform buffers, it looks just like any other uniform access. You have named members that can be called position and color. So all the semantic information is there; it's easier to understand what's going on.
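On the CPU side, both variants can be fed from the same memory. The sketch below assumes a hypothetical Light struct with a vec4 position followed by a vec4 color, and, for the texture variant, a four-component float texel format so that each texel holds one vec4. None of these names come from the answer itself.

```c
#include <assert.h>
#include <stddef.h>

/* CPU-side mirror of a hypothetical GLSL struct
       struct Light { vec4 position; vec4 color; };
   Each vec4 is 16 bytes, so one light occupies 32 bytes. */
typedef struct {
    float position[4];
    float color[4];
} Light;

/* Texel indices used by the texture-based variant, where each light
   takes up two consecutive vec4 texels. */
int light_position_texel(int index)
{
    assert(index >= 0);
    return 2 * index;
}

int light_color_texel(int index)
{
    assert(index >= 0);
    return 2 * index + 1;
}
```

The difference is in who does the bookkeeping: with a texture, the 2*index arithmetic lives in your shader; with a uniform block, the GLSL compiler derives it from the member names.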

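There are size limitations for this as well. OpenGL requires that implementations provide at least 16,384 bytes for the maximum size of uniform blocks, which is room for 4,096 tightly packed floats. Be aware, though, that std140 pads every element of a float array to 16 bytes, so a plain float[] fits only 1,024 elements in that space; to use all 4,096 you have to declare a vec4[] and index it as array[i / 4][i % 4]. Note again that this is the minimum required from implementations; some hardware can offer much larger buffers. AMD provides 65,536 on their DX10-class hardware, for example.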
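As a rough check on how many floats actually fit, keep in mind that std140 rounds the stride of scalar arrays up to the size of a vec4 (16 bytes). The sketch below does the arithmetic for both a plain float array and floats packed four to a vec4. The function and type names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* std140 rounds the stride of scalar and vector arrays up to 16 bytes. */
enum { STD140_ARRAY_STRIDE = 16 };

/* Elements available when the block is declared as `float data[N];`. */
size_t std140_float_array_capacity(size_t block_bytes)
{
    assert(block_bytes > 0);
    return block_bytes / STD140_ARRAY_STRIDE;
}

/* Floats available when the block is declared as `vec4 data[N];`
   and four floats are packed into every vec4. */
size_t std140_vec4_packed_capacity(size_t block_bytes)
{
    assert(block_bytes > 0);
    return (block_bytes / STD140_ARRAY_STRIDE) * 4;
}

/* Where float number `i` lands in the packed layout: data[vec][comp]. */
typedef struct { size_t vec; size_t comp; } PackedIndex;

PackedIndex packed_index(size_t i)
{
    PackedIndex p = { i / 4, i % 4 };
    return p;
}
```

For the guaranteed 16,384 bytes this gives 1,024 elements for a float array and 4,096 floats for the packed vec4 layout; the shader then reads element i as data[i / 4][i % 4].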

Buffer Textures

These are kind of a "super 1D texture". They effectively allow you to access a buffer object from a texture unit. Though they are one-dimensional, they are not 1D textures.

In core OpenGL, you can only use them from GL 3.1 or above (earlier versions need the texture buffer object extension). And you can only access them via the texelFetch function.

The main advantage here is size. Buffer textures can generally be pretty gigantic. While the spec is generally conservative, mandating a minimum of only 65,536 texels for buffer textures, most GL implementations allow them to range in the megabytes in size. Indeed, usually the maximum size is limited by the GPU memory available, not hardware limits.

Also, buffer textures are stored in buffer objects, not the more opaque texture objects like 1D textures. This means you can use some buffer object streaming techniques to update them.

The main disadvantage here is performance, just like with 1D textures. Buffer textures probably won't be any slower than 1D textures, but they won't be as fast as UBOs either. If you're just pulling one float from them, it shouldn't be a concern. But if you're pulling lots of data from them, consider using a UBO instead.

Shader Storage Buffer Objects

OpenGL 4.3 provides another way to handle this: shader storage buffers. They're a lot like uniform buffers; you specify them using syntax almost identical to that of uniform blocks. The principal difference is that you can write to them. Obviously that's not useful for your needs, but there are other differences.

Shader storage buffers are, conceptually speaking, an alternate form of buffer texture. Thus, the size limits for shader storage buffers are a lot larger than for uniform buffers. The OpenGL minimum for the max UBO size is 16KB. The OpenGL minimum for the max SSBO size is 16MB. So if you have the hardware, they're an interesting alternative to UBOs.

Just be sure to declare them as readonly, since you're not writing to them.

The potential disadvantage here is performance again, relative to UBOs. SSBOs work like an image load/store operation through buffer textures. Basically, it's (very nice) syntactic sugar around an imageBuffer image type. As such, reads from these will likely perform at the speed of reads from a readonly imageBuffer.

Whether reading via image load/store through buffer images is faster or slower than buffer textures is unclear at this point.

Another potential issue is that you must abide by the rules for non-synchronous memory access. These are complex and can very easily trip you up.

